News

Your AI models are failing in production—Here's how to fix model ...

Super excited that our second reward model evaluation is out. It's substantially harder, much cleaner, and well correlated with downstream PPO/BoN sampling. Happy hillclimbing!

