News
We're super excited that our second reward model evaluation is out. It's substantially harder, much cleaner, and well correlated with downstream PPO and best-of-N (BoN) sampling. Happy hillclimbing!