Wow this looks great! The alignment tax for this is (inference on whole dataset with previous gen model), which is like 10%, and can be much lower if you just use a smaller classifier. Seems like an important part of near term alignment!
Wow this looks great! The alignment tax for this is (inference on whole dataset with previous gen model), which is like 10%, and can be much lower if you just use a smaller classifier. Seems like an important part of near term alignment!