ryan_greenblatt comments on Why Don’t We Just… Shoggoth+Face+Paraphraser?

ryan_greenblatt 20 Nov 2024 19:41 UTC
8 points
Testing sounds great, though I’d note that the way I’d approach testing is to first construct a general test bed that produces problematic behavior through training incentives for o1-like models. (Possibly building on DeepSeek’s version of o1.) Then I’d move to trying a bunch of stuff in this setting.

(I assume you agree, but thought this would be worth emphasizing to third parties.)