Thinking about evals as a research fellow at Pivotal. Statistics student at ETH, math graduate from Göttingen. Miscellaneous interests include Japanese linguistics, quantitative humanities and philology.
Lennart Finke
Thanks, fixed!
Agreed, although that in turn makes me wonder why it performs a bit better than random at all. Maybe there is some nondeclarative knowledge about the image, or some blurred position information? I might next test how much vision is the bottleneck here by providing a text representation of the grid, as in Ryan Greenblatt's work on ARC-AGI.
Good point, and I was torn about whether to put my thoughts on this at the end of the post. My best theory is that increased persuasion abilities will look something like "totalitarian government agents doing solid scaffolding on open-source models to DM people on Facebook". We will see persuasive agents get better, but not know why or how. As stated in the introduction, persuasion detection is dangerous, but it is one of the few capabilities that could also be used defensively (e.g. detecting persuasion in an incoming email → displaying a warning in the UI and offering to rephrase).
In conclusion, I definitely agree that we should consider closed-sourcing any improvements on the above baseline and only sharing them with safety orgs instead. Some people at AISI whom I talked to while working on persuasion would probably be interested in this.