I had cached impressions that AI safety people were interested in auditing, ELK, and scalable oversight.
A few AIS people who volunteered to give feedback before the workshop (a sample biased towards people interested in the title) each named a different top choice: scientific understanding (specifically threat models), model editing, and auditing (so two of the three were unexpected for me).
During the workshop, attendees (again a biased sample, as they self-selected into the session) expressed the most excitement about auditing, unlearning, MAD, ELK, and general scientific understanding. I was surprised by the interest in MAD and ELK; I had expected more skepticism around those, though I can see how they might be aesthetically appealing to a slightly more academic audience.
Thanks!
Yeah, in my experience ELK is surprisingly popular amongst academics, though they tend to frame it in terms of partial observability (as opposed to the measurement-tampering framing I more often hear from EA/AIS people).