What do you think was underrated about it? When I read it, I have some sort of “yeah, this makes sense” reaction but am not “wow’d” by it.
It seems like the deeper challenge is figuring out how to align incentives. Can we find a structure where labs want to, e.g., give white-box access to a bunch of external researchers and give them a long time to red-team models, while somehow also maintaining the independence of those white-box auditors? How do you avoid industry capture?
The same kinds of challenges come up with safety research: how do you give labs an incentive to publish safety research that makes their product or their approach look bad? How do you avoid publication bias and p-hacking-type concerns?
I don’t think your post is obligated to get into those concerns, but perhaps a post that grappled with those concerns would be something I’d be “wow’d” by, if that makes sense.