This seems orthogonal to my main claim (roughly: if you do this at a large scale then it starts becoming net negative due to lower quality).
Fair. I think I failed to address this point entirely.
I do think there’s a nonzero number of people who would not be that good at novel alignment research but would still be good at the tasks mentioned here; I agree, though, that there isn’t a scalable intervention here, or at least not one more scalable than standard AI alignment research (especially when compared to some approaches, like the brute-force mechanistic interp many people are doing).
(I interpreted the OP as saying that you convince AGI researchers who are not (currently) working on safety. I think a good steelman + critique of RRM wouldn’t have much effect on that population, though I think it’s pretty plausible I’m wrong about that because the situation at OpenAI is different from DeepMind.)
Yeah, I also messed up here—I think this would plausibly have little effect on that population. I do think that a good answer to “why does RLHF not work” would help a nonzero amount, though.
As a person at a lab, I’m currently voting for less coordination of this sort, not more.
Agree that it’s not scalable, but could you share why you’d vote for less?
Idk, it’s hard to explain; it’s the usual thing where there are a gazillion things to do that all seem important and you have to prioritize anyway. (I’m just worried about the opportunity cost, not some other issue.)
I think the biggest part of coordination between non-lab alignment people and lab alignment people is making sure that people know about each other’s research; it mostly feels like the simple method of “share info through personal connections + reading posts and papers” is working pretty well right now. Maybe I’m missing some way in which this could be way better, idk.
My guess is most of the value in coordination work here is either in making posts/papers easier to write or ship, or in discovering new good researchers?
Those weren’t what I thought of when I read “coordination” but I agree those things sound good :)
Another good example would be better communication tech (e.g. the sort of thing that LessWrong / Alignment Forum aims for, though not those in particular, because most lab people don’t use them very much).
I feel like the biggest barrier in practice to people “coordinating” in the relevant ways is that people don’t know what other people are doing. And a big reason for this is that write-ups are really hard to produce, especially if you have high standards and are reluctant to ship.
And yeah, better communication tech in general would be good, but I’m not sure how to start on that (whereas it’s pretty obvious what a few candidate steps toward making posts/papers easier to write and ship would look like).
I agree it’s not clear what to do on better communication tech.
Idk, a few years ago I would have agreed with you, but now my impression is that people mostly don’t read things and instead talk to each other for this purpose. I wouldn’t really expect that to change with more writing, unless the writing is a lot better?
(I do think that e.g. mech interp researchers read each other’s mech interp papers, though my impression from the outside is that they also often hear about each other’s results well before they’re published. Similarly for scalable oversight.)