I wouldn’t call it an infohazard; generally that refers to information that’s harmful simply to know, rather than because it might e.g. advance timelines.
There are arguments to be made about how much overlap there is between capabilities research and alignment research, but I think by default most things that would be classified as capabilities research do not meaningfully advance AI alignment. For the opposite to be true, you’d need >50% of all capabilities work to advance alignment “by default” (and without requiring any active effort to “translate” that capabilities work into something helpful for alignment), since the relative levels of effort invested are so heavily skewed toward capabilities. See also https://www.lesswrong.com/tag/differential-intellectual-progress.
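To make the effort-skew point concrete, here’s a toy back-of-the-envelope model (my framing, with made-up numbers; the 10:1 ratio and the “keeping pace” criterion are illustrative assumptions, not claims from the comment above). Let $A$ be dedicated alignment effort, $C$ capabilities effort, and $f$ the fraction of capabilities work that advances alignment “by default.” Then, very roughly,

$$\text{alignment progress} \approx A + fC, \qquad \text{capabilities progress} \approx C.$$

For alignment progress to at least keep pace with capabilities progress, you’d need

$$A + fC \gtrsim C \quad\Longrightarrow\quad f \gtrsim 1 - \frac{A}{C},$$

which with a hypothetical 10:1 skew ($C = 10A$) gives $f \gtrsim 0.9$: the large majority of capabilities work would have to advance alignment by default, which is the direction the “>50%” figure above is pointing.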
I think there’s probably value in being on an alignment team at a “capabilities” org, or even embedded in a capabilities team, if the role itself doesn’t involve work that contributes to capabilities (whether through first-order or second-order effects).
I think that the “in the room” argument might start to make sense when there’s actually a plan for alignment that’s in a sufficiently ready state to be operationalized. AFAICT nobody has such a plan yet. For that reason, I think maintaining & improving lines of communication is very important, but if I had to guess, I’d say you could get most of the anticipated benefit there without directly doing capabilities work.
Yes, this does seem to be happening. It also appears to be unavoidable.
Our state of knowledge is nowhere near being able to guarantee that any AGI we develop will not kill us all. We are already developing AI that is superhuman along an increasing number of dimensions. Those who are actively working right now to bring the remaining capabilities up to and beyond human level obviously can’t be sufficiently concerned, or they would not be doing it.