I think we should all just give up on the term “scalable oversight”; it is used in too many conflicting ways, sadly. I mostly talk about “recursive techniques for reward generation” instead.
The idea I associate with scalable oversight is weaker models overseeing stronger models, (probably) combined with safety-by-debate. Is that the same as or different from “recursive techniques for reward generation”?
Currently, this general class of ideas seems to me the most promising avenue for achieving alignment for vastly superhuman AI (“superintelligence”).