I think we should all just give up on the term “scalable oversight”; it is used in too many conflicting ways, sadly. I mostly talk about “recursive techniques for reward generation” instead.
The idea I associate with scalable oversight is weaker models overseeing stronger models, (probably) combined with safety-by-debate. Is that the same as or different from “recursive techniques for reward generation”?
Currently, this general class of ideas seems to me the most promising avenue for achieving alignment for vastly superhuman AI (“superintelligence”).