Some small corrections/additions to my section (“Altair agent foundations”). I’m currently calling it “Dovetail research”. That’s not publicly written anywhere yet, but if it were listed as that here, it might help people who are searching for it later this year.
Which orthodox alignment problems could it help with?: 9. Humans cannot be first-class parties to a superintelligent value handshake
I wouldn’t put number 9. The agenda is not intended to “solve” most of these problems, but it is intended to help make progress on understanding the nature of the problems through formalization, so that they can be avoided or postponed, or more effectively solved by other research agendas.
Target case: worst-case
definitely not worst-case, more like pessimistic-case
Some names: Alex Altair, Alfred Harwood, Daniel C, Dalcy K
Add “José Pedro Faustino”
I’d call it 2, averaged throughout 2024.
Basically right; I’d add this post and this post.
Done, thanks!