Jonathan Claybrough comments on Dialogue introduction to Singular Learning Theory

Jonathan Claybrough 4 Oct 2024 7:39 UTC
1 point
0
I don’t strongly disagree, but I do weakly disagree on some points, so I guess I’ll answer.

Re first: if you buy into automated alignment work by human-level AGI, then trying to align ASI now seems less worth it. The strongest counterargument I see is that “human-level AGI” is impossible to get with our current understanding, as it will be superhuman in some things and weirdly bad at others.

Re second: the disagreement might be nitpicking on “few other approaches” vs “few currently pursued approaches”. There are probably a bunch of things that would allow fundamental understanding if they panned out (various agent foundations agendas, provably safe AI agendas like davidad’s), though one can argue they won’t apply to deep learning or are less promising to explore than SLT.
- Davidmanheim 6 Oct 2024 13:37 UTC
  3 points
  0
  In addition to the point that current models are already strongly superhuman in most ways, I think that even if you buy the idea that we’ll be able to do automated alignment of ASI, you’ll still need some reliable approach to “manual” alignment of current systems. We’re already far past the point where we can robustly verify LLMs’ claims or reasoning outside of narrow domains like programming and math.
  
  But on point two, I strongly agree that agent foundations and Davidad’s agendas are also worth pursuing. (And in a sane world, we should have tens or hundreds of millions of dollars in funding for each, every year.) Instead, it looks like we have Davidad’s ARIA funding, Jaan Tallinn and LTFF funding some agent foundations and SLT work, and that’s basically it. Meanwhile, MIRI abandoned agent foundations, and OpenPhil isn’t, it seems, putting money or effort into them.