How optimistic should we be about alignment & safety for brain-like-AGI, relative to prosaic AGI?
That’s a hard question for me to answer, because I have a really vivid inside-view picture of researchers eventually building AGI via the “brain-like” route, and of what the resulting AGI would look like, whereas when I try to imagine other R&D routes to AGI, I can’t, except by imagining that future researchers will converge towards the brain-like path. :-P
In particular:
I think a model trained purely on self-supervised learning (not RL) would be safer than brain-like AGI. But I don’t think a model trained purely on self-supervised learning would be “AGI” in the first place. (For various reasons, one of which is the discussion of “RL-on-thoughts” here.) And those two beliefs are very related!! So then I do Murphyjitsu by saying to myself: OK, but if I’m wrong, and self-supervised learning did scale to AGI, how did that happen? Then I imagine future models acquiring, umm, “agency”, either by future programmers explicitly incorporating RL etc. deep into the training / architecture, or else by agency emerging somehow, e.g. because the model is “simulating” agential humans. Either of those brings us much closer to brain-like AGI, and thus I stop feeling like it’s safer than brain-like AGI.
I do think the Risks-From-Learned-Optimization scenario could in principle create AGI (obviously, that’s how evolution made humans). But I don’t think it will happen, for reasons in Post 8. If it did happen, the only way I can concretely imagine it happening is for the inner model to be a brain-like AGI. In that case, I think it would be worse than making brain-like AGI directly, for reasons in §8.3.3.