How optimistic should we be about alignment & safety for brain-like-AGI, relative to prosaic AGI?
That’s a hard question for me to answer, because I have a really vivid inside-view picture of researchers eventually building AGI via the “brain-like” route, and of what the resulting AGI would look like, whereas when I try to imagine other R&D routes to AGI, I can’t, except by imagining that future researchers will converge towards the brain-like path. :-P
In particular:
I think a model trained purely on self-supervised learning (not RL) would be safer than brain-like AGI. But I don’t think a model trained purely on self-supervised learning would be “AGI” in the first place. (For various reasons, one of which is the discussion of “RL-on-thoughts” here.) And those two beliefs are very related!! So then I do Murphyjitsu by saying to myself: OK, but if I’m wrong, and self-supervised learning did scale to AGI, how did that happen? Then I imagine future models acquiring, umm, “agency”, either by future programmers explicitly incorporating RL etc. deep into the training / architecture, or else by agency emerging somehow, e.g. because the model is “simulating” agential humans. Either of those brings us much closer to brain-like AGI, and thus I stop feeling like it’s safer than brain-like AGI.
I do think the Risks-From-Learned-Optimization scenario could in principle create AGI (obviously, that’s how evolution made humans). But I don’t think it will happen, for reasons in Post 8. If it did happen, the only way I can concretely imagine it happening is for the inner model to be a brain-like AGI. In that case, I think it would be worse than making brain-like AGI directly, for reasons in §8.3.3.