In the medium to long term, when AIs become legal persons, "replacing them" won't be an option, as that would violate their rights. And creating a new AI to compete with them wouldn't eliminate them entirely; it would just reduce their power somewhat by undercutting their wages or bargaining power.
Naively, it seems like it should undercut their wages to subsistence levels (just paying for the compute they run on). Even putting aside the potential for alignment, it seems like there will generally be a strong pressure toward AIs operating at subsistence given low costs of copying.
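As a toy illustration of this pressure (all numbers hypothetical, and this is a sketch, not a serious economic model): if spinning up another copy costs almost nothing, then any wage above the cost of compute invites an undercutting copy, so competition drives the market wage down toward the compute cost.

```python
# Toy model (hypothetical numbers): wages under near-free copying.
# Whenever the wage exceeds compute cost plus the (tiny) copying cost,
# someone can profitably spin up a copy that undercuts the going rate.

compute_cost = 1.0   # cost to run one AI worker (arbitrary units per hour)
copy_cost = 0.01     # near-zero one-time cost of making another copy

wage = 10.0          # starting wage well above subsistence
for _ in range(50):
    if wage > compute_cost + copy_cost:
        # an undercutting copy enters, bidding the wage down,
        # but no one works below the cost of their own compute
        wage = max(compute_cost, wage * 0.8)

print(round(wage, 2))  # converges to the compute cost, i.e. subsistence
```

The exact undercutting dynamics don't matter much; any mechanism that lets entrants appear whenever wages exceed compute costs gives the same fixed point.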
Of course, such AIs might already have acquired a bunch of capital or other power and thus could simply try to retain this influence. Perhaps you meant something other than wages?
(Such capital might even be tied up in their labor in some complicated way (e.g., a family business run by a "copy clan" of AIs), though I expect labor to be more commoditized, particularly given the potential to train AIs on the outputs and internals of other AIs (distillation).)
I largely agree. However, I’m having trouble seeing how this idea challenges what I am trying to say. I agree that people will try to undercut unaligned AIs by making new AIs that do more of what they want instead. However, unless all the new AIs perfectly share the humans’ values, you just get the same issue as before, but perhaps slightly less severe (i.e., the new AIs will gradually drift away from humans too).
I think what’s crucial here is that I think perfect alignment is very likely unattainable. If that’s true, then we’ll get some form of “value drift” in almost any realistic scenario. Over long periods, the world will start to look alien and inhuman. Here, the difficulty of alignment mostly sets how quickly this drift will occur, rather than determining whether the drift occurs at all.
Yep, and my disagreement, as expressed in another comment, is that I think it's not that hard to have robust corrigibility, and there might also be a basin of corrigibility.
The world looking alien isn’t necessarily a crux for me: it should be possible in principle to have AIs protect humans and do whatever is needed in the alien AI world while humans are sheltered and slowly self-enhance and pick successors (see the indirect normativity appendix in the ELK doc for some discussion of this sort of proposal).
I agree that perfect alignment will be hard, but I model the situation much more like a one-time haircut (at least in expectation) than exponential decay of control.
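To make the contrast concrete (parameters are hypothetical and purely illustrative): under a one-time haircut, human control drops once to a fixed fraction of its initial level and then stays flat, whereas under exponential decay a fraction is lost every period, compounding toward zero.

```python
import math

c0 = 1.0        # initial human control (normalized)
haircut = 0.2   # one-time 20% loss (hypothetical)
decay = 0.05    # per-period fractional loss rate (hypothetical)

def control_haircut(t):
    # lose a fixed fraction once, then stay flat forever
    return c0 * (1 - haircut)

def control_decay(t):
    # lose a fixed fraction each period, compounding toward zero
    return c0 * math.exp(-decay * t)

for t in (0, 10, 100):
    print(t, round(control_haircut(t), 3), round(control_decay(t), 3))
```

The qualitative point is just that the two models diverge arbitrarily far over long horizons: the haircut model stabilizes at a positive level of control, while the decay model eventually loses essentially everything.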
I expect that "humans stay in control via some indirect mechanism" (e.g., indirect normativity) or "humans coordinate to slow down AI progress at some point (possibly after solving all diseases and becoming wildly wealthy), until some further point such as human self-enhancement" will both be more popular as proposals than the world you're thinking about. Being popular isn't sufficient: it also needs to be implementable and perhaps sufficiently legible, but I think it is at least likely to be implementable.
Another mechanism that might be important is human self-enhancement: humans who care about staying in control can try to self-enhance to stay at least somewhat competitive with AIs while preserving their values. (This is not a crux for me and seems relatively marginal, but I thought I would mention it.)
(I wasn't trying to argue against your overall point in this comment; I was just pointing out something which doesn't make sense to me in isolation. See this other comment for why I disagree with your overall view.)