This post doesn’t intend to rely on there being a discrete transition from “roughly powerless and unable to escape human control” to “basically a god, and thus able to accomplish any of its goals without constraint”. We argue that an AI which is able to dramatically speed up scientific research (i.e. effectively automate science) will be extremely hard to both safely constrain and get useful work from.
Such AIs won’t effectively hold all the power (at least initially), and so will at first be forced to comply with whatever system we are attempting to use to control them (or at least look like they are complying, while they delay, sabotage, or gain skills that would allow them to break out of the system). This system could be something like a Redwood-style control scheme, or a system of laws. With a system of laws, I imagine the AIs would very likely lie in wait, amassing power and trust, until they could take critical bad actions without risk of legal repercussions. If the AIs have goals that are better achieved by not obeying the laws, then they have an incentive to get into a position where they can safely get around the laws (and likely take over). This applies to a population of AIs or to a single AI, assuming the AIs are goal-directed enough to actually get useful work done. In Section 5 of the post we discussed control schemes, which I also expect to be inadequate (given current levels of security mindset/paranoia), but which seem much better than legal systems for safely getting work out of misaligned systems.
AIs also have an obvious incentive to collude with each other. They could either share all the resources (the world, the universe, etc.) with the humans, with humans keeping the majority of resources; or the AIs could collude, disempower humans, and then share resources amongst themselves. I don’t really see a strong reason to expect misaligned AIs to trade with humans much if the population of AIs were together capable of taking over. (This is somewhat an argument for your point 2.)
With a system of laws, I imagine the AIs would very likely lie in wait, amassing power and trust, until they could take critical bad actions without risk of legal repercussions.
It seems to me our main disagreement is about whether it’s plausible that AIs will:
1. Utilize a strategy to covertly and forcefully take over the world
2. Do this at a time during which humans are still widely seen as nominally “in charge”
I think it’s both true that future AI agents will likely not have great opportunities to take over the entire world (which I expect will also contain other, non-colluding AI agents), and that even if they had such opportunities, it would likely be more cost-effective for them to amass power lawfully without resorting to violence. One could imagine, for example, that AIs simply get extremely rich through conventional means, leaving humans in the dust, but without taking the extra (somewhat costly) step of taking over the world to get rid of all the humans.
Here’s another way to understand what I’m saying. The idea that “humans will be weak compared to AIs” can be viewed from two opposing perspectives. On the one hand, yes, it means that AIs could easily kill us if they all ganged up on us; on the other hand, it also means there’s almost no point in killing us, since we’re not really a threat to them anyway. (Compare to a claim that e.g. Jeff Bezos has an instrumental incentive to steal from a minimum wage worker because they are a threat to his power.)
The fact that humans will be relatively useless, unintelligent, and slow in the future mostly just means our labor won’t be worth much. This cuts both ways: we will be easy to defeat in a one-on-one fight, but we also pose no real threat to AIs’ supremacy. If AIs simply sold their labor honestly on an open market, they could easily become vastly richer than humans, but without needing to take the extra step of overthrowing the whole system to kill us.
Now, there is some nuance here. Humans will want to be rich in the future by owning capital, and not just by selling their labor. But here too we can apply an economic argument against theft or revolution: since AIs will be much better than us at accumulating wealth and power, it is not in their interest to weaken property rights by stealing all our wealth.
Like us, AIs will also have an incentive to protect against future theft and predation from other AIs. Weakening property norms would predictably harm their future prospects of maintaining a stable system of law in which they could accumulate their own power. Among other reasons, this provides one explanation for why well-functioning institutions don’t just steal all the wealth of people over the age of 80. If that happened, people would likely think: if the system can steal all those people’s wealth, maybe I’ll be next?
They could either share all the resources (the world, the universe, etc.) with the humans, with humans keeping the majority of resources; or the AIs could collude, disempower humans, and then share resources amongst themselves. I don’t really see a strong reason to expect misaligned AIs to trade with humans much if the population of AIs were together capable of taking over. (This is somewhat an argument for your point 2.)
I think my fundamental objection here is that I don’t think there will necessarily be a natural, unified coalition of AIs that works against all humans. To prevent misinterpretations, I should clarify: I think some AIs will eventually be able to coordinate with each other much better than humans can coordinate with each other. But I’m still skeptical of the rational argument in favor of collusion in these circumstances. You can read what I had to say about this argument recently in this comment, and again more recently in this comment.
I expect that Peter and Jeremy aren’t particularly committed to covert and forceful takeover and they don’t think of this as a key conclusion (edit: a key conclusion of this post).
Instead they care more about arguing about how resources will end up distributed in the long run.
Separately, if humans didn’t attempt to resist AI resource acquisition or AI crime at all, then I personally don’t really see a strong reason for AIs to go out of their way to kill humans, though I could imagine large collateral damage due to conflict over resources between AIs.
I expect that Peter and Jeremy aren’t particularly committed to covert and forceful takeover and they don’t think of this as a key conclusion.
Instead they care more about arguing about how resources will end up distributed in the long run.
If the claim is, for example, that AIs could own 99.99% of the universe, and humans will only own 0.01%, but all of us humans will be many orders of magnitude richer (because the universe is so big), and yet this still counts as a “catastrophe” because of the relative distribution of wealth and resources, I think that needs to be way more clear in the text.
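To make the arithmetic behind that kind of claim concrete, here is a minimal sketch; the resource multiplier and the 0.01% share are invented purely for illustration, not estimates of anything.

```python
# Toy illustration: a tiny relative share of a vastly larger resource pool can
# still be a huge absolute gain. All numbers here are made up for illustration.

current_total_wealth = 1.0        # normalize today's total human wealth to 1
future_total_resources = 1e30     # hypothetical post-expansion resource pool
human_share = 0.0001              # humans own 0.01% in the scenario above

human_absolute_wealth = future_total_resources * human_share
gain_vs_today = human_absolute_wealth / current_total_wealth

print(f"human share of resources: {human_share:.2%}")
print(f"human absolute wealth:    {human_absolute_wealth:.1e} (today's total = 1)")
print(f"gain relative to today:   {gain_vs_today:.1e}x")
```

Under those made-up numbers, humans end up with a minuscule relative share but an enormous absolute gain, which is exactly why the definition of “catastrophe” matters here.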
I could imagine large collateral damage due to conflict over resources between AIs.
To be clear: I’m also very concerned about future AI conflict, and I think that if such a widespread conflict occurred (imagine: world war 3 but with robot armies in addition to nanotech and anti-matter bombs), I would be very worried, not only for my own life, but for the state of the world generally. My own view on this issue is simply that it is imprecise and approximately inaccurate to round such a problem off to generic problems of technical misalignment, relative to broader structural problems related to the breakdown of institutions designed to keep the peace among the various parties in the world.
yet this still counts as a “catastrophe” because of the relative distribution of wealth and resources, I think that needs to be way more clear in the text.
(But I think they do argue for violent conflict in the text. It would probably be more clear if they said something like “we mostly aren’t arguing for violent takeover or loss of human life here, though this has been discussed in more detail elsewhere”)
TBC, they discuss negative consequences of powerful, uncontrolled, and not-particularly-aligned AI in section 6, but they don’t argue for “this will result in violent conflict” in that much detail. I think the argument they make is basically right and suffices for thinking that the type of scenario they describe is reasonably likely to end in violent conflict (though more like 70% than 95% IMO). I just don’t see this as one of the main arguments of this post, and it probably isn’t a key crux for them.
I agree that it’d be extremely misleading if we defined “catastrophe” in a way that includes futures where everyone is better off than they currently are in every way (without being very clear about it). This is not what we mean by catastrophe.
If AIs simply sold their labor honestly on an open market, they could easily become vastly richer than humans …
I mean, this depends on competition right? Like it’s not clear that the AIs can reap these gains because you can just train an AI to compete? (And the main reason why this competition argument could fail is that it’s too hard to ensure that your AI works for you productively because ensuring sufficient alignment/etc is too hard. Or legal reasons.)
[Edit: I edited this comment to make it clear that I was just arguing about whether AIs could easily become vastly richer and about the implications of this. I wasn’t trying to argue about theft/murder here though I do probably disagree here also in some important ways.]
Separately, in this sort of scenario, it sounds to me like AIs gain control over a high fraction of the cosmic endowment. Personally, what happens with the cosmic endowment is a high fraction of what I care about (maybe about 95% of what I care about), so this seems probably about as bad as violent takeover (perhaps one difference is in the selection effects on AIs).
I mean, this depends on competition right? Like it’s not clear that the AIs can reap these gains because you can just train an AI to compete?
[ETA: Apologies, it appears I misinterpreted you as defending the claim that AIs will have an incentive to steal or commit murder if they are subject to competition.]
That’s true for humans too, at various levels of social organization, and yet I don’t think humans have a strong incentive to kill off or steal from weaker/less intelligent people or countries etc. To understand what’s going on here, I think it’s important to analyze these arguments in existing economic frameworks—and not because I’m applying a simplistic “AIs will be like humans” argument but rather because I think these frameworks are simply our best existing, empirically validated models of what happens when a bunch of agents with different values and levels of power are in competition with each other.
In these models, it is generally not accurate to say that powerful agents have strong convergent incentives to kill or steal from weaker agents, which is the primary thing I’m arguing against. Trade is not assumed to happen in these models because all agents consider themselves roughly all equally powerful, or because the agents have the same moral views, or because there’s no way to be unseated by cheap competition, and so on. These models generally refer to abstract agents of varying levels of power and differing values, in a diverse range of circumstances, and yet still predict peaceful trade because of the efficiency advantages of lawful interactions and compromise.
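As a deliberately toy instance of the kind of model I mean, here is a Ricardian comparative-advantage sketch with invented productivities. It is not meant to capture the AI case quantitatively; it just shows the structural point that even when one agent is absolutely more productive at everything, specializing and trading along comparative advantage yields strictly more total output than not doing so, which is where the efficiency advantage of lawful interaction comes from in these models.

```python
# Toy Ricardian sketch: even when one agent ("ai") is absolutely more productive
# at everything, specializing along comparative advantage yields strictly more
# total output than no specialization. Productivities are invented for illustration.

# Units of good X or good Y each agent can produce per hour.
productivity = {
    "ai":    {"x": 100.0, "y": 100.0},
    "human": {"x": 1.0,   "y": 2.0},
}

# No specialization: each agent splits its hour evenly between the two goods.
no_spec_x = sum(p["x"] * 0.5 for p in productivity.values())
no_spec_y = sum(p["y"] * 0.5 for p in productivity.values())

# Specialization: the human (lower opportunity cost for Y) produces only Y, and
# the AI allocates its hour so that total Y matches the no-specialization level.
human_y = productivity["human"]["y"] * 1.0
ai_hours_on_y = (no_spec_y - human_y) / productivity["ai"]["y"]
spec_y = human_y + productivity["ai"]["y"] * ai_hours_on_y
spec_x = productivity["ai"]["x"] * (1.0 - ai_hours_on_y)

print(f"no specialization: X = {no_spec_x:.1f}, Y = {no_spec_y:.1f}")
print(f"specialization:    X = {spec_x:.1f}, Y = {spec_y:.1f}  (same Y, strictly more X)")
```

The absolute productivity gap can be made arbitrarily large without changing the qualitative result; what changes the conclusion is the kind of consideration discussed above, such as whether trading with the weaker party is cheaper than expropriating them.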
Oh, sorry, to be clear I wasn’t arguing that this results in an incentive to kill or steal. I was just pushing back on a local point that seemed wrong to me.
Trying to find the crux of the disagreement (which I don’t think lies in takeoff speed):
Suppose we have a multipolar, slow-takeoff, misaligned-AI world, in which many AIs slowly take over the economy and generally obey laws to the extent that those laws are enforced (by other AIs), and in which the AIs don’t particularly care about humans, in much the same way that humans don’t particularly care about flies.
In this situation, humans eventually have approximately zero leverage, and approximately zero value to trade. There would be much more value in e.g. mining cities for raw materials than in human labor.
I don’t know much history, but my impression is that in similar scenarios between human groups, with a large power differential and with valuable resources at stake, it didn’t go well for the less powerful group, even if the more powerful group was politically fragmented or even partially allied with the less powerful group.
Which part of this do you think isn’t analogous? My guesses are either that you are expecting some kind of partial alignment of the AIs, or that humans can set up very robust laws/institutions in the AI world such that they remain in place and protect humans even though no subset of the agents is perfectly happy with this and there exist other laws/institutions that they would all prefer.
In this situation, humans eventually have approximately zero leverage, and approximately zero value to trade. There would be much more value in e.g. mining cities for raw materials than in human labor.
Generally speaking, the optimistic assumption is that humans will hold leverage by owning capital, or more generally by receiving income from institutions set up ahead of time (e.g. pensions) that provide income streams to older agents in the society. This system of income transfers to those whose labor is no longer worth much already exists and benefits old people in human societies, though obviously the existing version operates in a more ordinary setting than the one you might think will be necessary with AI.
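Here is a minimal sketch of that assumption with invented numbers: if humans retain even a small capital share, their absolute income can keep growing with total output even as their labor income goes to roughly zero. The share, growth rate, and decay rate below are arbitrary placeholders.

```python
# Toy sketch of "leverage via capital ownership": with a fixed (small) human
# capital share, human absolute income grows with total output even as human
# labor income collapses. All numbers are invented for illustration.

world_output = 100.0        # arbitrary starting level of total output
growth_factor = 10.0        # per-period output growth under AI automation (arbitrary)
human_capital_share = 0.02  # hypothetical fraction of output accruing to human-owned capital
human_labor_income = 50.0   # starting human labor income (arbitrary)
labor_decay = 0.1           # labor income shrinks each period as AIs replace human labor

for period in range(4):
    human_capital_income = world_output * human_capital_share
    print(f"period {period}: output = {world_output:>9.0f}, "
          f"human capital income = {human_capital_income:>7.0f}, "
          f"human labor income = {human_labor_income:>6.2f}")
    world_output *= growth_factor
    human_labor_income *= labor_decay
```

Of course, this only works if the relevant property rights and transfer institutions actually hold up, which is the point under dispute in the rest of this thread.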
Or that humans can set up very robust laws/institutions in the AI world such that they remain in place and protect humans even though no subset of the agents is perfectly happy with this and there exist other laws/institutions that they would all prefer.
Assuming AIs are agents that benefit from acting within a stable, uniform, and predictable system of laws, they’d have good reasons to prefer the rule of law to be upheld. If some of those laws support income streams to humans, AIs may support the enforcement of these laws too. This doesn’t imply any particular preference among AIs for human welfare directly, except insofar as upholding the rule of law sometimes benefits humans too. Partial alignment would presumably also help to keep humans safe.
(Plus, AIs may get “old” too, in the sense of becoming obsolete in the face of newer generations of AIs. These AIs may therefore have much in common with us, in this sense. Indeed, they may see us as merely one generation in a long series, albeit having played a unique role in history, as a result of having been around during the transition from biology to computer hardware.)