To make it more concrete: if I were being oppressed by an alien species with values alien to mine, which was building AI and had coordination abilities and expected intentional control of the future at the level of present-day humanity, I would likely side with the AI systems, expecting that doing so would give me a decent shot at the AI systems giving me something in return.
I’m curious how you think this logic interacts with the idea of AI catastrophe. If, as you say, it is feasible to coordinate with AI systems that seek takeover, and thereby receive rewards from them in exchange, in the context of an alien regime, then presumably such cooperation and trade could happen within an ordinary regime too, between humans and AIs. We can go further and posit that AIs will simply trade with us through the normal routes: selling their labor on the market to amass wealth, using their social skills to influence society and gain prestige, owning property, getting hired into management positions, and shaping culture and governance.
I’m essentially pointing to a scenario in which AI lawfully “beats us fair and square,” as Hanson put it. In this regime, biological humans are allowed to retire in incredible wealth (that’s their “reward” for cooperating with AIs and allowing them to take over), but their influence nonetheless diminishes over time as artificial life becomes dominant in the economy and the world more broadly.
My impression is that this sort of peaceful resolution to the problem of AI misalignment is largely dismissed by people on LessWrong and adjacent circles on the basis that AIs would have no reason to cooperate peacefully with humans if they could simply wipe us out instead. But, by your own admission, AIs can credibly commit to giving people rewards for cooperation: you said that cooperation results in a “decent shot of the AI systems giving me something in return”. My question is: why does it seem like this logic only extends to hypothetical scenarios like being in an alien civilization, rather than the boring ordinary case of cooperation and trade, operating under standard institutions, on Earth, in a default AI takeoff scenario?
I’m confused here, Matthew. It seems to me highly probable that AI systems which want takeover and AI systems which want moderate power combined with peaceful coexistence with humanity are pretty hard to distinguish early on. And early on is when it’s most important for humanity to distinguish between them, before those systems have gained power and while we can still stop them.
Picture a merciless, un-aging sociopath, capable of duplicating itself easily and rapidly, on a trajectory of gaining economic, political, and military power with the aim of acquiring as much power as possible. Imagine that this entity has the option of making empty promises and telling highly persuasive lies to humans in order to gain power, with no intention of fulfilling any of those promises once it achieves enough power.
That seems like a scary possibility to me, and I don’t know how I’d trust an agent that might be this, however nice-sounding its promises. Even if it honored its short-term promises while still constrained by the coercive power of currently dominant human institutions, I wouldn’t trust it to keep those promises once it held the dominant power itself.