Very interesting post!

"Furthermore, if we cannot enforce contracts with AIs then people will promptly realise and stop using AIs; so we should expect contracts to be enforceable conditional upon AIs being used."
I could easily be wrong, but this strikes me as a plausible but debatable statement, rather than a certainty. It seems like more argument would be required even to establish that it's likely, and much more to establish that we can say "people will promptly realise...". It also seems like that statement partly assumes precisely what's up for debate in these sorts of discussions.
Some fragmented thoughts that feed into those opinions:
As you note just before that: “The assumption [of contract enforceability] isn’t plausible in pessimistic scenarios where human principals and institutions are insufficiently powerful to punish the AI agent, e.g. due to very fast take-off.” So the Bostrom/Yudkowsky scenario is precisely one in which contracts aren’t enforceable, for very similar reasons to why that scenario could lead to existential catastrophe.
Very relatedly—perhaps this is even just the same point in different words—you say "then people will promptly realise and stop using AIs". This assumes at least some possibility of trial-and-error, and thus assumes that there'll be neither a very discontinuous capability jump towards a decisive strategic advantage, nor deception followed by a treacherous turn.
As you point out, Paul Christiano’s “Part 1” scenario might be one in which all or most humans are happy, and increasingly wealthy, and don’t have motivation to stop using the AIs. You quote him saying “humans are better off in absolute terms unless conflict leaves them worse off (whether military conflict or a race for scarce resources). Compare: a rising China makes Americans better off in absolute terms. Also true, unless we consider the possibility of conflict....[without conflict] humans are only worse off relative to AI (or to humans who are able to leverage AI effectively). The availability of AI still probably increases humans’ absolute wealth. This is a problem for humans because we care about our fraction of influence over the future, not just our absolute level of wealth over the short term.”
Similarly, it seems to me that we could have a scenario in which people realise they can't enforce contracts with AIs, but the resulting losses are relatively small and are outweighed by the benefits of the AIs, so people continue using the AIs despite the contracts being unenforceable.
And then this could still lead to existential catastrophe due to black swan events people didn’t adequately account for, competitive dynamics, or “externalities” e.g. in relation to future generations.
I’m not personally sure how likely I find any of the above scenarios. I’m just saying that they seem to reveal reasons to have at least some doubts that “if we cannot enforce contracts with AIs then people will promptly realise and stop using AIs”.
That said, I think it would still be true that the possibilities of trial-and-error, recognition of a lack of enforceability, and people's concerns about that are at least some reason to expect that, if AIs are used, contracts will be enforceable.