And the AI would have got away with it too, if...

Paul Christiano presented some low-key AI catastrophe scenarios; in response, Robin Hanson argued that Paul’s scenarios were not consistent with the “large (mostly economic) literature on agency failures”.

He concluded with:

For concreteness, imagine a twelve year old rich kid, perhaps a king or queen, seeking agents to help manage their wealth or kingdom. It is far from obvious that this child is on average worse off when they choose a smarter more capable agent, or when the overall pool of agents from which they can choose becomes smarter and more capable. And its even less obvious that the kid becomes maximally worse off as their agents get maximally smart and capable. In fact, I suspect the opposite.

Thinking on that example, my mind went to Edward V of England (one of the “Princes in the Tower”), deposed and then likely killed by his “protector” Richard III. Or to the Guangxu Emperor of China, put under house arrest by the regent Empress Dowager Cixi. Or to the ten-year-old Athitayawong, King of Ayutthaya, deposed by his main administrator after only 36 days of reign. More examples can be dug out of Wikipedia’s list of rulers deposed as children.

We have no reason to restrict ourselves to child monarchs: plenty of Emperors, Kings, and Tsars have been deposed by their advisers or “agents”. So yes, there are many cases where agency failed catastrophically for the principal, and where having a smarter or more rational agent was a disastrous move.

By restricting attention to agency problems in economics, rather than in politics, Robin confines himself to situations where institutions are strong and behaviour is punished if it gets too egregious. Yet even today there is plenty of betrayal by “agents” in politics, even if the results are less lethal than in times gone by. In economics, too, we have fraudulent investors, some of whom escape punishment. When they can get away with it, agents betray their principals to the utmost.

So Robin’s argument depends entirely on the assumption that institutions or rivals will prevent AIs from abusing their power as agents. Absent that assumption, most of the “large (mostly economic) literature on agency failures” becomes irrelevant.

So, would institutions be able to detect and punish abuses by future powerful AI agents? I’d argue we can’t count on it, but that’s a question which needs its own exploration, and one that is very different from the economic point Robin seemed to be making.