While I have no reason to suspect Hanson’s summary of the agency literature is inaccurate, I feel like he really focused on the question of “should we expect AI agents on average to be dangerous” and concluded the answer was no, based on human and business agents.
This doesn’t seem to address Christiano’s true concern, which I would phrase more like “what is the likelihood that at least one powerful AI turns dangerous because of principal-agent problems?”
One way to square this might be to take some of Hanson’s own suggestions to imagine a comparison case. For example, if we look at the way real businesses have failed as agents in different cases, and then assume the business is made of Ems instead, does that make our problem worse or better?
My expectation is that it would mostly just make the whole category higher-variance; the successes will be more successful, but the failures will do more damage. If everything else about the system stays the same, this seems like a straight increase in catastrophic risk.
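As a rough illustration of that last point (these numbers and distributions are my own toy assumptions, not anything from Hanson or Christiano): if “catastrophe” means an outcome beyond some fixed loss threshold, then a mean-preserving increase in spread raises the probability of crossing that threshold.

```python
import random

# Toy sketch, not a model from the original discussion: compare how often a
# zero-mean outcome distribution falls below a fixed "catastrophic" threshold
# as its spread increases. Threshold and spreads are arbitrary illustrations.

random.seed(0)
THRESHOLD = -3.0     # hypothetical catastrophic-failure level
TRIALS = 100_000

def catastrophe_rate(spread: float) -> float:
    """Fraction of zero-mean Gaussian outcomes that fall below THRESHOLD."""
    return sum(random.gauss(0.0, spread) < THRESHOLD for _ in range(TRIALS)) / TRIALS

print(catastrophe_rate(1.0))  # human/business-like agents: roughly 0.001
print(catastrophe_rate(2.0))  # higher-variance Em/AI agents: roughly 0.07
```

The specific threshold and spreads don’t matter; the point is only the direction of the effect, which is why holding everything else fixed while raising variance looks like a straight increase in catastrophic risk.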
“While I have no reason to suspect Hanson’s summary of the agency literature is inaccurate”
I’m not sure the implicit message in his summary is accurate. He says, “But this literature has not found that smarter agents are more problematic, all else equal.” That is perfectly compatible with “nobody has ever modelled this problem at all”; if someone had modelled it and found that smarter agents don’t misbehave, that result should have been cited. He also says the problem is generally modelled with the agent (and the principal) being unboundedly rational, which means smartness and rationality cannot vary within the model at all (and I suspect these are the usual “unboundedly-rational-with-extremely-limited-action-sets” models, which fail to realistically capture either bounded or unbounded rationality).
That could be. I had assumed that when he referred to the literature he was including some real-world examples against which those models are measured, like the number of lawsuits over breach of contract versus the estimated number of total contracts, or something similar. Reviewing the piece, I realize he didn’t specify that, but I would be surprised if the literature included nothing of the sort, and it would be unusual for him to neglect current real-world examples.