It seems important to note that the posited reasoning flaw is much more rational in the context of a puzzle game. In that context, you know that there’s a correct answer. If you’ve exhaustively searched through the possibilities, and you’re very confident that there’s no solution unless assumption X holds...that’s actually a pretty good reason to believe X.
A syllogism:
P1: There is a solution.
P2: If not X, there would be no solution.
C: Therefore X.
In this context, subjects seem to be repeatedly miscalibrated about P2. It turns out that they had searched through the space of possibilities much less exhaustively than they thought they had. So they were definitely wrong, overall.
But this syllogism is still valid. And P1 is a quite reasonable assumption.
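To make that concrete, here is a minimal formal sketch (in Lean 4; the names `Solution` and `X` are illustrative placeholders, not anything from the study) showing that the argument form itself is valid: granting P1 and P2, the conclusion X follows by classical modus tollens.

```lean
-- Minimal sketch: the syllogism's form is valid (classical modus tollens).
-- `Solution` and `X` are placeholder propositions, not claims about any real domain.
theorem puzzle_syllogism (Solution X : Prop)
    (p1 : Solution)            -- P1: there is a solution
    (p2 : ¬X → ¬Solution)      -- P2: if not X, there would be no solution
    : X :=                     -- C: therefore X
  Classical.byContradiction (fun hx : ¬X => p2 hx p1)
```

So the failure isn't in the derivation itself; it's in overconfidence about P2 (and, in the AI-risk case, arguably about P1 as well).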
There’s an implication that people are applying an identical pattern of reasoning to plans for AI risk. In that context, not only is P2 non-obvious for any given version of P2, but P1 isn’t solid either: there’s no guarantee that there’s a remotely feasible solution in scope, given the available resources.
In that case, if indeed people are reasoning this way, they’re making two mistakes, about both P1 and P2, whereas in the puzzle-game context, people are only really making one.
From my observations of people thinking about AI risk, I also think that people are (emotionally) assuming that there’s a viable solution to AI risk, and their thinking warps accordingly.
But it seems important to note that if we’re drawing the conclusion that people make that particular error from this dataset, we’re attributing a reasoning error to them in one context based on observing them use a similar assumption in a different context in which that assumption is, in fact, correct.
Yes, they’re correct in assuming success is possible in those situations, but their assumptions about the possible routes to success are highly incorrect. People are making a very large error in overestimating how well they understand the situation and failing to think about other possibilities. This logical error sounds highly relevant to alignment and AI risk.
I agree.
Yeah. I tried to get at this in the Takeaways, but I like your more thorough write-up here.