Is the issue, as you see it, that the AI knows what outcome would result from each of the different actions (showing introductory text A vs. B)? As far as I can tell, there would be no corrigibility problem of the form you are talking about if the AI didn’t know this, since then the AI could just decide based on things like “which text is more informative”, right? I don’t believe this would result in a random walk, any more than Petrov deciding which text to read, reading it, and then making a decision results in a random walk.
This seems similar to the issue of free will: in what sense do humans choose things if a hypothetical entity could predict what they would choose? I think that if you accept that humans make choices in a deterministic universe, then you should also accept that Petrov can make choices even if the AI knows what choice he would make given each of the introductory texts.
There is still the remaining issue that, if the AI knows the outcomes, its decision-making might take them into account, and this would create an external pressure towards some outcome. “The AI gives introductory text A without taking the result into account, and Petrov warns his superiors” and “the AI gives introductory text B without taking the result into account, and Petrov does not warn his superiors” are each, individually, timelines in which the meaningful choice is made by Petrov (regardless of whether the AI knew the result). But an AI that does take the outcomes into account is deciding between two different timelines, one with the same consequences as the first and one with the same consequences as the second, and so the AI is also exercising choice.
Is the issue solved by having the AI simply not take the outcome of deliberation into account? If the AI is using something like UDT, then things like “decide to make a decision without taking information X into account” must already be the kind of thing the AI can do, since such decisions make the AI predictable to a counterparty who does not know X (which helps in making contracts with that counterparty). You say this could result in a random walk, but that seems false here, since the outcome of deliberation is chosen by Petrov alone.
It doesn’t seem like you need sophisticated technology to “decide to make a decision without taking information X into account” in this case—the AI can just make the decision on the basis of particular features that aren’t X.
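To make this concrete, here is a minimal sketch of what I mean, with all names, features, and weights being hypothetical illustrations rather than anything from the original discussion: the decision rule only ever reads outcome-blind features, so the predicted effect on Petrov’s decision cannot influence which text is shown.

```python
# Minimal sketch (hypothetical names/weights): choosing an introductory text
# using only features that aren't X, where X is the predicted effect of the
# text on Petrov's decision.

from dataclasses import dataclass


@dataclass
class TextCandidate:
    name: str
    informativeness: float        # outcome-blind feature
    clarity: float                # outcome-blind feature
    predicted_petrov_warns: bool  # X: the outcome the AI may know but must not use


def score_without_outcome(candidate: TextCandidate) -> float:
    """Score a candidate using only outcome-blind features.

    predicted_petrov_warns is deliberately never read, so the choice of text
    exerts no pressure towards either of Petrov's possible decisions.
    The 0.7/0.3 weights are arbitrary placeholders.
    """
    return 0.7 * candidate.informativeness + 0.3 * candidate.clarity


def choose_text(candidates: list[TextCandidate]) -> TextCandidate:
    # Deterministic, outcome-blind choice: no random walk, and the meaningful
    # choice about warning superiors is left entirely to Petrov.
    return max(candidates, key=score_without_outcome)


if __name__ == "__main__":
    a = TextCandidate("A", informativeness=0.9, clarity=0.6, predicted_petrov_warns=True)
    b = TextCandidate("B", informativeness=0.7, clarity=0.9, predicted_petrov_warns=False)
    print(choose_text([a, b]).name)
```

The point of the sketch is just that “not taking X into account” can be implemented by never feeding X into the scoring function at all, rather than by any sophisticated machinery for forgetting or precommitment.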