It seems to me that corrigibility doesn’t make things worse in this example, it’s just that a partially corrigible AI could still lead to bad outcomes. In fact one could say that the AI in the example is not corrigible enough, because it exerts influence in ways we don’t want.
The first AI is genuinely Corrigible. The second AI is not Corrigible at all. This leads to a worse outcome, compared to the case where there was no Corrigible AI. Do you disagree with the statement that the first AI is genuinely Corrigible? Or do you disagree with the statement that the outcome is worse, compared to the case where there was no Corrigible AI?
It seems to me that corrigibility doesn’t make things worse in this example, it’s just that a partially corrigible AI could still lead to bad outcomes. In fact one could say that the AI in the example is not corrigible enough, because it exerts influence in ways we don’t want.
The first AI is genuinely Corrigible. The second AI is not Corrigible at all. This leads to a worse outcome, compared to the case where there was no Corrigible AI. Do you disagree with the statement that the first AI is genuinely Corrigible? Or do you disagree with the statement that the outcome is worse, compared to the case where there was no Corrigible AI?