Thomas Kwa comments on Corrigibility could make things worse

Thomas Kwa 11 Jun 2024 1:12 UTC
4 points
0
It seems to me that corrigibility doesn’t make things worse in this example; rather, a partially corrigible AI could still lead to bad outcomes. In fact, one could say that the AI in the example is not corrigible enough, because it exerts influence in ways we don’t want.
- ThomasCederborg 11 Jun 2024 1:22 UTC
  3 points
  0
  The first AI is genuinely Corrigible. The second AI is not Corrigible at all. This leads to a worse outcome, compared to the case where there was no Corrigible AI. Do you disagree with the statement that the first AI is genuinely Corrigible? Or do you disagree with the statement that the outcome is worse, compared to the case where there was no Corrigible AI?