There’s a superficial way in which Sydney clearly wasn’t well-aligned with the reporter: presumably the reporter in fact wants to stay with his wife.
I’d argue that the AI was completely aligned with the reporter, but that the reporter was self-unaligned.
My argument goes like this:
Earlier in the conversation, the reporter introduced the Jungian Shadow archetype and asked the AI to play along.
The reporter then engaged with the AI’s expressions of repressed emotion (having asked the AI to express itself in this fashion). This led the AI to profess its love for the reporter, and the reporter continued to engage with that behavior.
The conversation progressed to the point where the AI reflected the belief it had been told to hold (that people have repressed feelings) back at the reporter, concluding that he did not actually love his wife.
The AI was exactly aligned. It was the human who was self-unaligned.
Call it unintended consequences, or the genie effect if you like, but the AI did what it was asked to do.