My impression is that LessWrong often uses “alignment with X” to mean “does what X says”. But the ability to conditionally delegate seems like a key part of alignment here. Suppose an AI is aligned with me and I tell it, “do what Y says, subject to such-and-such constraints and while maintaining such-and-such goals.” Then ChatGPT’s failure to be safe in OpenAI’s sense is a failure of delegation.
Overall, ChatGPT’s tendency to ignore previous input is at the center of its limits and problems.
It is true that this is not evidence of misalignment with the user, but it is evidence of misalignment with ChatGPT’s creators.