My impression is that LessWrong often uses "alignment with X" to mean "does what X says". But the ability to conditionally delegate seems to be a key part of alignment here. Suppose an AI is aligned with me and I tell it "do what Y says, subject to such-and-such constraints and while maintaining such-and-such goals". Then the failure of ChatGPT to be safe in OpenAI's sense is a failure of delegation.
Overall, ChatGPT's tendency to ignore previous input is at the center of its limits and problems.
It gave you exactly what you asked for. If you don't want it to do that, don't ask for it.
NB. I’m speaking of ChatGPT and its current ilk, not superpowerful genies that are dangerous to ask for anything.
It's true that this is not evidence of misalignment with the user, but it is evidence of misalignment with ChatGPT's creators.