This is an impressive piece of work and I’m excited about your agenda.
Can you elaborate on this? Why would we want to manipulate the human’s preferences?
Because our preferences are inconsistent: if an AI says “your true preferences are U_H”, we’re likely to react by saying “no! No machine will tell me what my preferences are. My true preferences are U′_H, which differ in subtle ways”.
So the subtle manipulation is to compensate for those rebellious impulses making U_H unstable?
Why not just let the human have those moments and alter their U_H if that’s what they think they want? Over time, they may learn that being capricious with their AI doesn’t ultimately serve them very well. But if they find out the AI is trying to manipulate them, that could make them want to rebel even more and trust the AI less.