An AI that can be aligned to the preferences of even just one person is already an aligned AI, and we have no idea how to do that.
An AI that’s able to ~perfectly simulate what a person would feel would not necessarily want to perform actions that make the person feel good. Humans are somewhat likely to do that because we have actual (not simulated) empathy, which makes us feel bad when someone close to us feels bad, and the AI is unlikely to have that. We even have humans who work like that (i.e. sociopaths, who can model what others feel without caring), and they are still humans, not AIs!
Is there some particular reason to assume that it’d be hard to implement?
To clarify, I meant the AI is unlikely to have it by default (being able to perfectly simulate a person does not in itself require having empathy as part of the reward function).
If we try to hardcode it, Goodhart’s curse seems relevant: https://arbital.com/p/goodharts_curse/
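As a toy illustration of why that seems relevant (just a sketch, with made-up numbers and an assumed Gaussian noise model, not anything from the linked page): if the AI picks whichever action maximizes a noisy estimate of how good we'd feel, the estimate for the chosen action systematically overstates its true value, and the gap widens the harder it optimizes.

```python
import random

# Toy sketch of Goodhart's curse / the optimizer's curse.
# "true value" stands in for how the person would actually feel;
# "proxy" is the AI's noisy estimate of that. All numbers are illustrative.

random.seed(0)

def run_trial(n_options: int, noise: float = 1.0) -> tuple[float, float]:
    """Pick the option that maximizes the noisy proxy; return (proxy, true) value of that pick."""
    true_values = [random.gauss(0.0, 1.0) for _ in range(n_options)]
    proxy_values = [v + random.gauss(0.0, noise) for v in true_values]
    best = max(range(n_options), key=lambda i: proxy_values[i])
    return proxy_values[best], true_values[best]

for n in (2, 10, 100, 1000):
    trials = [run_trial(n) for _ in range(500)]
    avg_proxy = sum(p for p, _ in trials) / len(trials)
    avg_true = sum(t for _, t in trials) / len(trials)
    # More options = more optimization pressure: the proxy score of the
    # selected option keeps climbing, while the true value lags behind.
    print(f"options={n:>5}  proxy at argmax={avg_proxy:+.2f}  true value={avg_true:+.2f}")
```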
But note that "Reward is not the optimization target".
I’m thinking that even if it didn’t break when going out of distribution, it would still not be a good idea to try to train an AI to do things that will make us feel good, because what if it decided it wanted to hook us up to morphine pumps?