If I have to overpower or negotiate with it to get something I might validly want, we’re back to corrigibility. That is: we’re back to admitting failure.
If power or influence or its corrigibility are needed to exercise a right to suicide, then I probably need them just to slightly lower my "empowerment" as well. Zero would be bad. But "down" would also be bad, and "anything less than maximally up" would be dispreferred.
Maybe, but behavioral empowerment still seems to apply pretty clearly to humans and to explain our intrinsic motivation systems.
This is sublimation again. Our desire to eat explains (is a deep cause of) a lot of our behavior, but you can’t give us only that desire and also vastly more power and have something admirably human at the end of those modifications.
"If I have to overpower or negotiate with it to get something I might validly want, we're back to corrigibility."
Not really, because an AI optimizing for your empowerment actually wants to give you more options/power/choice; that isn't something you have to negotiate for, it's just what it wants to do. In fact, one of the most plausible outcomes after uploading is that it realizes that handing all of its computational resources over to humans is the most human-empowering use of that compute, and that it therefore no longer has a reason to exist.
Human values/utility are complex and also non-stationary: they drift and change over time. So any error in modeling them compounds, and if you handle that uncertainty correctly you end up with a maximum-entropy distribution over future utility functions. Optimizing for empowerment is equivalent to optimizing for that max-entropy utility distribution, at least for a wide class of values/utilities.
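To make that equivalence claim concrete: empowerment is usually formalized (following Klyubin et al.; the thread itself doesn't give a definition, so this is an assumed reading rather than one stated above) as the channel capacity from an agent's action sequences to its future states,

$$\mathcal{E}_n(s_t) \;=\; \max_{p(a_t,\dots,a_{t+n-1})} I\big(A_t,\dots,A_{t+n-1};\, S_{t+n} \,\big|\, s_t\big),$$

i.e., a measure of how many distinct futures the agent can reliably steer into from $s_t$. On that reading, the argument is that if your distribution over the human's future utility function is maximally uncertain, then the action that maximizes expected attainable utility is roughly the one that keeps the largest set of futures reachable, which is what maximizing the human's empowerment does.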