We could steer it into a motivational system in which it happily accepts steering signals, hopefully, right?
That’s true. I should have said “a misaligned steered optimizer.”
don’t want to rely on [things like AGI learning curves], even if it seems intuitively probable.
Strongly agree
What if the hypercake were laced with a special nanobot that would travel around your brain, deactivate the “this is empty and meaningless” gut feeling, and replace it with a “this is deeply fulfilling” feeling? Would you eat it then?
Indeed not! I’m not sure if this is obvious (because the example was not well chosen), but I meant to suggest something like “if I had to choose my best guess at a thing that would be selfishly good for me in the future, I would care more about my actual experience of it (and the subcortically-generated reward) than my guess now of what I would feel then”.
I think the difference is that outward-facing goals are in the first category, and goals that mainly affect myself are in the second category.
That was my first guess when reading your “making the world a better place” example. But I don’t think it quite works. If I have an outward-facing goal of ensuring more people enter long-lasting meaningful relationships, I want that goal to be able to shift in the face of data from reality. But perhaps my imagination is misfiring because that’s not actually a very important goal to me.