Wait a sec: I’m not sure people really do avoid modifying their own desires to make those desires easier to satisfy, as you claim here:
We, ourselves, do not imagine the future and judge, that any future in which our brains want something, and that thing exists, is a good future. If we did think this way, we would say: “Yay! Go ahead and modify us to strongly want something cheap!”
Isn’t that exactly what people do when they study ascetic philosophies and otherwise try to see what living simply is like? And would people turn down a pill that made vegetable juice taste like a milkshake and vice versa?
That’s a good point. I think the distinction is that these people are modifying their own instrumental values, but leaving their terminal values (the big meaning-of-life blob of computation) unchanged. I’d go so far as to say that people frequently do this trick by mistake, convincing themselves that various instrumental values are actually terminal ones. This certainly explains things like happy death spirals.
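To make that distinction concrete, here’s a tiny Python sketch, entirely my own framing rather than anything from the post: the terminal utility function stays fixed, and the hypothetical taste-swap pill only edits the instrumental “taste” layer that tells the agent how well each food serves its terminal values. All of the names and numbers are made up for illustration.

```python
# Toy sketch (my own framing, not anything from the post): the terminal
# utility function is fixed; the hypothetical pill only edits the
# instrumental "taste" layer that feeds into it.

TERMINAL_WEIGHTS = {"health": 1.0, "enjoyment": 1.0}    # the fixed "big blob"
NUTRITION = {"vegetable_juice": 0.9, "milkshake": 0.2}  # facts about the world
taste = {"vegetable_juice": 0.2, "milkshake": 0.9}      # instrumental, editable

def terminal_utility(food: str) -> float:
    # This function is never modified; only its instrumental inputs change.
    return (TERMINAL_WEIGHTS["health"] * NUTRITION[food]
            + TERMINAL_WEIGHTS["enjoyment"] * taste[food])

def take_taste_swap_pill() -> None:
    # The pill from the comment above: swap how the two drinks taste,
    # without touching TERMINAL_WEIGHTS at all.
    taste["vegetable_juice"], taste["milkshake"] = (
        taste["milkshake"], taste["vegetable_juice"])

print(terminal_utility("vegetable_juice"))  # ~1.1 before the pill
take_taste_swap_pill()
print(terminal_utility("vegetable_juice"))  # 1.8 after: same terminal values,
                                            # now better served by cheap juice
```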
On the other hand, this would be very difficult (impossible?) to test.
EDIT: I’ve given this a bit more thought, and I wonder what it would feel like from the inside to be a machine learning algorithm that could make limited, small self-modifications to its own utility function, including its optimization criteria. This seems like a “simple” enough hack that evolution could have generated it. It also seems to mirror real human psychology surprisingly well.
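Here’s a minimal toy sketch, entirely my own construction and not anything from the Sequences, of what bounded self-modification of one’s own utility function could look like, and why judging each edit by the satisfaction of the post-edit agent drifts toward wanting something cheap (exactly the failure mode quoted at the top). The goal names, attainability numbers, and step size are all invented for illustration.

```python
# A minimal sketch of an agent allowed bounded edits to its own goal weights.
# If it evaluates each edit by "how satisfied will the modified me be?",
# the weights drift toward whatever is cheapest to obtain.
import random

goals = ["write_novel", "eat_chips"]               # hard vs. cheap to satisfy
attain = {"write_novel": 0.05, "eat_chips": 0.95}  # chance of actually getting each

weights = {"write_novel": 0.9, "eat_chips": 0.1}   # initial utility function

def expected_satisfaction(w):
    """Utility the agent expects to actually realize, given attainability."""
    return sum(w[g] * attain[g] for g in goals)

STEP = 0.02   # "limited, small self-modifications"
for _ in range(500):
    # Propose shifting a little caring from one goal to the other.
    src, dst = random.sample(goals, 2)
    delta = min(STEP, weights[src])
    proposal = dict(weights)
    proposal[src] -= delta
    proposal[dst] += delta
    # Crucially, the edit is judged by the *post-edit* utility function,
    # so weight flows toward whatever is cheapest to satisfy.
    if expected_satisfaction(proposal) > expected_satisfaction(weights):
        weights = proposal

print(weights)  # ends near {'write_novel': 0.0, 'eat_chips': 1.0}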
I’m imagining trying to answer the question “What would I like to change my utility function to?”, while simultaneously not fully understanding the dangers of messing around like that. It seems like this could easily generate people like religious extremists, even if earlier versions of those people would never have deliberately tried to become that twisted. If the other side seems completely wrong and evil, then I can picture disliking parts of myself that resemble the other side, as well as any empathy I may have for them. I can imagine how suppressing those parts of myself would lead to extremism.
I wonder what the official Yudkowsky position on this is. More importantly, I wonder what happens if you get this question wrong while trying to build a Friendly AI. It seems like there might be issues if you assume a static Coherent Extrapolated Volition when it is actually changing dynamically, or vice versa.
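As a cartoon of that last worry (my own toy model, not CEV as actually specified): if the values being extrapolated keep drifting but the optimizer locks in a one-time snapshot, the gap between its target and the humans’ current values grows without bound, while a target that tracks the drift stays aligned only by definition. The value dimensions and drift rate below are made up.

```python
# Toy model of the static-vs-dynamic worry: a frozen snapshot of a drifting
# value vector diverges from it over time; a re-extrapolated target does not.
import random

def drift(values, rate=0.05):
    """The humans' actual values wander a little each year."""
    return [v + random.gauss(0.0, rate) for v in values]

def gap(a, b):
    """Average distance between an optimization target and current values."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

true_values = [0.5, -0.2, 0.8]      # made-up value dimensions
frozen_target = list(true_values)   # "static": extrapolate once, lock in

for year in range(1, 101):
    true_values = drift(true_values)
    tracking_target = list(true_values)  # "dynamic": keeps re-extrapolating
    if year % 25 == 0:
        print(year, round(gap(frozen_target, true_values), 3),
              round(gap(tracking_target, true_values), 3))
# The frozen target's gap grows like a random walk (roughly rate * sqrt(year));
# the tracking target's gap is zero by construction -- which says nothing
# about whether tracking the drift is the behaviour we actually want.
```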