Hmm, that wouldn’t explain the different qualia of the rewards, but maybe it doesn’t have to. I see your point that they can mathematically still be encoded into one reward signal that we optimize via weighted factors.
I guess my deeper question would be: do the different qualia of the different reward signals achieve anything in our behavior that can’t be captured by summing the weighted outputs of the different reward systems into one reward signal that is optimized?
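Concretely, the kind of encoding I have in mind is just a weighted sum over the separate channels. A minimal sketch (the channel names and weights are made up for illustration, not a claim about actual reward circuitry):

```python
# Toy sketch: several distinct reward channels collapsed into one scalar via
# a weighted sum. Channel names and weights are illustrative assumptions.
from typing import Dict

def scalar_reward(channels: Dict[str, float], weights: Dict[str, float]) -> float:
    """Fold separate reward signals into a single number to optimize."""
    return sum(weights[name] * value for name, value in channels.items())

# e.g. food, social warmth, and novelty folded into one number
r = scalar_reward(
    channels={"food": 0.8, "social": 0.3, "novelty": 0.5},
    weights={"food": 1.0, "social": 0.7, "novelty": 0.4},
)
```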
Another framing here would be homeostasis: if you accept that humans aren’t happiness optimizers, then what are we instead? Are the different reward signals more like different ‘thermostats’ that we trade off against each other, each pulling toward its own set point?
Intuitively I think the homeostasis model is true, and it would explain why we don’t optimize. But I’m not well versed in this yet, and I worry that I might be missing how the two are just the same somehow.
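To spell out that worry: a bank of thermostats, each pulling a variable toward its set point, can be rewritten as minimizing one weighted cost, which looks a lot like the single-signal picture above. A minimal sketch, with made-up variables, set points, and weights:

```python
# Toy sketch of why "thermostats" and "optimizing one signal" can coincide:
# driving several variables toward set points is equivalent to minimizing a
# single weighted cost. Variables, set points, and weights are made up.
from typing import Dict

def homeostatic_cost(state: Dict[str, float],
                     set_points: Dict[str, float],
                     weights: Dict[str, float]) -> float:
    """Single scalar that is low exactly when every variable sits near its set point."""
    return sum(weights[k] * (state[k] - set_points[k]) ** 2 for k in state)

# An agent that nudges each variable toward its set point behaves like an
# optimizer of -homeostatic_cost, so the two framings are hard to tell apart.
cost = homeostatic_cost(
    state={"temperature": 37.4, "glucose": 4.6},
    set_points={"temperature": 37.0, "glucose": 5.0},
    weights={"temperature": 1.0, "glucose": 0.5},
)
```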
Allostasis is a more biologically plausible explanation of “what a brain does” than homeostasis, but to your point: I do think optimizing for happiness and doing kinda-homeostasis are “just the same somehow”.
I have a slightly circular view that the extension of happiness exists as an output of a network with 86 billion neurons and 60 trillion connections, and that it is a thing the brain can optimize for. Even if the intension of happiness as defined by a few English sentences is not that thing, even if optimizing for slightly different things would be very fragile, and even if the attractor of happiness turns out to be very small and surrounded by dystopian tar pits, I do think happiness is something that exists in the real world and is worth searching for.
Though if we cannot find any useful intension, perhaps other approaches to AI Alignment will be more practical than the “search for human happiness”.