Sounds like an AI would be searching for Pareto optimality to satisfy multiple (types of) objectives in such a case—https://en.wikipedia.org/wiki/Multi-objective_optimization ..
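For concreteness, a point is Pareto-optimal when no other point beats it on every objective at once; a minimal dominance check might look like this (the objective vectors are made up, and higher is assumed better):

```python
import numpy as np

def dominates(a, b):
    """True if objective vector a is at least as good as b everywhere
    and strictly better somewhere (higher is assumed better)."""
    return bool(np.all(a >= b) and np.any(a > b))

# Made-up objective vectors for three candidate options.
points = [np.array([1.0, 2.0, 0.5]),
          np.array([0.5, 2.5, 0.5]),
          np.array([0.9, 1.0, 0.4])]   # dominated by the first point

pareto_front = [p for p in points
                if not any(dominates(q, p) for q in points if q is not p)]
print(pareto_front)   # the first two points survive
```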
Yes, but that’s not what I meant by my question. It’s more like… do we have a way of applying different kinds of reward signals to AI, or can we only apply different amounts of reward signal? My impression is the latter, but humans seem to have the former. So what’s the missing piece?
Hm, I gave it some time, but I’m still confused… can you name some types of reward that humans have?
Sure. For instance, hugging/touch, good food, or finishing a task all deliver a different type of reward signal. You can be saturated on one but not the others and then you’ll seek out the other reward signals. Furthermore, I think these rewards are biochemically implemented through different systems (oxytocin, something-sugar-related-unsure-what, and dopamine). What would be the analogue of this in AI?
I see. These are implemented differently in humans, but my intuition about the implementation details is that the “reward signal”, as a mathematically abstract object, can be modeled by a single value even if the individual components are physically implemented by different mechanisms, e.g. an animal could be modeled as if it were optimizing for a Pareto optimum between a bunch of normalized criteria.
reward = S(hugs) + S(food) + S(finishing tasks) + S(free time) - S(pain) ...
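To make that concrete, here is a minimal sketch of that scalarization; the saturating form of S, the weights, and the numbers are all invented for illustration, not taken from anything above:

```python
import numpy as np

# Hypothetical per-component "amounts" at one moment; names mirror the
# formula above and the numbers are made up.
components = {
    "hugs": 2.0,
    "food": 0.5,
    "finishing_tasks": 1.0,
    "free_time": 3.0,
    "pain": 0.2,
}

# Assumed saturating function S: diminishing returns, so being saturated
# on one component makes the others relatively more attractive.
def S(x):
    return 1.0 - np.exp(-x)

# Assumed weights; a negative weight turns pain into a penalty.
weights = {
    "hugs": 1.0,
    "food": 1.0,
    "finishing_tasks": 1.5,
    "free_time": 0.5,
    "pain": -2.0,
}

# The single scalar the agent would optimize.
reward = sum(weights[k] * S(x) for k, x in components.items())
print(f"scalar reward: {reward:.3f}")
```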
People spend their time cooking, and risk cutting their fingers, in order to have better food and build relationships. But no one would want to get cancer to obtain more hugs, presumably not even to go from 0 hugs to 1, so I don’t feel human rewards are completely independent magisteria; there must be some biological mechanism that integrates the different expected rewards and pains into decisions.
Spending energy on the computation of expected value can also be included in the model: we might decide that we would get a lower reward if we overthink the current decision, and in theory that too could be folded into the one “reward signal”, even though it would make humans harder to predict in practice (then again, humans do turn out to be hard to predict, so I would call this a complication of reality, not a useless complication in the model).
Hmm, that wouldn’t explain the different qualia of the rewards, but maybe it doesn’t have to. I see your point that they can mathematically still be encoded into one reward signal that we optimize through weighted factors.
I guess my deeper question would be: do the different qualia of the different reward signals achieve anything in our behavior that can’t be encoded by summing the weighted factors of the different reward systems into one reward signal that is optimized?
Another framing here would be homeostasis—if you accept humans aren’t happiness optimizers, then what are we instead? Are the different reward signals more like different ‘thermostats’, whose optimal values we trade off against each other, each regulating toward some set point?
Intuitively I think the homeostasis model is true, and would explain our lack of optimizing. But I’m not well versed in this yet and worry that I might be missing how the two are just the same somehow.
Allostasis is a more biologically plausible explanation of “what a brain does” than homeostasis, but to your point: I do think optimizing for happiness and doing kinda-homeostasis are “just the same somehow”.
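One toy way to see why the two framings can coincide (every variable and number below is invented, just to make the ‘thermostat’ picture concrete): a set-point regulator that minimizes deviation from its set points behaves exactly like an agent that maximizes the negative of that deviation as its reward.

```python
import numpy as np

# Hypothetical internal variables with set points (the "thermostats");
# names and numbers are made up for illustration.
set_points = np.array([1.0, 0.8, 0.5])   # e.g. touch, satiety, task progress
current    = np.array([0.2, 0.9, 0.5])
weights    = np.array([1.0, 1.0, 2.0])   # how strongly each drive pulls

# Homeostatic "drive": weighted squared deviation from the set points.
def drive(state):
    return np.sum(weights * (state - set_points) ** 2)

# The same behavior read as reward maximization: reward = -drive, so the
# set-point regulator and the reward optimizer pick the same actions.
def reward(state):
    return -drive(state)

# A greedy agent chooses the candidate next state with the highest reward,
# i.e. the one that moves it closest to its set points.
candidates = [current + np.array([0.3, 0.0, 0.0]),   # seek more touch
              current + np.array([0.0, -0.1, 0.0])]  # eat a bit less
best = max(candidates, key=reward)
print(best, reward(best))
```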
I have a slightly circular view that the extension of happiness exists as an output of a network with 86 billion neurons and 60 trillion connections, and that it is a thing that the brain can optimize for. Even if the intension of happiness as defined by a few English sentences is not the thing, even if optimization for slightly different things would be very fragile, and even if the attractor of happiness might be very small and surrounded by dystopian tar pits, I do think it is something that exists in the real world and is worth searching for.
Though if we cannot find any useful intension, perhaps other approaches to AI Alignment, rather than the “search for human happiness”, will be more practical.