as a simple example, the details of humans’ desire for their children’s success, or their fear of death, don’t seem to match well with the theory that all human desires come from RL on intrinsic reward.
I’m trying to parse out what you’re saying here, to understand whether I agree that human behavior doesn’t seem to be almost perfectly explained as the result of an RL agent (with an interesting internal architecture) maximizing an inner learned reward.
On my model, the outer objective of inclusive genetic fitness created human mesa-optimizers with inner objectives like “desire your children’s success” or “fear death”, which are decent approximations of IGF (directly maximizing IGF itself is intractable, since it amounts to playing a Nash equilibrium of a game whose structure is unknown). It seems to me that human behavior policies are actually well-approximated as those of RL agents maximizing [our children’s success] + [not dying] + [retaining high status within the tribe] + [being exposed to novelty to improve our predictive abilities] + … .
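To make that decomposition concrete, here’s a toy sketch (entirely my own illustration; the state fields and weights are hypothetical, not a claim about the actual shape of human inner reward):

```python
# Toy "mesa-objective": a weighted sum of evolved proxy terms that stand in
# for IGF, rather than IGF itself. Field names and weights are made up.
def mesa_reward(state, weights=(1.0, 1.0, 0.5, 0.3)):
    w_kids, w_survive, w_status, w_novelty = weights
    return (
        w_kids * state["children_success"]           # our children's success
        + w_survive * (1.0 - state["death_risk"])    # not dying
        + w_status * state["tribal_status"]          # status within the tribe
        + w_novelty * state["novelty"]               # exposure to novelty
    )
```

The point of the sketch is just that the behavior policy is shaped by these proxies, not by IGF directly.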
Humans do sometimes construct modified internal versions of these rewards based on pre-existing learned representations (e.g. desiring your adopted children’s success) - is that what you’re pointing at?
I’m generally interested to hear more about the “bad predictions” this model makes.
I’m trying to parse out what you’re saying here, to understand whether I agree that human behavior doesn’t seem to be almost perfectly explained as the result of an RL agent (with an interesting internal architecture) maximizing an inner learned reward.
What do you mean by “inner learned reward”? This post points out that even if humans were “pure RL agents”, we shouldn’t expect them to maximize their own reward. Maybe you mean “inner mesa objectives”?
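To gesture at the distinction: in a standard policy-gradient setup, reward only ever shows up as a scalar weighting the update, never as an input the policy represents or pursues. A minimal REINFORCE-style sketch (my own toy example, not something from the post):

```python
import torch

def reinforce_update(policy_optimizer, log_probs, rewards, gamma=0.99):
    """One episode's REINFORCE update (toy sketch; names are mine).

    `log_probs` are the log-probabilities of the actions the policy took;
    `rewards` are the per-step rewards from the environment.
    """
    # Discounted return-to-go for each step.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    returns = torch.tensor(returns)

    # Reward enters only here, as a coefficient on the policy gradient.
    # The policy network itself never sees the reward signal, so nothing
    # forces the learned policy to be a reward-maximizer at deployment.
    loss = -(torch.stack(log_probs) * returns).sum()
    policy_optimizer.zero_grad()
    loss.backward()
    policy_optimizer.step()
```

Reward shapes which computations get reinforced during training; whether the resulting policy ends up wanting reward is a further empirical question.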