I don’t think that humans are pure reinforcement learners. We have all sorts of complicated values that aren’t just eating and mating.
We may not be pure reinforcement learners, but the presence of values other than eating and mating isn’t proof of that. Quite the contrary: it demonstrates either that we have a lot of different, occasionally contradictory values hardwired, or that we have some other system that creates value systems. From an evolutionary standpoint, reward systems that are good at replicating genes get to survive, but they don’t have to be free of other side effects (at least not until they’ve had long enough with a finite resource pool). Pure, rational reward seeking is almost certainly selected against, because it doesn’t leave any room for replication. It seems more likely that we have a reward system accompanied by some circuits that make it fire for a few specific sensory cues (orgasms, insulin spikes, receiving social deference, etc.).
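To make that picture concrete, here is a minimal sketch in Python (all of the cue names, weights, and detector functions are hypothetical, chosen only for illustration): a single scalar reward assembled from several independent, hardwired cue detectors, none of which encodes “replicate your genes” directly.

```python
# Minimal sketch: one scalar reward assembled from several hardwired cue
# detectors. The cues and weights are made up for illustration; the point
# is that nothing here encodes "replicate your genes" directly, only
# proxies that happened to correlate with it.

def detect_sweet_taste(obs):
    """Fires when the 'sugar' feature of the observation is high."""
    return 1.0 if obs.get("sugar", 0.0) > 0.5 else 0.0

def detect_social_deference(obs):
    """Fires when others in the observation defer to the agent."""
    return 1.0 if obs.get("others_deferring", 0) > 0 else 0.0

CUE_DETECTORS = [
    (detect_sweet_taste, 1.0),        # weight of each hardwired circuit
    (detect_social_deference, 0.5),
]

def reward(obs):
    """Total reward is just the weighted sum of whichever circuits fire."""
    return sum(weight * detector(obs) for detector, weight in CUE_DETECTORS)

# Example: one observation can trigger several circuits at once; the
# resulting "value system" is a side effect of which detectors happen to exist.
print(reward({"sugar": 0.9, "others_deferring": 2}))  # -> 1.5
```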
The toy AI has an internal model of the universe; it has an internal utility function which somehow measures the universe model and calculates utility from it... [the toy AI is actually a paperclip optimizer] ...Stuff like changing its utility function or fooling its sensors would not be chosen, because it knows that doesn’t lead to real paperclips.
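Here is a minimal sketch of what such a toy agent might look like (the function names and world representation are hypothetical): utility is computed over the agent’s predicted model state rather than over its sensor channel, so “fool the sensors” is predicted to produce no more real paperclips than doing nothing.

```python
# Minimal sketch of the toy AI described above (names are hypothetical).
# Utility is measured on the agent's *model* of the world, not on its raw
# sensor channel, so actions that only change the sensor reading don't help.

def predict_world(world, action):
    """Agent's internal model: how the world changes if it takes `action`."""
    world = dict(world)
    if action == "make_paperclip":
        world["paperclips"] += 1
    elif action == "fool_sensor":
        world["sensor_reading"] += 1   # sensor lies, real count unchanged
    return world

def utility(world):
    """Utility is defined over the modeled world state itself."""
    return world["paperclips"]

def choose_action(world, actions):
    """Pick the action whose predicted world state has the highest utility."""
    return max(actions, key=lambda a: utility(predict_world(world, a)))

world = {"paperclips": 0, "sensor_reading": 0}
print(choose_action(world, ["make_paperclip", "fool_sensor", "do_nothing"]))
# -> "make_paperclip": fooling the sensor doesn't increase modeled paperclips
```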
I think we’ve been here before ;-)
Thanks for trying to help me understand this. Gram_Stone linked a paper that explains why the class of problems that I’m describing aren’t really problems.
But that’s the thing. There is no sensory input for “social deference”. It has to be inferred from an internal model of the world, which is itself inferred from sensory data.
Reinforcement learning works fine when you have a simple reward signal you want to maximize. You can’t use it for social instincts or morality, or for anything you can’t just build a simple sensor to detect.
But that’s the thing. There is no sensory input for “social deference”. It has to be inferred from an internal model of the world, which is itself inferred from sensory data... Reinforcement learning works fine when you have a simple reward signal you want to maximize. You can’t use it for social instincts or morality, or for anything you can’t just build a simple sensor to detect.
Why does it only work on simple signals? Why can’t the result of inference work for reinforcement learning?
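To make the question concrete, here is a minimal sketch (all names, features, and numbers are hypothetical): a standard TD(0) value update in which the scalar reward is not read off a sensor but computed from an inferred estimate of “social deference” decoded from the raw observation.

```python
# Minimal sketch: the reward driving an ordinary TD(0) value update is not a
# raw sensor value but an inferred quantity, here a crude estimate of
# "how much deference am I receiving?" decoded from the observation.
# Everything (features, inference rule, constants) is hypothetical.

ALPHA, GAMMA = 0.1, 0.9
values = {}  # state -> estimated value

def infer_deference(obs):
    """Stand-in for inference over a world model: turn raw cues into an
    estimate of a quantity that no single sensor reports directly."""
    return 0.7 * obs["bowed_heads"] + 0.3 * obs["agreeing_replies"]

def td_update(state, next_state, obs):
    """Standard TD(0) update, with the inferred estimate used as the reward."""
    r = infer_deference(obs)
    v, v_next = values.get(state, 0.0), values.get(next_state, 0.0)
    values[state] = v + ALPHA * (r + GAMMA * v_next - v)

td_update("s0", "s1", {"bowed_heads": 2, "agreeing_replies": 1})
print(values)  # the update itself doesn't care where the reward came from
```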