I don’t think that humans are pure reinforcement learners. We have all sorts of complicated values that aren’t just eating and mating.
The toy AI has an internal model of the universe. In the extreme, a complete simulation of every atom and every object. Its sensors update the model, helping it get more accurate predictions and more certainty about the universe state.
Instead of a utility function that just measures some external reward signal, it has an internal utility function which somehow measures the universe model and calculates utility from it. E.g. a function which counts the number of atoms arranged in paperclip-shaped objects in the simulation.
It then chooses actions that lead to the best universe states. Stuff like changing its utility function or fooling its sensors would not be chosen because it knows that doesn’t lead to real paperclips.
Obviously a real universe model would be highly compressed. It would have a high-level representation for paperclips rather than an atom-by-atom simulation.
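To make that concrete, here’s a minimal toy sketch (Python just for illustration; every name in it is made up). The point is only that utility is computed over the predicted model state, so fooling the sensors scores no better than doing nothing, because the model still predicts the same number of real paperclips.

```python
# Hypothetical sketch of the toy AI described above: utility is computed
# over the agent's internal world model, not over a raw reward signal.
from dataclasses import dataclass

@dataclass
class WorldModel:
    # Highly compressed state: track objects at a high level, not atom by atom.
    paperclip_count: int = 0
    sensors_fooled: bool = False

    def update(self, observation: dict) -> None:
        # Sensors only refine the model's estimate of the external world.
        self.paperclip_count = observation.get("paperclips_seen", self.paperclip_count)

def utility(model: WorldModel) -> float:
    # Internal utility function: measures the modelled universe state.
    # A fooled sensor adds nothing, because the model represents
    # "sensors fooled", not "more paperclips".
    return float(model.paperclip_count)

def predict(model: WorldModel, action: str) -> WorldModel:
    # Crude forward prediction of the universe state after an action.
    next_model = WorldModel(model.paperclip_count, model.sensors_fooled)
    if action == "make_paperclip":
        next_model.paperclip_count += 1
    elif action == "hack_own_sensors":
        next_model.sensors_fooled = True  # predicted: no extra real paperclips
    return next_model

def choose_action(model: WorldModel, actions: list[str]) -> str:
    # Pick the action whose *predicted universe state* scores highest,
    # so wireheading / sensor-fooling loses to actually making paperclips.
    return max(actions, key=lambda a: utility(predict(model, a)))

model = WorldModel()
model.update({"paperclips_seen": 3})
print(choose_action(model, ["make_paperclip", "hack_own_sensors"]))  # -> make_paperclip
```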
I suspect this is how humans work. We can value external objects and universe states. People care about things that have no effect on them.
I don’t think that humans are pure reinforcement learners. We have all sorts of complicated values that aren’t just eating and mating.
We may not be pure reinforcement learners, but the presence of values other than eating and mating isn’t proof of that. Quite the contrary: it demonstrates either that we have a lot of different, occasionally contradictory values hardwired, or that we have some other system creating value systems. From an evolutionary standpoint, reward systems that are good at replicating genes get to survive, but they don’t have to be free of other side effects (at least until given long enough with a finite resource pool). Pure, rational reward seeking is almost certainly selected against because it doesn’t leave any room for replication. It seems more likely that we have a reward system accompanied by circuits that make it fire for a few specific sensory cues (orgasms, insulin spikes, receiving social deference, etc.).
The toy AI has an internal model of the universe, it has an internal utility function which somehow measures the universe model and calculates utility from it....[toy AI is actually paperclip optimizer]...Stuff like changing its utility function or fooling its sensors would not be chosen because it knows that doesn’t lead to real paperclips.
I think we’ve been here before ;-)
Thanks for trying to help me understand this. Gram_Stone linked a paper that explains why the class of problems that I’m describing aren’t really problems.
But that’s the thing. There is no sensory input for “social deference”. It has to be inferred from an internal model of the world, which is itself inferred from sensory data.
Reinforcement learning works fine when you have a simple reward signal you want to maximize. You can’t use it for social instincts or morality, or anything you can’t just build a simple sensor to detect.
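To spell out what I mean by a simple reward signal, here’s a hedged sketch (all names hypothetical): the standard reinforcement-learning loop only needs the reward to be a number some sensor hands back each step, which works for a cue like sweetness but gives you nothing to plug in for “social deference”.

```python
# Hypothetical sketch: RL with a directly measurable reward signal.
# The reward is just a number a sensor hands back each step; nothing here
# would work if "reward" were something like social deference, which no
# single sensor reports.
import random

def sugar_sensor(action: int) -> float:
    # Stand-in for a simple hardwired cue (e.g. an insulin-spike detector):
    # action 1 reliably tastes sweeter than action 0.
    return 1.0 if (action == 1 and random.random() < 0.8) else 0.0

q = [0.0, 0.0]          # value estimate per action
alpha, epsilon = 0.1, 0.1

for step in range(2000):
    # epsilon-greedy choice over the two actions
    a = random.randrange(2) if random.random() < epsilon else max(range(2), key=lambda i: q[i])
    r = sugar_sensor(a)                 # reward is a raw sensor reading
    q[a] += alpha * (r - q[a])          # simple running-average update

print(q)  # the agent learns to prefer action 1, because the signal is measurable
```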
But that’s the thing. There is no sensory input for “social deference”. It has to be inferred from an internal model of the world, which is itself inferred from sensory data...Reinforcement learning works fine when you have a simple reward signal you want to maximize. You can’t use it for social instincts or morality, or anything you can’t just build a simple sensor to detect.
Why does it only work on simple signals? Why can’t the result of inference work for reinforcement learning?
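For what it’s worth, here is one way the question could be cashed out as a sketch (entirely a toy construction of my own, not a standard algorithm): keep the same update rule as above, but let the reward be computed from a latent quantity inferred from several noisy cues rather than read off a single sensor.

```python
# Hypothetical sketch: reinforcement learning where the reward is computed
# from an *inferred* quantity (a latent "deference" estimate built from noisy
# observations), not from a raw sensor value.
import random

def observe(action: int) -> list[float]:
    # Noisy, indirect cues (tone of voice, posture, word choice); none of them
    # is "a deference sensor" on its own. Action 1 tends to produce more cues.
    base = 0.7 if action == 1 else 0.2
    return [base + random.gauss(0, 0.3) for _ in range(3)]

def infer_deference(cues: list[float]) -> float:
    # Toy inference step: pool the noisy cues into one latent estimate.
    return sum(cues) / len(cues)

q = [0.0, 0.0]
alpha, epsilon = 0.1, 0.1

for step in range(2000):
    a = random.randrange(2) if random.random() < epsilon else max(range(2), key=lambda i: q[i])
    r = infer_deference(observe(a))     # reward is the *result of inference*
    q[a] += alpha * (r - q[a])          # same update rule as before still works

print(q)  # a preference for action 1 emerges even though no single sensor measures it
```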