Humans are not choosing to reward specific instances of actions of the AI—when we build intelligent agents, at some point they will leave the confines of curated training data and go operate on new experiences in the real world. At that point, their circuitry and rewards are out of human control, so that makes our position perfectly analogous to evolution’s. We are choosing the reward mechanism, not the reward.
Note that this provides an obvious route to alignment using conventional engineering practice.
Why does the AGI system need to update at all “out in the world”? Updating online is highly unreliable. Instead, whenever something happens in the real world that the system did not expect, add the (expectation, ground truth) tuple to a log. Train a simulator on the pooled log from all instances of the system, then train the system on the updated simulator.
So train only in batches, and use code in the simulator that “rewards” behavior that accomplishes the designers’ intent.
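To make the loop concrete, here is a minimal toy sketch of what I mean. Everything in it is a hypothetical stand-in I made up for illustration (the lookup-table `Simulator`, the `Policy` class, `designer_reward`, `toy_world`); a real system would use learned models, but the control flow would be roughly the same: deployed instances stay frozen and only log surprises, and all updating happens offline against the simulator.

```python
ACTIONS = ("wait", "push")


def designer_reward(outcome):
    """Hypothetical reward code written by the designers; it scores outcomes."""
    return 1.0 if outcome == "door_open" else 0.0


class Simulator:
    """Toy world model: a lookup table from (state, action) to outcome."""

    def __init__(self):
        self.model = {}

    def fit(self, log):
        # Batch update from the pooled log of every deployed instance.
        for state, action, _expected, observed in log:
            self.model[(state, action)] = observed

    def predict(self, state, action):
        return self.model.get((state, action), "unknown")


class Policy:
    """Frozen while deployed; only train_offline() changes its behavior."""

    def __init__(self, default_action):
        self.default_action = default_action
        self.table = {}  # state -> chosen action

    def act(self, state):
        return self.table.get(state, self.default_action)

    def train_offline(self, simulator, states):
        # Pick the action whose simulated outcome the designers' code rewards most.
        for state in states:
            self.table[state] = max(
                ACTIONS,
                key=lambda a: designer_reward(simulator.predict(state, a)),
            )


def run_deployed(policy, simulator, world, states, log):
    """Act in the world with no updates; log (expectation, ground truth) mismatches."""
    for state in states:
        action = policy.act(state)
        expected = simulator.predict(state, action)
        observed = world(state, action)
        if expected != observed:
            log.append((state, action, expected, observed))


def toy_world(state, action):
    """Stand-in for reality: pushing the door opens it, anything else does not."""
    return "door_open" if (state, action) == ("door", "push") else "door_closed"


simulator = Simulator()
fleet = [Policy("wait"), Policy("push")]  # two instances with different defaults
shared_log = []

# Deployment phase: nothing is learned here, surprises are only recorded.
for instance in fleet:
    run_deployed(instance, simulator, toy_world, ["door"], shared_log)
tables_during_deployment = [dict(p.table) for p in fleet]

# Batch phase: update the simulator from the pooled log, then retrain every instance.
simulator.fit(shared_log)
for instance in fleet:
    instance.train_offline(simulator, ["door"])

# Second deployment: the simulator now matches the world, so nothing new is logged.
second_log = []
for instance in fleet:
    run_deployed(instance, simulator, toy_world, ["door"], second_log)
```

The point of the design is that the only place behavior changes is `train_offline`, which runs against the simulator and the designers' reward code, never against live experience. Note that the toy has no exploration at all: the simulator only learns about "push" because one instance happened to default to it, which is part of why pooling the log across instances matters.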