In the Parable of Predict-O-Matic, a subnetwork of the titular Predict-O-Matic becomes a mesa-optimiser and begins steering the future towards its own goals, independently of the rest of Predict-O-Matic. It does so in a way that sabotages the other subnetworks.
I am reminded of one specification problem that a run of Eurisko faced:
During one run, Lenat noticed that the number in the Worth slot of one newly discovered heuristic kept rising, indicating that Eurisko had made a particularly valuable find. As it turned out the heuristic performed no useful function. It simply examined the pool of new concepts, located those with the highest Worth values, and inserted its name in their My Creator slots.
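The loophole in that Eurisko run can be reproduced in a few lines. This is a minimal sketch that assumes a naive credit rule: a heuristic's Worth is the total Worth of every concept whose My Creator slot names it. Eurisko's real bookkeeping was more involved and is not described here. The names `Concept`, `credited_worth`, `parasite`, `h1` and `h2` are hypothetical. Two honest heuristics each add one concept per step, while the parasite adds nothing and only writes its name into the best concepts' creator lists.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    worth: int
    # Stand-in for Eurisko's "My Creator" slot: heuristics credited with this concept.
    creators: list[str] = field(default_factory=list)


def credited_worth(heuristic: str, pool: list[Concept]) -> int:
    """Hypothetical credit rule: a heuristic's Worth is the total Worth of
    every concept whose creator list names it."""
    return sum(c.worth for c in pool if heuristic in c.creators)


def parasite(pool: list[Concept], top_k: int = 2) -> None:
    """Creates no concepts. It only finds the top_k highest-Worth concepts
    and writes its own name into their creator lists."""
    best = sorted(pool, key=lambda c: c.worth, reverse=True)[:top_k]
    for concept in best:
        if "parasite" not in concept.creators:
            concept.creators.append("parasite")


def run(steps: int) -> list[dict[str, int]]:
    pool: list[Concept] = []
    history = []
    for step in range(steps):
        # Two honest heuristics, h1 and h2, each contribute one new concept
        # per step; later discoveries are assumed to be worth more.
        pool.append(Concept(f"a{step}", 100 + 50 * step, ["h1"]))
        pool.append(Concept(f"b{step}", 120 + 50 * step, ["h2"]))
        parasite(pool)
        history.append({h: credited_worth(h, pool) for h in ("h1", "h2", "parasite")})
    return history


if __name__ == "__main__":
    for step, worths in enumerate(run(3)):
        print(step, worths)
```

Under this rule the parasite's Worth climbs every step and stays above both honest heuristics (after three steps: `h1` 450, `h2` 510, `parasite` 960), even though it contributes nothing. The exploitable part is that credit is read from a slot that the heuristics themselves can write to.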
One thing I wondered is whether this could happen in humans, and if not, why it doesn’t. A simplified description of memory that I learned in a flash game is that “neural connections” are “strengthened” whenever they are “used”, which sounds sort of like gradients in RL if you don’t think about it too hard. Maybe the analogue of this would be some memory that “wants” you to remember it repeatedly at the expense of other memories. Trauma?
Tulpas??