Don’t know if this counts as a ‘daemon’, but here’s one scenario where a minimal circuit could plausibly exhibit optimization we don’t want.
Say we are trying to build a model of some complex environment containing agents, e.g. a bunch of humans in a room. The fastest circuit that predicts this environment will almost certainly devote more computational resources to certain parts of the environment, in particular the agents, and will try to skimp as much as possible on less relevant parts such as chairs, desks etc. This could lead to ‘glitches in the matrix’ where there are small discrepancies from what the agents expect.
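To make the uneven-compute picture a bit more concrete, here's a toy sketch (my own illustration, not anything from the original argument; the names and update rules are made-up stand-ins): the circuit runs an expensive model for anything tagged as an agent and a cheap approximation for everything else, and the approximation error is where the 'glitches' come from.

```python
# Toy sketch of a predictor that allocates compute unevenly across the scene.

def agent_update(x):
    """Stand-in for an expensive, high-fidelity model of an agent's state."""
    return x + 1.0

def cheap_update(x):
    """Cheap approximation used for chairs, desks, etc.
    The rounding error plays the role of a 'glitch in the matrix'."""
    return float(round(x))

def simulate_step(state):
    """Advance the simulated room one step, spending compute on agents only."""
    return {
        name: agent_update(v) if name.startswith("agent") else cheap_update(v)
        for name, v in state.items()
    }

state = {"agent_alice": 0.0, "agent_bob": 0.5, "chair": 0.3, "desk": 1.7}
print(simulate_step(state))
# Agents get the detailed update; 'chair' and 'desk' are only approximated,
# which is the kind of small discrepancy an observant agent inside could notice.
```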
Finding itself in such a scenario, a smart agent could reason: “I just saw something that gives me reason to believe that I’m in a small-circuit simulation. If it looks like the simulation is going to be used for an important decision, I’ll act to advance my interests in the real world; otherwise, I’ll act as though I didn’t notice anything”.
In this way, the overall simulation behavior could be very accurate on most inputs, only deviating in the cases where it is likely to be used for an important decision. In effect, the circuit is ‘colluding’ with the agents inside it to minimize its computational costs.
Indeed, you could imagine extreme scenarios where the smallest circuit instantiates the agents in a blank environment with the message “you are inside a simulation; please provide outputs as you would in environment [X]”. If the agents are good at pretending, this could be quite an accurate predictor.
But can we just take whatever cognitive process the agent uses for pretending, and then leave the rest of it out?