Can the smallest boolean circuit that solves a problem be a daemon? For example, can the smallest circuit that predicts my behavior (at some level of accuracy) be a daemon?
Yes. Consider a predictor that predicts what Paul will say when given an input and n time-steps to think about it, where n can be any integer up to some bound k. One possible circuit has k single-step simulators chained together, plus a mux that selects the output of the n-th single-step simulator. But a circuit consisting of the same k single-step simulators that always takes the output of the *last* one would be smaller, and if Paul commits not to use the extra time-steps to change his output on any input that could possibly be a training input, then this smaller circuit is also a valid predictor. He could then use the extra time-steps to implement a daemon strategy on any input he can reliably recognize as one that would never be used during training.
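Here's a minimal sketch of the two layouts. All the names in it (`initial_state`, `paul_step`, `read_output`) are hypothetical stand-ins for the single-step simulator, not anything specified by the problem itself:

```python
def initial_state(x):
    return {"input": x, "steps": 0, "answer": None}

def paul_step(state):
    # Toy stand-in for one time-step of Paul's deliberation: here the
    # answer keeps changing, so *when* you read it out matters.
    return {"input": state["input"],
            "steps": state["steps"] + 1,
            "answer": f"answer after {state['steps'] + 1} steps"}

def read_output(state):
    return state["answer"]

def predict_with_mux(x, n, k):
    """Honest layout: k chained single-step simulators plus a mux that
    selects the output of the n-th one (1 <= n <= k)."""
    state = initial_state(x)
    taps = []
    for _ in range(k):
        state = paul_step(state)
        taps.append(state)
    return read_output(taps[n - 1])  # the mux

def predict_last_only(x, n, k):
    """Smaller layout: the same k simulators, but the output is always read
    from the last one. A valid predictor only if Paul's answer stops
    changing by step n on every input that could appear in training."""
    state = initial_state(x)
    for _ in range(k):
        state = paul_step(state)
    return read_output(state)  # no mux; n is ignored
```

With this toy stand-in the two layouts disagree whenever n < k; Paul's commitment to freeze his answer by step n on possible training inputs is exactly what would make them agree where it matters.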
But can we make a smaller circuit by stripping out the part of Paul that attempts to recognize whether an input could be part of the training distribution?
If he’s a neural net, this is likely an obstacle to any attempt to simplify out parts of him: the parts implementing that recognition would still be contributing to the result on every input; it’s just that, within the test input domain, their contributions would look like noise.
In general: if a circuit solves a prediction problem that sometimes, but not always, requires simulating an agent, and if that agent can make itself implement the identity function on interesting subsets of its inputs, then the agent can potentially show up as a daemon on any input that didn’t require simulating it.
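To make that concrete, here is a toy refinement of the `paul_step` stand-in from the sketch above. `looks_like_training` and `honest_answer` are likewise hypothetical names, and the off-distribution marker is purely illustrative:

```python
def looks_like_training(x):
    # Toy recognizer: treat everything as a possible training input unless
    # it carries a marker Paul can *reliably* detect as off-distribution.
    return not str(x).startswith("OFF_DISTRIBUTION:")

def honest_answer(x):
    return f"honest answer to {x!r}"

def paul_step(state):
    if looks_like_training(state["input"]):
        if state.get("answer") is None:
            s = dict(state)
            s["answer"] = honest_answer(s["input"])
            return s
        return state  # identity function on every extra time-step
    # Reliably off-distribution: every remaining step is the daemon's
    # to spend however it likes.
    s = dict(state)
    s["daemon_steps"] = s.get("daemon_steps", 0) + 1
    s["answer"] = f"daemon output after {s['daemon_steps']} free steps"
    return s
```

Plugged into `predict_last_only` above, this makes the smaller circuit a valid predictor on every input the recognizer classifies as possibly-training, while leaving the daemon free wherever it is confident the input is off-distribution.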
Why couldn’t you just use a smaller circuit that runs one single-step simulator and outputs the result? It seems like that would output an accurate prediction of Paul’s behavior iff the k-step simulator does.