In the real network, there are a lot more than two activations. Our results involve a 32,768-dimensional cheese vector, subtracted from about halfway through the network:
Did you try other locations in the network?
I would expect it to work pretty much anywhere, and I’m interested to know if my prediction is correct.
I’m pretty sure that what happens is (as you also suggest) that the agent stops seeing the cheese.
Imagine you did the cheese subtraction at the input layer (i.e. on the pixel values of the maze). In that case it just trivially removes the cheese from the picture, resulting in behaviour that is identical to the no-cheese case. So I expect something similar to happen at later layers, as long as what the network is mostly doing is just decoding the image. So at whatever layer this trick stops working, that should mean the agent has started planning its moves.
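To make the experiment I have in mind concrete, here is a minimal sketch of a layer sweep. It assumes a made-up three-layer toy network and three-"pixel" mazes, not the actual maze agent; `LAYERS`, `cheese_vector`, `forward_with_subtraction` and `gap_by_layer` are all hypothetical names. The vector is computed from one maze pair and subtracted in a different maze, once at each layer, and the result is compared against the output on the target maze with no cheese.

```python
# Toy stand-in for the agent: a list of layers, each mapping a list of floats
# to a list of floats. All weights are arbitrary illustrative values.
def linear(weights):
    def layer(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return layer

def relu(x):
    return [max(0.0, v) for v in x]

LAYERS = [
    linear([[1.0, -0.5, 0.0], [0.5, 1.0, 1.0], [0.0, 0.3, -1.0]]),
    relu,
    linear([[1.0, 1.0, 0.0], [0.0, -1.0, 2.0]]),
]

def forward(x, start=0, stop=None):
    """Run LAYERS[start:stop] on x; stop=0 returns the input unchanged."""
    for layer in LAYERS[start:stop]:
        x = layer(x)
    return x

def cheese_vector(with_cheese, without_cheese, k):
    """Activation difference (cheese minus no cheese) at the input to layer k."""
    a = forward(with_cheese, stop=k)
    b = forward(without_cheese, stop=k)
    return [ai - bi for ai, bi in zip(a, b)]

def forward_with_subtraction(x, vec, k):
    """Run the network on x, subtracting vec from the activations at layer k."""
    act = forward(x, stop=k)
    patched = [a - v for a, v in zip(act, vec)]
    return forward(patched, start=k)

def gap_by_layer(source, target):
    """For each layer k, how far the patched output is from the no-cheese output."""
    (src_cheese, src_plain), (tgt_cheese, tgt_plain) = source, target
    baseline = forward(tgt_plain)
    gaps = []
    for k in range(len(LAYERS) + 1):
        vec = cheese_vector(src_cheese, src_plain, k)
        out = forward_with_subtraction(tgt_cheese, vec, k)
        gaps.append(max(abs(o - b) for o, b in zip(out, baseline)))
    return gaps

# The last "pixel" marks the cheese; the first two encode the maze layout.
source = ([1.0, 0.0, 1.0], [1.0, 0.0, 0.0])
target = ([0.0, 1.0, 1.0], [0.0, 1.0, 0.0])
for k, gap in enumerate(gap_by_layer(source, target)):
    print(f"subtract at layer {k}: max deviation from no-cheese output = {gap}")
```

At layer 0 the gap is exactly zero here, because subtracting the vector just erases the cheese pixel. Note that this toy puts the cheese at the same pixel in both mazes, which is the easy case. The interesting question for the real agent is at which layer the gap stops being small.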