What evidence is there for/against the hypothesis that this NN benefits from extra thinking time because it uses the extra time to do planning/lookahead/search?
As opposed to, for instance, using the extra time to perform more more complex convolutional feature extraction on the current state, effectively simulating a deeper (but non-recurrent) feedforward network?
(In Hubinger et al, “internally searching through a search space” is a necessary condition for being an “optimizer,” and hence for being a “mesaoptimizer.”)
What evidence is there for/against the hypothesis that this NN benefits from extra thinking time because it uses the extra time to do planning/lookahead/search?
As opposed to, for instance, using the extra time to perform more more complex convolutional feature extraction on the current state, effectively simulating a deeper (but non-recurrent) feedforward network?
(In Hubinger et al, “internally searching through a search space” is a necessary condition for being an “optimizer,” and hence for being a “mesaoptimizer.”)