I’m not objecting to your assertion that some sort of search takes place. I’m objecting to your characterization of what sorts of objectives the search ends up pointed towards. Basically, I’m saying that “situationally activated heuristics that steer towards environment-dependent goals” is totally in line with a simplicity prior over cognitive structures leading to a search-like process.
The whole reason you say we should expect search processes is that they compress many different environment- and belief-dependent plans into a simpler generator of such plans (the search), which takes in environment info, beliefs, and the agent’s simple, supposedly environment-independent objectives, and produces a plan. So the agent only needs to store the search process and its environment-independent objectives.
I’m saying you can apply a similar “compress into an environment- and beliefs-conditioned generator” trick to the objectives as well, and get a generator of objectives that conditions on the environment and the agent’s current beliefs to produce objectives for the search process.
Thus, objectives remain environment-dependent, and will probably steer towards world states that resemble those which were rewarded during training. I think this is quite similar to “a spread of situationally-activated computations which steer its actions towards historical reward-correlates”, if involving rather more sophisticated cognition than phrases like “contextually activated heuristics” often imply.
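To make the structural point concrete, here’s a minimal, purely illustrative Python sketch (the `search`, `objective_generator`, toy plans, and toy contexts are all made-up stand-ins, not a claim about how any of this would actually be implemented):

```python
from typing import Callable, Dict, List

# Illustrative stand-ins for whatever representations the agent actually uses.
Environment = Dict[str, str]
Beliefs = Dict[str, str]
Objective = Callable[[dict], float]   # scores a predicted world state
Plan = List[str]


def search(env: Environment, beliefs: Beliefs, objective: Objective) -> Plan:
    """Toy planner: pick the candidate plan whose predicted outcome the
    objective scores highest. In a real agent this would be some learned
    search / planning computation."""
    candidates = [["noop"], ["gather_resources"], ["go_to_goal"]]
    predict = lambda plan: {"plan": plan, **env, **beliefs}
    return max(candidates, key=lambda p: objective(predict(p)))


# Version 1: search pointed at a single, environment-independent objective.
FIXED_OBJECTIVE: Objective = lambda state: float("go_to_goal" in state["plan"])

def fixed_objective_agent(env: Environment, beliefs: Beliefs) -> Plan:
    # The agent stores one objective and reuses it in every context.
    return search(env, beliefs, FIXED_OBJECTIVE)


# Version 2: the objective itself comes from a context-conditioned generator.
def objective_generator(env: Environment, beliefs: Beliefs) -> Objective:
    """Produces an objective that steers toward whatever world-state features
    correlated with reward in contexts like this one during training."""
    if env.get("context") == "scarcity":
        return lambda state: float("gather_resources" in state["plan"])
    return lambda state: float("go_to_goal" in state["plan"])

def contextual_objective_agent(env: Environment, beliefs: Beliefs) -> Plan:
    # Still a search process -- but the objective it is pointed at is itself
    # generated from the current environment and beliefs.
    return search(env, beliefs, objective_generator(env, beliefs))


if __name__ == "__main__":
    env, beliefs = {"context": "scarcity"}, {"resource_level": "low"}
    print(fixed_objective_agent(env, beliefs))       # ['go_to_goal']
    print(contextual_objective_agent(env, beliefs))  # ['gather_resources']
```

The point of the sketch is just that the second agent is every bit as “search-like” as the first; the compression argument for search doesn’t, by itself, force the objective the search is pointed at to be a fixed, context-free one.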