Postscript 1: Are humans direct or amortized optimizers?
There is actually a large literature in cognitive science which studies this exact question, although typically under the nomenclature of model-based vs model-free reinforcement learners. The answer appears to be that humans are both.
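To make the mapping concrete, here is a minimal toy sketch (mine, not drawn from the cognitive-science literature, and with an entirely hypothetical environment and reward values) contrasting the two strategies: the model-free agent acts from cached action values it has amortized from past experience, while the model-based agent optimizes directly at decision time by rolling out an explicit world model.

```python
# Illustrative sketch: model-free (amortized) vs. model-based (direct) choice.
# The "world model", states, and rewards are hypothetical toy values.

import random

ACTIONS = ["left", "right"]

# A tiny known world model: action -> (next_state, reward)
WORLD_MODEL = {"left": ("state_L", 0.0), "right": ("state_R", 1.0)}


def model_free_choice(q_values):
    """Amortized / model-free: just read off the cached Q-values."""
    return max(ACTIONS, key=lambda a: q_values[a])


def model_based_choice(model):
    """Direct / model-based: evaluate each action by simulating its outcome."""
    def simulated_return(action):
        _next_state, reward = model[action]
        return reward  # one-step lookahead for simplicity
    return max(ACTIONS, key=simulated_return)


def q_learning_update(q_values, action, reward, lr=0.1):
    """Habit formation: slowly amortize experienced reward into cached values."""
    q_values[action] += lr * (reward - q_values[action])


if __name__ == "__main__":
    q_values = {a: 0.0 for a in ACTIONS}
    # Early on, the cache is uninformative, but planning already picks the better action.
    print("model-based picks:", model_based_choice(WORLD_MODEL))
    # After repeated experience, the cached (habitual) policy catches up.
    for _ in range(100):
        action = random.choice(ACTIONS)
        _, reward = WORLD_MODEL[action]
        q_learning_update(q_values, action, reward)
    print("model-free picks:", model_free_choice(q_values))
```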
The Active Inference framework agrees: see, e.g., Constant et al., 2021 (https://www.frontiersin.org/articles/10.3389/fpsyg.2020.598733/full), where the distinction between direct and amortised optimisation manifests as planning-as-inference vs. so-called “deontic action”:

Deontic actions are actions for which the underlying policy has acquired a deontic value; namely, the shared, or socially admitted value of a policy (Constant et al., 2019). A deontic action is guided by the consideration of “what would a typical other do in my situation.” For instance, stopping at the red traffic light at 4 am when no one is present may be viewed as such a deontically afforded action.
This also roughly corresponds to the distinction between representationalism and dynamicism.