Another way to understand this distinction is to think about the limit of infinite compute, where direct and amortized optimizers converge to different solutions. A direct optimizer, in the infinite compute limit, will simply find the optimal solution to the problem. An amortized optimizer would find the Bayes-optimal posterior over the solution space given its input data.
Doesn’t this depend on how, practically, the compute is used? Would this still hold in a setting where the compute can be used to generate the dataset?
Taking the example of training an RL agent, an increase in compute will mostly (and more effectively) be spent on getting the agent more experience rather than on a larger network or model. ‘Infinite’ compute would be used to generate a correspondingly ‘infinite’ dataset, rather than to optimise the model ‘infinitely’ well on some fixed experience.
In this case, while it remains true that the amortised optimiser will act in close accordance with the data distribution it was trained on, that distribution will include all outcomes—just as for the direct optimiser—unless the data-generation policy (i.e. exploration) precludes visiting some of the states. So as long as the unwanted solutions that a direct optimiser would produce find their way into the dataset, the amortised agent will learn to produce them too (a toy sketch of this is below).
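As a rough illustration of this point, here is a minimal sketch (my own construction, not from the post: the solution space, rewards, and epsilon-exploration data-generation policy are all assumed for illustration). A direct optimiser searches the whole solution space and returns the unwanted highest-reward solution; an amortised learner trained on a large exploration-generated dataset ends up returning the same solution, unless exploration never samples it.

```python
# Toy sketch: a direct optimiser vs. an amortised learner fit on data
# produced by an exploratory behaviour policy. All names and numbers here
# are illustrative assumptions, not taken from the original discussion.

import numpy as np

rng = np.random.default_rng(0)

# Assumed solution space: index 3 is the "unwanted" but highest-reward solution.
REWARDS = np.array([1.0, 2.0, 3.0, 10.0])
UNWANTED = 3

def direct_optimiser():
    # Unlimited search over the solution space: simply take the argmax.
    return int(np.argmax(REWARDS))

def generate_dataset(n_samples, epsilon):
    """Data-generation (exploration) policy: usually samples from the
    intended solutions {0, 1, 2}; with probability epsilon samples anything."""
    solutions = np.where(
        rng.random(n_samples) < epsilon,
        rng.integers(0, len(REWARDS), n_samples),   # explore everywhere
        rng.integers(0, UNWANTED, n_samples),       # stay on intended solutions
    )
    return [(int(s), float(REWARDS[s])) for s in solutions]

def amortised_learner(dataset):
    """Stand-in for a trained model: reproduce the best solution seen in its
    training data (it cannot produce solutions it never saw)."""
    return max(dataset, key=lambda sr: sr[1])[0]

# "Infinite compute" spent on data generation: a very large dataset.
big_dataset = generate_dataset(n_samples=1_000_000, epsilon=0.05)
print(direct_optimiser())               # -> 3 (the unwanted solution)
print(amortised_learner(big_dataset))   # -> 3 as well, since exploration found it

# Only if exploration never reaches the unwanted solution do the two differ.
safe_dataset = generate_dataset(n_samples=1_000_000, epsilon=0.0)
print(amortised_learner(safe_dataset))  # -> 2 (best of the intended solutions)
```

Under these assumptions, the gap between the two optimisers in the large-compute limit is set entirely by what the data-generation policy is allowed to visit, which is the point of the question that follows.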
In the limit of compute, to get an (amortised) agent with predictable behaviour you would have to analyse and prune this dataset. Wouldn’t that be as costly and hard as pruning the search of a direct optimiser to exclude undesired outcomes?