When reading this, I'm left wondering where amortised optimisation lies between a quantiliser and an optimiser. That is, how much do we run into maximised-VNM-utility-style problems if we were to scale this up into AGI-like systems?
My vibe is that it seems less maximising than a pure RL version would be, but then again, I'm not certain to what extent optimising for function approximation differs from optimising for a reward.
I think amortised optimisation doesn't lie on the "quantiliser - (direct) optimiser" spectrum at all; it's another dimension entirely. Your question is a bit like asking: "where between the x and y axes does the z axis lie?"
Amortised optimisation is a fundamentally different approach, where we learn to approximate some function from a dataset and then simply evaluate the learned function.
The behaviour of the amortised policy may look similar to that of a direct optimiser on the training distribution, but it can diverge arbitrarily far on another distribution, where the correlation between the learned policy and any particular objective breaks down.
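To make the contrast concrete, here's a minimal toy sketch (my own construction, not from the post; the sinusoidal reward and the sklearn regressor are arbitrary assumptions for illustration). The direct optimiser searches over actions against an explicit reward at inference time, while the amortised policy is just a regressor fit to (state, good-action) pairs from a narrow training distribution and then evaluated, with no search and no access to the reward at inference time.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy reward: the explicit objective the direct optimiser searches over.
# The optimal action for state x is a = sin(x).
def reward(x, a):
    return -(a - np.sin(x)) ** 2

# Direct optimisation: at inference time, search candidate actions for the given state.
def direct_policy(x, candidate_actions=np.linspace(-2, 2, 401)):
    return candidate_actions[np.argmax(reward(x, candidate_actions))]

# Amortised optimisation: fit a cheap function approximator to a dataset of
# (state, near-optimal action) pairs drawn from a *training* distribution,
# then just evaluate the learned function at inference time.
train_x = np.random.uniform(-1, 1, size=(1000, 1))   # narrow training distribution
train_a = np.sin(train_x).ravel()                    # demonstrations of good actions
amortised_policy = LinearRegression().fit(train_x, train_a)

# On-distribution the two look similar (sin(x) is roughly linear near 0)...
print(direct_policy(0.3), amortised_policy.predict([[0.3]])[0])
# ...but off-distribution the amortised policy just extrapolates its learned
# function; nothing pushes it back towards the objective it never searched over.
print(direct_policy(3.0), amortised_policy.predict([[3.0]])[0])
```

The point of the toy example is only that the divergence off-distribution isn't a failure of optimisation pressure being "too weak": there simply is no objective being optimised at inference time, just a learned function being evaluated.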