Seems like you can have a yet-simpler policy by factoring the fixed “simple objective(s)” into implicit, modular elements that compress many different objectives useful across many environments. Then at runtime, you feed the environmental state into your factored representation of possible objectives and produce a mix of objectives tailored to your current environment, which steers towards behaviors that achieved high reward on training runs similar to the current environment.
Can you explain why this policy is yet-simpler? It sounds more complicated to me.
I’m saying that it’s simpler to have a goal generator that can be conditioned on the current environment, rather than memorizing each goal individually.
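To make the compression intuition concrete, here is a minimal sketch (in Python; all names, shapes, and numbers are hypothetical, chosen only for illustration) of the two policies being compared: one that memorizes a separate goal for every training environment, and one that conditions a small goal generator on environment features to mix a shared bank of objective components. The simplicity claim is just that the generator's parameter count stays fixed while the lookup table grows with the number of environments.

```python
# Purely illustrative sketch: memorized per-environment goals vs. a goal
# generator that mixes a small set of shared objective components,
# conditioned on the current environment's features.

import numpy as np

N_ENVS = 1000          # distinct training environments (hypothetical)
GOAL_DIM = 32          # dimensionality of a goal representation
N_COMPONENTS = 8       # shared "factored" objective components
ENV_FEATURES = 16      # dimensionality of environment features

rng = np.random.default_rng(0)

# Option A: memorize a separate goal vector for every environment.
memorized_goals = rng.normal(size=(N_ENVS, GOAL_DIM))
params_memorized = memorized_goals.size            # grows linearly with N_ENVS

# Option B: a goal generator. A small conditioning map turns environment
# features into mixing weights over a fixed bank of objective components.
components = rng.normal(size=(N_COMPONENTS, GOAL_DIM))       # shared basis
conditioner = rng.normal(size=(ENV_FEATURES, N_COMPONENTS))  # env -> weights
params_generator = components.size + conditioner.size        # independent of N_ENVS

def generate_goal(env_features: np.ndarray) -> np.ndarray:
    """Produce an environment-tailored goal as a softmax mix of shared components."""
    weights = env_features @ conditioner          # (N_COMPONENTS,)
    weights = np.exp(weights - weights.max())
    weights /= weights.sum()                      # normalize mixing weights
    return weights @ components                   # (GOAL_DIM,)

print("memorized params:", params_memorized)      # 32000 in this toy setup
print("generator params:", params_generator)      # 384 in this toy setup
```

On a simplicity (description-length) prior, Option B is favored once the environments share enough structure: its cost is roughly constant in the number of environments, whereas Option A pays for each goal separately.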