gwern comments on $100/$50 rewards for good references

gwern 3 Dec 2021 18:12 UTC
LW: 2 AF: 1

What we’d want is some neural-net style design that generates the coin reward and the move-right reward just from the game data, without any previous knowledge of the setting.

So you’re looking for curriculum design/exploration in meta-reinforcement-learning? Something like Enhanced POET/PLR/REPAIRED, but where the task is not just moving right but a complicated environment with arbitrary reward functions (e.g., using randomly initialized CNNs to map state to ‘reward’)? Or would hindsight or successor methods count, since they relabel rewards for already-executed trajectories? Would relatively complex generative games like Alchemy or LIGHT count? Or self-play, as in robotics self-play?
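
To make the "randomly initialized CNN as reward" idea concrete, here is a rough toy sketch. It is an illustration only, not how Enhanced POET, PLR, or REPAIRED work: the name `make_random_reward`, the single-channel grid state, the one-layer architecture, and all parameter values are assumptions made up for this example.

```python
import math
import random


def make_random_reward(seed, kernel=3, n_filters=4):
    """Build a fixed, randomly initialized reward function (hypothetical helper).

    Each seed yields a different arbitrary mapping from state to a scalar
    'reward', so sampling seeds gives a family of tasks on one environment.
    """
    rng = random.Random(seed)
    # Random convolution filters and a random linear readout, never trained.
    filters = [
        [[rng.gauss(0, 1) for _ in range(kernel)] for _ in range(kernel)]
        for _ in range(n_filters)
    ]
    readout = [rng.gauss(0, 1) for _ in range(n_filters)]

    def reward(state):
        # state: 2D list of floats (a single-channel grid observation),
        # assumed to be at least kernel x kernel in size.
        height, width = len(state), len(state[0])
        pooled = []
        for f in filters:
            total, count = 0.0, 0
            for i in range(height - kernel + 1):
                for j in range(width - kernel + 1):
                    act = sum(
                        f[a][b] * state[i + a][j + b]
                        for a in range(kernel)
                        for b in range(kernel)
                    )
                    total += max(act, 0.0)  # ReLU
                    count += 1
            pooled.append(total / count)  # global average pooling
        # Squash to (-1, 1) so rewards from different seeds are comparable.
        return math.tanh(sum(p * w for p, w in zip(pooled, readout)))

    return reward


# Usage: three arbitrary reward functions scored on the same random state.
state_rng = random.Random(123)
state = [[state_rng.random() for _ in range(8)] for _ in range(8)]
rewards = [make_random_reward(seed)(state) for seed in range(3)]
```

A meta-RL agent would then be trained across many such seeds, with each seed acting as a separate task defined over the same game data.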
- Stuart_Armstrong 4 Apr 2022 11:33 UTC
  LW: 2 AF: 2
  Hey there! Sorry for the delay. $50 awarded to you for fastest good reference. PM me your bank details.