What we’d want is some neural-net-style design that generates both the coin reward and the move-right reward purely from the game data, without any prior knowledge of the setting.
So you’re looking for curriculum design/exploration in meta-reinforcement-learning? Something like Enhanced POET/PLR/REPAIRED, but where the task isn’t just moving right and is instead a complicated environment with arbitrary reward functions (e.g. using randomly initialized CNNs to map state to ‘reward’)? Or would hindsight or successor methods count, since they relabel rewards for executed trajectories? Would relatively complex generative games like Alchemy or LIGHT count? What about self-play, like robotics self-play?
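For anyone unfamiliar with the “randomly initialized CNN as reward” idea, here is a minimal sketch, assuming the state is a small 2D grid of floats. The name `make_random_cnn_reward` and the architecture (one conv layer, ReLU, mean pooling, tanh head) are hypothetical choices for illustration, not taken from any of the papers above. Each seed gives a frozen random network, i.e. a different arbitrary task.

```python
import math
import random


def make_random_cnn_reward(kernel_size=3, n_filters=4, seed=0):
    """Return r(state) backed by a frozen, randomly initialized conv net.

    Hypothetical sketch: conv (valid padding) -> ReLU -> global mean pool
    -> linear head -> tanh, so rewards lie in [-1, 1].
    """
    rng = random.Random(seed)  # seed identifies the sampled "task"
    kernels = [[[rng.gauss(0.0, 1.0) for _ in range(kernel_size)]
                for _ in range(kernel_size)] for _ in range(n_filters)]
    biases = [rng.gauss(0.0, 0.1) for _ in range(n_filters)]
    head = [rng.gauss(0.0, 1.0) for _ in range(n_filters)]

    def reward(state):
        h, w = len(state), len(state[0])
        if h < kernel_size or w < kernel_size:
            raise ValueError("state is smaller than the conv kernel")
        pooled = []
        for k, b in zip(kernels, biases):
            total, count = 0.0, 0
            # Slide the kernel over every valid position of the grid.
            for i in range(h - kernel_size + 1):
                for j in range(w - kernel_size + 1):
                    act = b + sum(k[di][dj] * state[i + di][j + dj]
                                  for di in range(kernel_size)
                                  for dj in range(kernel_size))
                    total += max(0.0, act)  # ReLU
                    count += 1
            pooled.append(total / count)  # global mean pool per filter
        return math.tanh(sum(wt * p for wt, p in zip(head, pooled)))

    return reward


# Usage: sample three arbitrary reward functions and score one state.
tasks = [make_random_cnn_reward(seed=s) for s in range(3)]
grid = [[0.0] * 5 for _ in range(5)]
grid[2][3] = 1.0  # e.g. an agent or coin at one cell
print([round(r(grid), 3) for r in tasks])
```

The network weights are never trained; an agent (or a curriculum generator) would be trained against many such sampled reward functions.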
Hey there! Sorry for the delay. $50 awarded to you for fastest good reference. PM me your bank details.