Hey there!
I don’t know if you’ll consider these reward-function-learning posts as candidates, or whether you’ll see them as rephrasings of old ideas. There are some new results in there: the different value functions, and the fact that Q-learning agents can learn un-riggable learning processes, are new.
https://www.lesswrong.com/posts/55hJDq5y7Dv3S4h49/reward-function-learning-the-value-function
https://www.lesswrong.com/posts/upLot6eG8cbXdKiFS/reward-function-learning-the-learning-process
In any case, thanks for running these competitions!