It looks like my entry is pretty close to the ideas of Human-in-the-counterfactual-loop, imitation learning, and apprenticeship learning. Questions:
Stuart, does it count against my entry that it’s not actually a very novel idea? (If so, I might want to think about other ideas to submit.)
What is the exact relationship between all these ideas? What are the pros and cons of doing human imitation using this kind of counterfactual/online-learning setup, versus other training methods such as GANs (see Safe training procedures for human-imitators for one proposal)? It seems like there are lots of posts and comments about human imitation spread over LW, Arbital, Paul’s blog and maybe other places, and it would be really cool if someone (with more knowledge in this area than I have) could write a review/distillation post summarizing what we know about it so far.
I encourage you to submit other ideas anyway, since your ideas are good.
Not sure yet how all these things relate; I may think about that more later.