Could you explain how this differs from the standard Reinforcement Learning formulation? (See, e.g., http://incompleteideas.net/book/first/ebook/node28.html for an introduction.)