I think it’s hard because it requires some planning and puzzle solving in a new, somewhat complex environment. The AI results on Montezuma’s Revenge seem pretty unimpressive to me because the agents go to a new room, try random things until they make progress, then “remember” what worked for future runs, which means they need quite a lot of training data.
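A toy sketch of that explore-then-remember loop may make the data cost easier to see. Everything here is a simplifying assumption rather than how any real Montezuma’s Revenge agent works: the rooms form a straight chain, each room has exactly one action that makes progress, and `explore_and_remember`, `correct`, and `memory` are hypothetical names invented for the illustration.

```python
import random


def explore_and_remember(num_rooms, num_actions, correct, rng):
    """Toy model (an assumption, not a real agent): in each new room,
    try random actions until one makes progress, then store it."""
    memory = {}   # room index -> action that made progress there
    attempts = 0  # environment interactions spent on trial and error
    for room in range(num_rooms):
        while room not in memory:
            action = rng.randrange(num_actions)
            attempts += 1
            if action == correct[room]:
                memory[room] = action  # "remember" it for future runs
    return memory, attempts


rng = random.Random(0)
num_rooms, num_actions = 5, 10
# Hypothetical environment: one progress-making action per room.
correct = [rng.randrange(num_actions) for _ in range(num_rooms)]

memory, attempts = explore_and_remember(num_rooms, num_actions, correct, rng)

# A later run just replays the stored actions: one step per room.
replay_steps = len(memory)
print(f"exploration attempts: {attempts}, replay steps: {replay_steps}")
```

With uniformly random guessing, the expected number of attempts per room equals the number of available actions, so the exploration cost grows with the size of the action space while the replay cost stays at one step per room. That gap is the training-data burden described above.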
For short-term RL with lots of feedback, there are already decent results, e.g. in StarCraft and Dota. So the difficulty is more in figuring out how to automatically scope out narrow RL problems that can be learned without too much training time.