I’m surprised this hasn’t got more comments. Julian, I’ve been incredibly impressed by your work in RL so far, and I’m super excited to see what you end up working on next.
I hope folks will forgive me just putting down some opinions about what problems in RL to work on:
I think I want us (RL, as a field) to move past games—board games, video games, etc—and into more real-world problems.
Where to go looking for problems?
These are much harder to make tractable! Most of the unsolved problems are very hard. I like referencing the NAE’s Engineering Grand Challenges and the UN’s Sustainable/Millennium Development Goals when I want to think about global challenges. Each one is much bigger than a research project, but I find them “food for thought” when I think about problems to work on.
What characteristics probably make for good problems for deep RL?
1. Outside of human factors—either too big for humans, or too small, or too fast, or too precise, etc.
2. Episodic/resettable—has some sort of short periodicity, giving bounds on long-term credit assignment
3. Already connected to computers—solving a task with RL in a domain that isn’t already hooked up to software/sensors/computers is going to be 99% setup and 1% RL
4. Supervised/Unsupervised failed—I think in general it makes sense to try RL only after we’ve tried the simpler methods and they’ve failed to work (perhaps there’s too little data, or the labels are too weak/noisy)
What are candidate problem domains?
Robotics is usually the first thing people say, so best just get it out of the way first. I think this is exactly right, but the robots we have access to today are terrible, so this turns into mostly a robot design problem with a comparatively smaller ML problem on top. (After working with robots & RL for years I have hours of opinions on this, but I’m saving those for another time.)
Automatic control systems are an underrated domain. Many manufacturing processes involve all sorts of small/strange adjustments to things like “feed rate,” “rotor speed,” “head pressure,” and so on. Often these are tuned/adjusted by people who build up intuition over time, then transfer that intuition to other humans. I expect it would be possible for RL to learn how to “play” these machines better and faster than any human. (Machines include: CNC machines, chemical processing equipment, textile manufacturing machines, etc.)
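To make that concrete, here’s a minimal sketch of what hooking RL up to such a machine could look like, written against the (old-style) gym API. Everything here—the setpoint names, the reward, the target values—is a made-up placeholder; a real deployment would read from and write to the machine’s actual sensors and controls.

```python
import numpy as np
import gym
from gym import spaces

class MachineTuningEnv(gym.Env):
    """Toy env where the agent nudges three machine setpoints each step."""

    def __init__(self):
        # Observation: the three current setpoints plus the last quality reading.
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(4,), dtype=np.float32)
        # Action: small relative adjustments to each setpoint.
        self.action_space = spaces.Box(low=-0.05, high=0.05, shape=(3,), dtype=np.float32)
        self.setpoints = np.array([0.5, 0.5, 0.5])
        self._optimum = np.array([0.7, 0.3, 0.6])  # sweet spot, unknown to the agent

    def reset(self):
        self.setpoints = np.array([0.5, 0.5, 0.5])
        return np.concatenate([self.setpoints, [0.0]]).astype(np.float32)

    def step(self, action):
        self.setpoints = np.clip(self.setpoints + action, 0.0, 1.0)
        # Stand-in for a real sensor; a deployment would query the machine here.
        quality = 1.0 - float(np.sum((self.setpoints - self._optimum) ** 2))
        obs = np.concatenate([self.setpoints, [quality]]).astype(np.float32)
        return obs, quality, False, {}
```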
Language models have been very exciting to me lately, and I really like this approach to RL with language models (https://openai.com/blog/fine-tuning-gpt-2/). I think large language models are a really great substrate to work with (so far much better than robots!), but specializing them to particular purposes remains difficult. I think having much better RL science here would be really great.
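For anyone who hasn’t read that post: the core recipe is to sample from the model, score samples with a reward model trained on human preferences, and update with a policy gradient plus a KL penalty that keeps the policy near the pretrained model. Here’s a toy, self-contained rendition of that loop—a single categorical distribution stands in for GPT-2, and the “reward model” is random—just to show the shape of the update (the real work uses PPO):

```python
import torch

vocab = 8
logits = torch.zeros(vocab, requires_grad=True)   # trainable "policy"
ref_logits = torch.zeros(vocab)                   # frozen pretrained reference
reward = torch.randn(vocab)                       # stand-in reward-model scores
opt = torch.optim.Adam([logits], lr=0.1)
beta = 0.1                                        # KL penalty coefficient

for _ in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    sample = dist.sample((64,))
    logp = dist.log_prob(sample)
    ref_logp = torch.distributions.Categorical(logits=ref_logits).log_prob(sample)
    # KL-shaped reward: task reward minus divergence from the reference model.
    r = reward[sample] - beta * (logp - ref_logp).detach()
    loss = -(logp * (r - r.mean())).mean()        # REINFORCE with a baseline
    opt.zero_grad(); loss.backward(); opt.step()
```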
Some ‘basic research’ topics
Fundamental research into RL scaling. It seems to me that we still don’t really have a great understanding of the science of RL. Compared to other domains with well-characterized scaling laws, RL is hard to predict, and its scaling behavior (with model size, batch size, etc.) is much less well understood. https://arxiv.org/abs/2001.08361 is a great example of the sort of thing I’d like to have for RL.
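Concretely, the kind of measurement I mean is fitting a power law L(N) = (N_c/N)^α to final performance versus model size, the way that paper does for language models. A sketch—the data points below are fabricated placeholders, purely to show the fitting step:

```python
import numpy as np
from scipy.optimize import curve_fit

model_sizes = np.array([1e6, 3e6, 1e7, 3e7, 1e8])  # parameter counts (made up)
final_loss = np.array([2.8, 2.4, 2.0, 1.75, 1.5])  # placeholder results

def power_law(n, nc, alpha):
    # L(N) = (N_c / N) ** alpha, the form used in the scaling-laws paper.
    return (nc / n) ** alpha

(nc, alpha), _ = curve_fit(power_law, model_sizes, final_loss, p0=[1e9, 0.1])
print(f"fit: L(N) = ({nc:.2e}/N)^{alpha:.3f}")
```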
Multi-objective RL. In general, if you ask RL people about multi-objective problems, you’ll get a “why don’t you just combine them into a single objective” or “just use one as an auxiliary goal,” but it’s much more complex than that in the deep RL case, where the objective changes the exploration distribution. I think having multiple objectives is a much more natural way of expressing what we want systems to do. I’m very excited about at some point having transferable objectives (since there are many things we want many systems to do, like “don’t run into the human” and “don’t knock over the shelf,” in addition to whatever specific goal).
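To spell out why “just combine them” is unsatisfying: the weight vector you pick determines which states the agent ever visits, so data gathered under one weighting is off-policy for every other weighting—you can’t retune the trade-off after the fact the way you could with a supervised loss. A minimal sketch of the standard scalarization wrapper (the names and objectives are illustrative; any env returning a reward vector would do):

```python
import numpy as np

class ScalarizedReward:
    """Wraps an env whose step() returns a reward *vector*, collapsing it
    to a scalar with fixed weights—the usual 'just combine them' advice."""

    def __init__(self, env, weights):
        self.env = env
        self.w = np.asarray(weights, dtype=np.float64)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward_vec, done, info = self.env.step(action)
        info["reward_vec"] = np.asarray(reward_vec)  # keep the vector for analysis
        return obs, float(self.w @ reward_vec), done, info

# E.g., with objectives ["reach the goal", "don't run into the human"]:
# weights=[0.2, 0.8] and weights=[0.8, 0.2] induce different policies *and*
# different exploration data, which is exactly the problem.
```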
Trying to find some concrete examples, I’m coming up short.
I’m sorry I didn’t meet the recommendations for replies, but I’m glad to have put something down here. A question like this deserves far more replies than it currently has.