Discussion of concrete near- to medium-term trends in AI
Instead of prognosticating on AGI/Strong AI/Singularities, I’d like to discuss more concrete advancements to expect in the near-term in AI. I invite those who have an interest in AI to discuss predictions or interesting trends they’ve observed.
This discussion should be useful for anyone looking to research or work in companies involved in AI, and might guide longer-term predictions.
With that, here are my predictions for the next 5-10 years in AI. This is mostly straightforward extrapolation, so it won’t excite those who know about these areas but may interest those who don’t:
Speech Processing, the task of turning spoken words into text, will continue to improve until it is essentially a solved problem. Smartphones and even weaker devices will be capable of quite accurately transcribing heavily-accented speech in many languages and in noisy environments. This is the simple continuation of the rapid improvements in speech processing that have brought us from Dragon NaturallySpeaking to Google Now and Siri.
Assistant and intent-based systems (they try to figure out the “intent” of your input), like Siri, which need to interpret a sentence as one of the particular commands they are capable of, will become substantially more accurate and varied, and will take cues like tone and emphasis into account. So, for example, if you’re looking for directions you won’t have to repeat yourself in an increasingly loud, slowed, and annoyed voice. You’ll be able to phrase your requests naturally and conversationally. New tasks like “Should I get this rash checked out?” will be available. A substantial degree of personalization and use of your personal history might also allow “show me something funny/sad/stimulating [from the internet]”.
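As a toy illustration of what “intent-based” means, here is a minimal sketch of intent classification as a bag-of-words text classifier. The intents, example utterances, and the scikit-learn pipeline are my own illustrative choices, not a description of how Siri or any particular assistant actually works.

```python
# A minimal sketch of intent classification: map an utterance to one of a few
# predefined intents. All labels and example utterances here are made up.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_utterances = [
    "how do I get to the airport",
    "directions to the nearest pharmacy",
    "show me something funny",
    "play me a sad song",
]
train_intents = ["directions", "directions", "entertainment", "entertainment"]

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(train_utterances, train_intents)

print(model.predict(["give me directions to the train station"]))  # predicted intent
```

Real assistants add speech recognition, slot filling, and personal context on top of a classifier like this, but the core “figure out the intent” step is conceptually that simple.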
Natural language processing, the task of parsing the syntax and semantics of language, will improve substantially. Look at the list of traditional tasks with standard benchmarks on Wikipedia: every one of these tasks will see a several-percentage-point improvement, particularly in the understudied area of informal text (chat logs, tweets, anywhere grammar and vocabulary are less rigorous). It won’t get so good that it can be confused with solving the AI-complete aspects of NLP, but it will allow vast improvements in text mining and information extraction. For instance, search queries like “What papers are critical of VerHoeven and Michaels ’08?” or “Summarize what Twitter thinks of the 2018 Superbowl” will be answerable. Open-source libraries (NLTK, CoreNLP) will continue to improve from their current just-above-boutique state. Medical diagnosis based on analysis of medical texts will be a major area of research. Large-scale analysis of scientific literature, in areas where it is difficult for researchers to read all relevant texts, will be another. Machine translation will not be ready for most diplomatic business, but it will be very, very good across a wide variety of languages.
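To ground what those open-source libraries look like today, here is a minimal sketch using NLTK (one of the libraries named above) for two of the classic benchmark tasks, tokenization and part-of-speech tagging. The example sentence is arbitrary, and the required data packages are noted in the comments.

```python
# A minimal NLTK sketch of two traditional NLP tasks: tokenization and
# part-of-speech tagging. Requires the 'punkt' tokenizer models and the
# 'averaged_perceptron_tagger' data package.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "summarize what twitter thinks of the superbowl"
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)
print(tagged)  # list of (word, part-of-speech tag) pairs
```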
Computer Vision, interpreting the geometry and contents of images and video, will undergo tremendous advances. In fact, it already has in the past 5 years, but now it makes sense for major efforts (academic, military, and industrial) to try to integrate different modules that have been developed for subtasks like object recognition, motion/gesture recognition, segmentation, etc. I think the single biggest impact this will have will be as a foundation for robotics development, since a lot of the arduous work of interpreting sensor input will be partly taken care of by excellent vision libraries. Those general foundations will make it easy to program specialist tasks (like differentiating weeds from crops in an image, or identifying activity associated with crime in a video). This will be complemented by a general proliferation of cheap, high-quality cameras and other sensors. Augmented reality also rests on computer vision, and the promise of the most fanciful tech demo videos will be realized in practice.
Robotics will advance rapidly. The foundational factors of computer vision, the growing availability of cheap platforms, and fast progress on tasks like motion planning and grasping have the potential to fuel an explosion of smarter industrial and consumer robots that can perform more complex and unpredictable tasks than most current robots. Prototype ideas like search-and-rescue robots, more complex drones, and autonomous vehicles will come to fruition (though 10 years may be too short a time frame for ubiquity). Simpler robots with exotic chemical sensors will have important applications in medical and environmental research.
Robotics will get scary very soon. quoted from link:
It’s debatable how much a “remote-controlled helicopter with a camera” should fall under “robotics”; progress in that area seems pretty orthogonal to issues like manipulation and autonomy.
(Though, on the other hand, modern drones are better at mechanical control than “just” remote control: good drones have a feedback loop so that they correct their own position.)
I think drones will probably serve as the driver of more advanced technologies—e.g. drones that can deposit and pick up payloads, ground-based remote-controlled robots with an integration of human and automatic motion control.
But I bet the $5,000 worth of drones doesn’t include the cost of buying armour-piercing explosives.
I have anti-predictions:
We won’t have robot butlers or maids in the next ten years.
Academic CV researchers will write a lot of papers, but there won’t be any big commercial successes that are based on dramatic improvements in CV science. This is a subtle point: there may be big CV successes, but they will be based on figuring out ways to use CV-like technology that avoids grappling with the real hardness of the problem. For example, the main big current uses of CV are in industrial applications where you can control precisely things like lighting, clutter, camera position, and so on.
Assistant and intent-based technology will continue to be annoying and not very useful.
Similar to CV, robotics will work okay when you can control precisely the nature of the task and environment. We won’t have, for example, robot construction workers.
Do driverless cars that drive on normal streets count?
Driverless cars are actually a good illustration of my point. These cars use CV at some level, but they depend fundamentally on laser range finders, GPS, and precompiled road maps. There’s no way modern CV alone could work reliably enough in such a potentially dangerous and legally fraught situation.
(for what it’s worth, I work on this robot for a living)
And how is it going?
Okay, though we’re still far from a true robot butler. I don’t know if we’re ten years away though, especially if you’re tolerant in what you expect a butler to be able to do (welcome guests, take their names, point them in the right direction, answer basic questions? We can already do it. Go up a flight of stairs? Not yet.)
You can always just weld the butler on top of Spot https://www.youtube.com/watch?v=M8YjvHYbZ9w (this does not seem to be a significant blocker)
Cool project! Do you think those robots are going to be a big commercial success?
There are already quite a few of them deployed in stores in Japan, interacting with customers, so for now it’s going okay :)
The prediction about CV doesn’t seem to have aged that well in my view. Others are going fairly well!
I would be surprised if any of these predictions come true. There have already been huge advances in machine vision and they are starting to beat humans at many tasks. Obviously it takes time for new technology to reach the market, but 10 years is plenty. Right now there are a number of startups working on it, and the big tech companies have hired all the researchers.
The idea that computers are better than humans at any kind of everyday vision task is just not true. Papers that report “better than human” performance typically just mean that their algorithms do better than cross-annotator agreement. The field should actually regard the fact that people can write papers reporting such things as more of an embarrassment than a success, since they are really illustrating a (profound) failure of the evaluation paradigm, not deep conceptual or technical achievements.
You don’t know what you are talking about. In last year’s ImageNet Large Scale Visual Recognition Challenge, the top competitor got 6.66% classification error at guessing the correct classification in 5 guesses.
A human tried this challenge and estimated his error rate at 5.1%, and even that required extensive time practicing and finding reference images.
Just recently a paper came out reporting 4.94% error. And for the last few years, the best competitor has consistently halved the best error from the year before. So by the time this year’s competition comes out it should be down to 3%!
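For readers unfamiliar with the metric being quoted, here is a minimal sketch of how top-5 classification error is computed: a prediction counts as correct if the true label is among the model’s five highest-scoring classes. The arrays in the usage example are random placeholders, not ILSVRC data.

```python
# A minimal sketch of the top-5 error metric used in the ILSVRC numbers above.
import numpy as np

def top5_error(scores, true_labels):
    """scores: (n_images, n_classes) array of class scores;
    true_labels: (n_images,) array of correct class indices."""
    top5 = np.argsort(scores, axis=1)[:, -5:]          # indices of the 5 best guesses
    hits = [label in row for row, label in zip(top5, true_labels)]
    return 1.0 - np.mean(hits)

# Toy usage with random scores for 1000 "images" and 1000 classes.
rng = np.random.default_rng(0)
print(top5_error(rng.random((1000, 1000)), rng.integers(0, 1000, size=1000)))
```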
I’m not sure ImageNet is of sufficiently high quality that a 3% error rate is meaningful. No point in overfitting noise in the supposed right labels. I think the take-away is that image recognition has gotten really good and now we need a new benchmark/corpus, possibly focused on the special-cases where humans still seem better.
You are only actually disagreeing with Daniel insofar as good performance in the ILSVRC is actually performance on an “everyday vision task”, which is far from clear to me.
Well, the algorithms used are fairly general. If you can classify an image, you can detect the objects in it and where they are.
The tasks are highly interrelated. In classification, they search different parts of the images at different scales to try to find a match. And in localization, they run a general classifier across the image and find where it detects objects.
In fact the classifier is now being used to actually describe images in natural language.
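For concreteness, here is a minimal sketch of the “run a general classifier across the image” idea in its simplest sliding-window form. The `classifier` argument is a hypothetical patch classifier, and modern detection systems use far more efficient variants of this, so treat it only as an illustration.

```python
# A minimal sliding-window detection sketch: score every patch of an image
# with a (hypothetical) patch classifier and keep the confident locations.
import numpy as np

def sliding_window_detect(image, classifier, window=64, stride=32, threshold=0.5):
    """image: HxW (or HxWxC) numpy array; classifier: callable patch -> score in [0, 1]."""
    detections = []
    h, w = image.shape[:2]
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            patch = image[y:y + window, x:x + window]
            score = classifier(patch)
            if score > threshold:
                detections.append((x, y, score))
    return detections
```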
None of that has much to do with whether the task in question is an “everyday vision task”.
(And: How closely did you read the article about a human trying the challenge? Something like 2⁄3 of his errors were (1) a matter of not being able to identify specific varieties of dog etc. reliably, (2) not being familiar with the specific set of 1000 labels used by the ILSVRC, and (3) not having seen enough examples—typically of particular varieties of dog etc. -- in the training set to be able to make a good call. I think the comparison of error rates gives a poor indication of relative performance—unless what you’re mostly interested in is classifying breeds of dog, I guess.)
He estimates an ensemble of humans could get down to 3% error, under extremely idealistic and totally hypothetical conditions, and with lots of hindsight bias over the mistakes he made the first time.
I did mention that even getting 5% error requires an extreme amount of effort sorting through reference images and such, while the machine can spit out answers in milliseconds.
In the next few years computers will mop up humans on all vision tasks. Machine vision is quite nearly a solved problem.
I’m not saying “I think humans will always get scores better than computers on this task”. I’m saying:
Score on this task is clearly related to actual object recognition ability, but as the error rates get low and we start looking at the more difficult examples the relationship gets more complicated and it starts to be important to look at what kind of failures we’re seeing on each side.
What humans find difficult here is fine-grained identification of a zillion different breeds of dog, coping with having an objectively-inadequate training set (presumably to avoid intolerable boredom), and keeping track of the details of what categories the test is concerned with.
What computers find difficult here is identifying small or thin things, identifying things whose colours and contrast are unexpected, identifying things that are at unexpected angles, identifying things represented “indirectly” (paintings, models, shadows, …), identifying objects when there are a bunch of other objects also in the frame, identifying objects parts of which are obscured by other things, identifying objects by labels on them, …
To put it differently, it seems to me that almost none of the problems that a skilled human has here are actually vision failures in any useful sense, whereas most of the problems the best computers have are. And that while it’s nice that images that elicit these failures are fairly rare in the ILSVRC dataset, it’s highly plausible that difficulty in handling such images might be a much more serious handicap in “everyday vision tasks” than not being able to distinguish between dozens of species of dog, or finding it difficult to remember hundreds of specific categories that one’s expected to classify things into.
For the avoidance of doubt, I think identifying ILSVRC images with ~95% accuracy (in the sense relevant here) is really impressive. Doing it in milliseconds, even more so. There is no question that in some respects computer vision is already way ahead of human vision. But this is not at all the same thing as saying computers are better overall at “any kind of everyday vision task” and I think the evidence from ILSVRC results is that there are some quite fundamental ways in which computers are still much worse at vision than humans, and it’s not obvious to me that their advantages are going to make up for those deficiencies in the next few years.
They might. The best computers are now much better at chess than the best humans overall, even though there are (I think) still some quite fundamental things they do worse than humans. Perhaps vision is like chess in this respect. But I don’t see that the evidence is there yet that it is.
You’ve been making very confident pronouncements in this discussion, and telling other people they don’t know what they’re talking about. May I ask what your expertise is in this area? E.g., are you a computer vision researcher yourself? (I am not. I’m a mathematician working in industry, I’ve spent much of my career working with computer input devices, and have seen many times how something can (1) work well 99% of the time and (2) be almost completely unusable because of that last 1%. But there’s no AI in these devices and the rare failures of something like GoogLeNet may be less harmful.)
All sorts of really cool stuff in the next few years. DeepMind had amazing results using reinforcement learning to beat Atari games from just raw video data (video). Google bought them a month later for half a billion dollars.
Reinforcement learning is of interest because it’s not just machine learning in the sense of predicting outputs given inputs. It’s AI; it’s very general.
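To make “reinforcement learning” concrete, here is a minimal sketch of tabular Q-learning, the textbook update rule. DeepMind’s Atari work replaces the table with a deep network over raw pixels, but the learning signal is the same in spirit; the state/action sizes and hyperparameters below are arbitrary.

```python
# A minimal sketch of tabular Q-learning: learn action values from
# (state, action, reward, next_state) experience, with no supervised labels.
import numpy as np

n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99          # learning rate and discount factor

def q_update(state, action, reward, next_state):
    """One Q-learning step: move Q[s, a] toward reward + discounted best future value."""
    target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (target - Q[state, action])
```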
Some other interesting work is on neural Turing machines. The networks can learn to take advantage of an infinite tape; as opposed to learning individual memory and I/O cells, they can operate on arrays. So you can, in theory, learn arbitrary programs with gradient descent.
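The key trick that makes a neural Turing machine trainable by gradient descent is that memory access is “soft”: instead of reading one cell, the controller produces attention weights over all cells and reads a weighted sum. Here is a minimal numpy sketch of that read operation; the memory contents and key are random placeholders rather than anything learned.

```python
# A minimal sketch of a differentiable ("soft") memory read, as in a neural
# Turing machine: attention weights over memory rows, read = weighted sum.
import numpy as np

memory = np.random.randn(8, 16)    # 8 memory slots, 16 dimensions each
key = np.random.randn(16)          # read key emitted by the controller network

scores = memory @ key                              # similarity of the key to each slot
weights = np.exp(scores) / np.exp(scores).sum()    # softmax attention weights
read_vector = weights @ memory                     # smooth, differentiable read
```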
Deep neural networks have shown a huge amount of progress in a lot of AI domains, from vision to natural language to speech. Recently a paper showed they could predict the move an expert Go player would make 44% of the time.
Machine vision has consistently been decreasing the error rate by half every year for the past few years, and just today surpassed human performance.
Stuart Russell said that there has been more investment in AI in the last five years than in all the years since the field was founded. And it’s increasing exponentially.
I think it’s well within the realm of possibility we could get strong AI within 10-20 years.
When the Roomba came out I expected vast progress by now: some company would actually make one that works all the time for the whole house. Now, I am not second-guessing the iRobot Corporation; maybe they could do it, but the market is happy now. How hard is it, with today’s know-how, to make one that
doesn’t get stuck on rugs, cords, clothes, or under things ever
can remember where it needs to clean and you don’t have to use virtual walls
can remember how to get back to its docking station before its battery runs out every single time
has a docking station where it can drop off its dirt, so I don’t have to check it more than once a month
It’s stuff like this that makes me wonder how much progress we are actually making. Is it a solved problem with no market (at the price point), or is it a problem in robotics?
I find it strange too. I was looking at Roombas 2 months ago because I was wondering if it would make cleaning up after my cat easier, and I experienced a feeling of deja vu looking at the Amazon listings: “these prices, physical shapes, features, and ratings… they look almost exactly the same as I remember them being a decade ago”.
I don’t know. It’s not like robotics in general has stagnated: iRobot has done a lot of robots besides the Roomba (and has pretty good sales, although I wonder how much comes from their military customers, which they seem to really be focusing on); and the robots that Boston Dynamics has been showing off, like their latest “Spot” quadruped, are simply astonishing.
I wonder if Roombas are trapped in a local optimum: you can’t improve a small disc-shaped wheeled robot vacuum much beyond what it is now without completely changing the design (appendages like hands would help it get unstuck or pick up stuff) or much-improved battery technology?
Roomba’s “intelligence” is a bag of random numbers with some constraints on it. Their competitor is a bunch brainier in terms of room mapping and general navigation; for instance, it doesn’t require a special beacon to tell it where a doorway is.
If true, that just sharpens the question: why isn’t iRobot improving their Roombas’ software if a competitor is doing so?
They’re not selling brains; they’re selling convenience?
Stupid Roombas don’t seem very convenient. (I don’t think people enjoy getting their Roombas out of corners or stuck places.) Or do you mean that the Neatos, despite their greater intelligence, are much more inconvenient in some other way (also explaining why Roombas continue to sell as much as they do)?
I’m guessing stuff like dropping off dirt or running out of battery could be solved without any AI improvements, so they are probably problems iRobot has decided aren’t worth solving at the moment.
I agree. I was just trying to motivate my rant.
doesn’t try to kill you in your sleep
Automated proving and conjecturing systems will also continue to improve in mathematics. I predict that within 20 years a major conjecture will be made that is essentially found by a computer with no human intervention. Note that this has already happened in some limited contexts for minor math problems. See e.g. here. More narrowly, I’d be willing to bet that within 30 years a computer program will discover, on its own, some commutative diagram that was previously not known to commute.
Any thoughts beyond the applications of NLP, computer vision, and robotics?
That’s what I know most about. I could go into much more depth on any of them.
I think Go, the board game, will likely fall to the machines. The driving engine of advances will shift somewhat from academia to industry.
Basic statistical techniques are advancing, but not nearly as fast as these more downstream applications, partly because they’re harder to put to work in industry. But in general we’ll have substantially faster algorithms to solve many probabilistic inference problems, much the same way that convex programming solvers will be faster. But really, model specification has already become the bottleneck for many problems.
I think at the tail end of 10 years we might start to see the integration of NLP-derived techniques into computer program analysis. Simple prototypes of this are on the bleeding edge in academia, so it’ll take a while. I don’t know exactly what it would look like, beyond better bug identification.
What more specific things would you like thoughts on?
This is a sucker bet. I don’t know if you’ve kept up to date, but AI techniques for Go-playing have advanced dramatically over the last couple of years, and they’re rapidly catching up to the best human players. They’ve already passed the 1-dan mark.
Interestingly, from my reading this is by way of general techniques rather than writing programs that are terribly specialized to Go.
It advanced quickly for a while due to a complete change in algorithm, but then we seem to have hit a plateau again. It’s still an enormous climb to world-champion level. It’s not obvious that this will be achieved.
Right—I agree that Go computers will beat human champions.
In a sense you’re right that the techniques are general, but are they the general techniques that work specifically for Go, if you get what I’m saying? That is, would they produce similar improvements when applied to chess or other games? I don’t know, but it’s always something to ask.
Advances in planning engines, knowledge representation and concept forming, and agent behavior would be interesting predictions to have, I think. Also any opinion you have on AGI if you care to share.
I think NLP, text mining and information extraction have essentially engulfed knowledge representation.
You can take large text corpora and extract facts (like Obama IS President of the US) using fairly simple parsing techniques (and soon, more complex ones), then put these in your database either in semi-raw form (e.g. subject-verb-object, instead of trying to transform the verb into a particular relation) or using a small variety of simple relations. In general it seems that simple representations (which could include non-interpretable ones like real-valued vectors) that accommodate complex data and high-powered inference are more powerful than trying to load more complexity into the data’s structure.
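As a toy illustration of the “semi-raw” subject-verb-object idea, here is a minimal sketch of naive triple extraction with NLTK. It just looks for a noun-verb-noun pattern in the part-of-speech tags, which is far cruder than real information-extraction systems, but it shows the flavor of keeping facts in near-surface form.

```python
# A minimal sketch of naive (subject, verb, object) extraction from a sentence
# using NLTK part-of-speech tags. Real systems use full syntactic parsers.
import nltk

def naive_svo(sentence):
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    nouns = [i for i, (_, tag) in enumerate(tagged) if tag.startswith("NN")]
    verbs = [i for i, (_, tag) in enumerate(tagged) if tag.startswith("VB")]
    for v in verbs:
        subjects = [i for i in nouns if i < v]
        objects = [i for i in nouns if i > v]
        if subjects and objects:
            return (tagged[subjects[-1]][0], tagged[v][0], tagged[objects[0]][0])
    return None

print(naive_svo("Obama is the president of the US"))  # e.g. ('Obama', 'is', 'president')
```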
Problems with logic-based approaches don’t have a clear solution, other than to replace logic with probabilistic inference. In the real world, logical quantifiers and set-subset relations are really really messy. For instance a taxonomy of dogs is true and useful from a genetic perspective, but from a functional perspective a chihuahua may be more similar to a cat than a St. Bernard. I think instead of solving that with a profusion of logical facts in a knowledge base, it might be solved by non-human interpretable vector-based representations produced from, say, a million youtube videos of chihuahuas and a billion words of text on chihuahuas.
Google’s Knowledge Graph is a good example of this in action.
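A minimal sketch of what the vector-based representations described above buy you: similarity becomes geometry rather than taxonomy. The feature vectors below are invented for illustration; in practice they would be learned embeddings from text or video.

```python
# A minimal sketch of functional similarity via embeddings: a chihuahua vector
# can sit closer to a cat vector than to a St. Bernard vector, even though a
# genetic taxonomy groups the two dogs together. The vectors here are made up.
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

chihuahua  = np.array([0.9, 0.1, 0.8])   # hypothetical features: small, indoor, lap-sized
cat        = np.array([0.8, 0.2, 0.9])
st_bernard = np.array([0.1, 0.9, 0.2])

print(cosine(chihuahua, cat))          # high: functionally similar
print(cosine(chihuahua, st_bernard))   # lower: genetically close, functionally less so
```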
I know very little about planning and agents. Do you have any thoughts on them?
You’re still thinking in an NLP mindset :P
By knowledge representation and concept formation I meant something more general than linguistic fact storage. For example, seeing lots of instances of chairs and not just being able to recognize other instances of chairs (machine learning handles that), but also being able to derive that the function of a chair is to provide a shape that enables bipedal animals to support their bodies in a resting position. It would then be able to derive that an adequately sized flat rock could also serve as a chair, even though it doesn’t match the training set.
Or, to give another example: given nothing but a large almanac of accurate planet sightings from a fixed location on the Earth, derive first the heliocentric model, then a set of differential equations governing the planets’ motion (Kepler’s laws). As an Ockham-style causal model, predict a 1/r^2 attractive force to explain these laws. Then notice that an object can travel between these bodies by adjusting its speed relative to the central object, the Sun. It might also notice that, for the Earth (the only body it has rotational information about), it is possible for an object to fall around the Earth at such a distance that it remains at a fixed location in the sky.
The latter example isn’t science fiction, btw. It was accomplished by Pat Langley’s BACON program in the ’70s and ’80s (but sadly this area hasn’t seen much work since). I think it would be interesting to see what happens if machine learning and modern big-data and knowledge-representation systems were combined with this sort of model-formation and concept-mixing code.
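To give a flavor of what BACON-style law discovery involves, here is a minimal sketch that recovers Kepler’s third law from approximate planetary data by searching for a power law in log space. The data values are rounded textbook figures, and this is of course only the curve-fitting core of what a program like BACON does, not the full concept-formation machinery.

```python
# A minimal sketch of BACON-style law discovery: fit T ~ a^k to planetary data.
# Semi-major axes in AU and orbital periods in years (approximate values).
import numpy as np

a = np.array([0.39, 0.72, 1.00, 1.52, 5.20, 9.58])    # Mercury .. Saturn
T = np.array([0.24, 0.62, 1.00, 1.88, 11.86, 29.46])

k, log_c = np.polyfit(np.log(a), np.log(T), 1)
print(k)   # roughly 1.5, i.e. T^2 proportional to a^3 (Kepler's third law)
```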
Probabilistic inference is interesting and relevant, I think, because where it doesn’t suffer from combinatorial explosion it is able to make inferences that require an inordinate number of example cases for statistical methods. Combined with concept nets, it’s possible to teach such a system with just one example per learned concept, which is very efficient. The trick of course is identifying those +1 examples.
Regarding planning and agents… they already run our lives. Obviously self-driving cars will be a big thing, but I hesitate from making predictions because it is what we don’t foresee that will have the largest impact, typically.
I am in the NLP mindset. I don’t personally predict much progress on the front you described. Specifically, I think this is because industrial uses mesh well with the machine learning approach. You won’t ask an app “where could I sit?” because you can figure that out yourself. You might ask it “what brand of chair is that?” though, at which point your app has to have some object-recognition abilities.
So you mean agent in the sense that an autonomous taxi would be an agent, or an Ebay bidding robot? I think there’s more work in economics, algorithmic game theory and operations research on those sorts of problems than in anything I’ve studied a lot of. These fields are developing, but I don’t see them as being part of AI (since the agents are still quite dumb).
For the same reason, a program that figures out the heliocentric model mainly interests academics.
There is work on solvers that try to fit simple equations to data, but I’m not that familiar with it.
I’m not asking for sexy predictions; I’m explicitly looking for more grounded ones, stuff that wouldn’t win you much in a prediction market if you were right but which other people might not be informed about.
Does anyone know if any companies are applying NLP to software? Specifically, to the software ASTs (abstract syntax trees)?
I have been playing around with unfolding autoencoders and feeding them Python code but if there are researchers or companies doing similar I’d be interested in hearing about it.
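For anyone curious what “software ASTs” means concretely, here is a minimal sketch using Python’s standard-library `ast` module to turn source code into a tree and then into a crude sequence of node types, the sort of structured input one might feed a model instead of raw characters. The source snippet is arbitrary.

```python
# A minimal sketch of getting an abstract syntax tree (AST) from Python source
# and flattening it into a sequence of node-type names.
import ast

source = "def add(a, b):\n    return a + b\n"
tree = ast.parse(source)

node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)  # e.g. ['Module', 'FunctionDef', 'arguments', ...]
```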
Learning to Execute: they feed a neural network Python code character by character and have it predict what the program will output.
Thanks, this is helpful
I’ve formalized your implied prediction for speech processing on Predictionbook here. Please let me know if that’s a fair summary of your prediction. For your other statements I am not able to make them precise enough in obvious ways for using on Predictionbook. Are there more specific predictions you would like to make in those fields?
That’s too strong. For instance, multi-person and high-noise environments will still have room for improvement. Unpopular languages will lag behind in development. I’d consider “solved” to mean that the speech-processing element of a Babelfish-like vocal translator would work seamlessly across many, many languages and virtually all environments.
I’d say it will be just below the level of a trained stenographer with something like 80% probability, and “solved” (somewhat above that level in many different languages) with 30% probability.
With 98% probability it will be good enough that your phone won’t make you repeat yourself 3 times for a simple damn request for directions.
Clarified version here.
The robots that exist now are pets and toys, vacuum cleaners, military equipment, and private-company developments that will not be widely available any time soon. And full-fledged AI will not be seen for a long time. Still, there are very interesting developments: the company Festo builds new types of robots modeled on living creatures. Here’s an interesting video about their flying robots. Video
However, existing machines unfortunately have a huge number of problems. Humanity is still far from creating perfect machines like those in Terminator or Transformers: we don’t have materials that are both strong and flexible enough, and the state of the technology, even in 2018, leaves much to be desired. Robot pets are fairly obviously dull and limited in function. Robot vacuums have problems orienting themselves within a room and with replacing batteries (https://bestvacuum.reviews/roomba-replacement-batteries/). In short, there is much to develop.