This seems like a fun exercise, so I spent half an hour jotting down possibilities. I’m more interested in putting potential considerations on people’s radars and helping with brainstorming than I am in precision. None of these points are to be taken too seriously since this is fairly extemporaneous and mostly for fun.
2022
Multiple Codex alternatives are available. The financial viability of training large models is obvious.
Research models start interfacing with auxiliary tools such as browsers, Mathematica, and terminals.
2023
Large pretrained models are distinctly useful for sequential decision making (SDM) in interactive environments, displacing previous reinforcement learning research in much the same way BERT rendered most previous work in natural language processing wholly irrelevant. Now SDM methods don’t require as much tuning, can generalize with fewer samples, and can generalize better.
For all of ImageNet’s 1000 classes, models can reliably synthesize images that are realistic enough to fool humans.
Models have high enough accuracy to pass the multistate bar exam.
Models for contract review and legal NLP see economic penetration; it becomes a further source of economic value and consternation among attorneys and nontechnical elites. This indirectly catalyzes regulation efforts.
Programmers become markedly less positive about AI due to the prospect of it reducing demand for some of their labor.
~10 trillion parameter (nonsparse) models attain human-level accuracy on LAMBADA (a proxy for human-level perplexity) and expert-level accuracy on LogiQA (a proxy for nonsymbolic reasoning skills). With models of this size, multiple other capabilities (there are benchmark proxies for many of them) are starting to be useful, whereas with smaller models these capabilities were too unreliable to lean on. (Speech recognition started “working” only after it crossed a certain reliability threshold.)
Generated data (math, code, models posing questions for themselves to answer) help ease data bottleneck issues since Common Crawl is not enough. From this, many capabilities are bootstrapped.
Elon re-enters the fight to build safe advanced AI.
2024
A major chatbot platform offers chatbots personified through video and audio.
Although forms of search/optimization are combined with large models for reasoning tasks, state-of-the-art models nonetheless only obtain approximately 40% accuracy on MATH.
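For concreteness, "accuracy on MATH" here means exact-match grading of each problem's final answer. A minimal sketch of that scoring, assuming the usual convention that MATH solutions mark the final answer with `\boxed{...}` (the helper names here are my own, not from any official evaluation harness):

```python
# Sketch: exact-match accuracy in the style of the MATH benchmark.
# Assumes reference solutions and model outputs both wrap the final
# answer in \boxed{...}; everything else in the solution is ignored.

def extract_boxed(solution: str) -> str:
    """Return the contents of the last \\boxed{...} in a solution,
    handling nested braces like \\boxed{\\frac{1}{2}}."""
    marker = r"\boxed{"
    start = solution.rfind(marker)
    if start == -1:
        return solution.strip()  # fall back to the raw text
    i = start + len(marker)
    depth = 1
    out = []
    while i < len(solution):
        ch = solution[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                break
        out.append(ch)
        i += 1
    return "".join(out).strip()

def math_accuracy(predictions, references) -> float:
    """Fraction of problems whose extracted final answers match exactly."""
    correct = sum(
        extract_boxed(p) == extract_boxed(r)
        for p, r in zip(predictions, references)
    )
    return correct / len(references)
```

So "approximately 40% accuracy" means roughly two in five problems graded correct under this exact-match rule; partial credit and equivalent-but-differently-written answers are not counted.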
Chatbots are able to provide better medical diagnoses than nearly all doctors.
Adversarial robustness for CIFAR-10 (assuming an attacker with eps=8/255) is finally over 85%.
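Here eps=8/255 is an ℓ∞ budget: with pixels in [0, 1], an attacker may move each pixel by at most 8/255. A NumPy sketch of projecting an adversarial candidate back into that ball (illustrative only; actual robustness numbers come from strong attacks such as AutoAttack, not from this projection step alone):

```python
import numpy as np

EPS = 8 / 255  # the l-infinity budget from the prediction above

def project_linf(x_adv: np.ndarray, x: np.ndarray, eps: float = EPS) -> np.ndarray:
    """Project a candidate adversarial image back into the eps-ball
    around the clean image x, then into the valid pixel range [0, 1]."""
    x_adv = np.clip(x_adv, x - eps, x + eps)  # enforce |x_adv - x| <= eps per pixel
    return np.clip(x_adv, 0.0, 1.0)          # keep pixels valid

# Example: a perturbation that overshoots the budget gets clipped back.
x = np.full((3, 32, 32), 0.5)   # a gray CIFAR-sized image (3 channels, 32x32)
x_adv = x + 0.1                 # 0.1 > 8/255, so this violates the budget
projected = project_linf(x_adv, x)
```

"Over 85% robust accuracy" then means: over 85% of test images are classified correctly even on the worst perturbation the attacker can find inside this ball.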
Video understanding finally reaches human-level accuracy on video classification datasets like Something Something V2. This comports with the heuristic that video understanding is around 10 years behind image understanding.
2025
Upstream vision advancements help autonomous driving but do not solve it for all US locations, as the long tail is really long.
ML models are competitive forecasters on platforms like Metaculus.
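"Competitive" on a forecasting platform is typically judged with a proper scoring rule such as the Brier score. A minimal sketch for binary questions (the forecast numbers below are made up for illustration):

```python
def brier_score(probabilities, outcomes) -> float:
    """Mean squared error between forecast probabilities and binary
    outcomes (0 or 1). Lower is better; always answering 50% scores
    exactly 0.25."""
    n = len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / n

# Hypothetical forecasts: a model vs. an always-50% baseline.
outcomes = [1, 0, 1, 1, 0]
model = [0.8, 0.3, 0.9, 0.6, 0.2]
baseline = [0.5] * 5
# The model beats the baseline if its Brier score is below 0.25.
```

A model would count as competitive if its average score over many resolved questions matched or beat the community median.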
Nearly all AP high school homework and exam questions (including long-form questions) can be solved by answers generated from publicly available models. Similar models cut into typical Google searches since these models give direct and reliable answers.
Contract generation is now mostly automatable, further displacing attorneys.
2026
Machine learning systems become great at using Metasploit and other hacking tools, increasing the accessibility, potency, success rate, scale, stealth, and speed of cyberattacks. This gets severe enough to create global instability and turmoil. EAs did little to use ML to improve cybersecurity and reduce this risk.
Strong-upvoted because this was exactly the sort of thing I was hoping to inspire with this post! Also because I found many of your suggestions helpful.
I think model size (and therefore model ability) probably won’t be scaled up as fast as you predict, but maybe. I think getting models to understand video will be easier than you say it is. I also think that in the short term all this AI stuff will probably create more programming jobs than it destroys. Again, I’m not confident in any of this.
The 2023 predictions seem to hold up really well so far, especially the ones about SDM in interactive environments, image synthesis, passing the bar exam, legal NLP systems, programmer sentiment, and Elon Musk re-entering the space of building AI systems.
So far, the 2022 predictions were correct. There is CodeGeeX and others, and Copilot, DALL-E 2, and Stable Diffusion made the financial prospects obvious (somewhat arguably).
ACT-1 works in a browser; I have neural search in the Warp terminal (not a big deal, but it qualifies); I'm not sure about Mathematica, but there was definitely significant progress in formalization and provers (Minerva).
And even some of the later ones:
2023
ImageNet—nobody measured it exactly but probably already achievable.
2024
Chatbots personified through video and audio—Replika sort of qualifies?
40% on MATH already reached.
Oddly specific and correct. Cool.