DL so far has been easy to predict—if you bought into a specific theory of connectionism & scaling espoused by Schmidhuber, Moravec, Sutskever, and a few others, as I point out in https://www.gwern.net/newsletter/2019/13#what-progress & https://www.gwern.net/newsletter/2020/05#gpt-3 . Even the dates are more or less correct! The really surprising thing is that that particular extreme fringe lunatic theory turned out to be correct. So the question is: was everyone else wrong for the right reasons (similar to the Greeks dismissing heliocentrism for excellent reasons yet still being wrong), or wrong for the wrong reasons? And why, and how can we prevent that from happening again, rather than spending the next decade being surprised in potentially very bad ways?
Feels worth pasting in this other comment of yours from last week, which dovetails well with this:
Personally, these two comments have kicked me into thinking about theories of AI in the same context as also-ran theories of physics like vortex atoms or the Great Debate. It really is striking how long one person with a major prior success to their name can push for a theory even as the evidence stacks up against it.
A bit closer to home than DM and GB, it also feels like a lot of AI safety people have missed the mark. It’s hard for me to criticise too loudly because, well, ‘AI anxiety’ doesn’t show up in my diary until June 3rd (and that’s with a link to your May newsletter). But a lot of AI safety work increasingly looks like it’d help make a hypothetical kind of AI safe, rather than helping with the prosaic ones we’re actually building.
I’m committing something like the peso problem here in that lots of safety work was—is—influenced by worries about the worst-case world, where something self-improving bootstraps itself out of something entirely innocuous. In that sense we’re kind of fortunate that we’ve ended up with a bloody language model fire-alarm of all things, but I can’t claim that helps me sleep at night.
I’m imagining a tiny AI Safety organization, circa 2010, that focused on how to achieve probable alignment for scaled-up versions of that year’s state-of-the-art AI designs. It’s interesting to ask whether that organization would have achieved more or less than MIRI has, in terms of generalizable work and in terms of field-building.
Certainly it would have resulted in a lot of work that was initially successful but ultimately a dead end. But maybe early concrete results would have attracted more talent/attention/respect/funding, and the org could have thrown that at DL once it began to win the race.
On the other hand, maybe committing to 2010’s AI paradigm would have made them a laughingstock by 2015, and killed the field. Maybe the org would have too much inertia to pivot, and it would have taken away the oxygen for anyone else to do DL-compatible AI safety work. Maybe it would have stated its problems less clearly, inviting more philosophical confusion and even more hangers-on answering the wrong questions.
Or, worst of all, maybe it would have made a juicy target for a hostile takeover. Compare what happened to nanotechnology research (and nanotech safety research) when too much money got in too early—savvy academics and industry representatives exiled Drexler from the field he founded so that they could spend the federal dollars on regular materials science and call it nanotechnology.
One thing they could have achieved was dataset and leaderboard creation (MS COCO, GLUE, and ImageNet, for example). These have tended to focus and accelerate research, and to stay useful for a long time, as long as they are chosen wisely.
Predicting and extrapolating human preferences is a task that is part of nearly every AI Alignment strategy. Yet we have few datasets for it; the only ones I found are https://github.com/iterative/aita_dataset and https://www.moralmachine.net/
So this hypothetical ML Engineering approach to alignment might have achieved some simple wins like that.
EDIT: Something like this was just released: Aligning AI With Shared Human Values
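To make those simple wins concrete, here is a minimal baseline sketch for the preference-prediction task, assuming a hypothetical CSV of scenarios with binary acceptability labels (e.g. flattened from one of the datasets linked above); the file name and column names are placeholders, not part of any of those releases:

```python
# Minimal baseline for predicting human acceptability judgments from text.
# Assumes a hypothetical CSV "moral_judgments.csv" with columns:
#   scenario (str), label (1 = judged acceptable, 0 = judged unacceptable),
# e.g. flattened from one of the datasets linked above.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("moral_judgments.csv")  # placeholder path
train_text, test_text, train_y, test_y = train_test_split(
    df["scenario"], df["label"], test_size=0.2, random_state=0
)

# Bag-of-words baseline: the point is the dataset and the held-out split,
# not the model.
vectorizer = TfidfVectorizer(max_features=50_000, ngram_range=(1, 2))
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(train_text), train_y)

preds = clf.predict(vectorizer.transform(test_text))
print("held-out accuracy:", accuracy_score(test_y, preds))
```

A leaderboard is then just this held-out split plus an agreed metric; the baseline model itself is beside the point.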
a lot of AI safety work increasingly looks like it’d help make a hypothetical kind of AI safe

I think there are many reasons a researcher might still prioritize non-prosaic AI safety work. Off the top of my head:
You think prosaic AI safety is so doomed that you’re optimizing for worlds in which AGI takes a long time, even if you think it’s probably soon.
There’s a skillset gap or other such cost, such that reorienting would decrease your productivity by some factor (say, 0.6) for an extended period of time. The switch only becomes worth it in expectation once you’ve become sufficiently confident AGI will be prosaic (see the toy model after this list).
Disagreement about prosaic AGI probabilities.
Lack of clear opportunities to contribute to prosaic AGI safety / shovel-ready projects (the severity of this depends on how agentic the researcher is).
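To put numbers on that switching-cost point, here is a toy expected-value model with made-up parameters: productivity is multiplied by 0.6 during a two-year transition inside a ten-year horizon, and (crudely) prosaic-oriented work is assumed to count only in worlds where AGI turns out prosaic, and vice versa.

```python
# Toy model of the switching-cost bullet above; every number is made up.
def expected_useful_work(p_prosaic, switch, penalty=0.6,
                         transition_years=2, horizon_years=10):
    """Expected years of on-target safety work over the horizon.

    Crude assumptions: prosaic-oriented work only counts if AGI is prosaic,
    current (non-prosaic) work only counts otherwise, and switching costs
    you (1 - penalty) of your output during the transition period.
    """
    if switch:
        output = penalty * transition_years + (horizon_years - transition_years)
        return p_prosaic * output
    return (1 - p_prosaic) * horizon_years

for p in (0.3, 0.5, 0.7, 0.9):
    stay, move = expected_useful_work(p, False), expected_useful_work(p, True)
    print(f"P(prosaic)={p:.1f}  stay={stay:.1f}  switch={move:.1f}")
```

With these particular constants the switch only wins once you put a bit over 50% on prosaic AGI; the point is just that the break-even probability depends on the penalty and the horizon, not that these numbers are right.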
Entirely seriously: I can never decide whether the drunkard’s search is a parable about the wisdom of looking under the streetlight, or the wisdom of hunting around in the dark.
I think the drunkard’s search is about the wisdom of improving your tools. Sure, spend some time out looking, but let’s spend a lot of time making better streetlights and flashlights, etc.
In the Gwern quote, what does “Even the dates are more or less correct!” refer to? Which dates were predicted for what?
Look at, for example, Moravec. His extrapolation assumes that supercomputers will not be made available for AI work until AI has already proven successful (correct), and that AI will therefore have to wait for hardware to become so powerful that even a grad student can afford it for ~$1k (also correct; see AlexNet). Extrapolating from ~1998, he estimates:

At the present rate, computers suitable for humanlike robots will appear in the 2020s.
Guess what year today is.
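To spell out the arithmetic behind that punchline: with round numbers in the spirit of Moravec’s late-1990s estimates (his ~100 million MIPS figure for human-equivalent compute, roughly 10^3 MIPS per $1,000 of hardware circa 1998, and compute per dollar doubling every ~18 months; these constants are illustrative, not his exact published ones), the extrapolation lands squarely in the 2020s.

```python
# Back-of-the-envelope version of Moravec's extrapolation; round,
# illustrative constants rather than his exact published figures.
import math

human_equivalent_mips = 1e8      # Moravec's ~"100 million MIPS" estimate
mips_per_1000_usd_1998 = 1e3     # rough order of magnitude for a late-90s PC
doubling_time_years = 1.5        # assumed compute-per-dollar doubling time

doublings = math.log2(human_equivalent_mips / mips_per_1000_usd_1998)
year = 1998 + doublings * doubling_time_years
print(f"{doublings:.1f} doublings needed -> ~{year:.0f}")  # ~16.6 -> ~2023
```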