Are you familiar with MIRI’s technical agenda? You may also want to check out the AI Impacts project, if you think we should be prioritizing forecasting work at this point rather than object-level mathematical research.
Yes, I’m familiar with the technical agenda. What do you mean by “forecasting work”? AI Impacts? That seems to be of near-zero utility to me.
What MIRI should be doing, and what I’ve advocated from the start, is this: build artificial general intelligence and study it. I can’t get a straight answer on why they are not doing this, at least not one that doesn’t in some way terminate in referencing the more speculative sections of the Sequences I take issue with. Not a provably-safe-from-first-principles-before-we-touch-a-single-line-of-code AGI. Just a regular, run-of-the-mill AGI using any one of the architectures presently being researched in the artificial intelligence community. Build it and study it.
A few quick concerns:
The closer we get to AGI, the more profitable further improvements in AI capabilities become. This means that the more we move the clock toward AGI, the more likely we are to engender an AI arms race between different nations or institutions, and the more (apparent) incentives there are to cut corners on safety and security. At the same time, AGI is an unusual technology in that it can potentially be used to autonomously improve on our AI designs—so that the more advanced and autonomous AI becomes, the likelier it is to undergo a speed-up in rates of improvement (and the likelier these improvements are to be opaque to human inspection). Both of these facts could make it difficult to put the brakes on AI progress.
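(As a purely illustrative toy model of that feedback, and not anything formal from MIRI’s agenda: suppose a system’s capability level $I$ improves at a rate that itself scales with $I$,

\[
\frac{dI}{dt} = c\,I^{k}, \qquad c > 0.
\]

For $k = 1$ this already gives exponential growth, $I(t) = I_0 e^{ct}$; for $k > 1$ the solution diverges in finite time. The constants $c$ and $k$ are stand-ins for everything we don’t actually know about returns on cognitive reinvestment; the only point is that a feedback loop from capability to rate of improvement can make progress speed up sharply rather than smoothly.)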
Both of these facts also make it difficult to safely ‘box’ an AI. First, different groups in an arms race may simply refuse to stop reaping the economic or military/strategic benefits of employing their best AI systems. If there are many different projects that are near or at AGI-level when your own team suddenly stops deploying your AI algorithms and boxes them, it’s not clear there is any force on earth that can compel all other projects to freeze their work too, and to observe proper safety protocols. We are terrible at stopping the flow of information, and we have no effective mechanisms in place to internationally halt technological progress on a certain front. It’s possible we could get better at this over time, but the sooner we get AGI, the less intervening time we’ll have to reform our institutions and scientific protocols.
A second reason speed-ups make it difficult to safely box an AGI is that we may not arrest its self-improvement in the (narrow?) window between ‘too dumb to radically improve on our understanding of AGI’ and ‘too smart to keep in a box’. We can try to measure capability levels, but only using imperfect proxies; there is no actual way to test how hard it would be for an AGI to escape a box beyond ‘put the AGI in the box and see what happens’. That means we can’t get much of a safety assurance until after we’ve done the research you’re proposing we do on the boxed AI. If you aren’t clear on exactly how capable the AI is, or on how well measures of its apparent capabilities in other domains transfer to its capability at escaping boxes, there are limits to how confident you can be that it is incapable of finding clever ways to bridge air gaps, or of simply adjusting its software so that the very methods we use to inspect and analyze it end up compromising the box.
‘AGI’ is not actually a natural kind. It’s just an umbrella term for ‘any mind we could build that’s at least as powerful as a human’. Safe, highly reliable AI in particular is likely to be an extremely special and unusual subcategory. Studying a completely arbitrary AGI may tell us about as much about how to build a safe AGI as studying nautilus ecology would tell us about how to safely keep bees and farm their honey. Yes, they’re both ‘animals’, and we probably could learn a lot, but not as much as if we studied something a bit more bee-like. Studying something more bee-like, though, presupposes that we understand AI safety well enough to build an AGI that we expect to look at least a little like our target safe AI. And our understanding just isn’t there yet.
We already have seven billion general intelligences we can study in the field, if we so please; it’s not obvious that a rushed-to-completion AGI would resemble a highly reliable safe AGI in all that much more detail than humans resemble either of those two hypothetical AGIs.
(Of course, our knowledge would obviously improve! Knowing about a nautilus and a squirrel really does tell us a lot more about beekeeping than either of those species would on its own, assuming we don’t have prior experience with any other animals. But if the nautilus is a potential global catastrophic risk, we need to weigh those gains against the risk and promise of alternative avenues of research.)
Was any of that unclear?