I might see a possible source of a “miracle”, although this may turn out to be completely unrealistic and I totally would not bet the world on it actually happening.
A lot of today’s machine learning systems can do some amazing things, but much of the time trying to get them to do what you want is like pulling teeth. Early AGI systems might have similar problems: their outputs might be so erratic that it’s obvious that they can’t be relied on to do anything at all; you tell them to maximize paperclips, and half the time they start making clips out of paper and putting them together to make a statue of Max, or something equally ridiculous and obviously wrong. Systems made by people who have no idea that they need to figure out how to align an AI end up as useless failed projects before they end up dangerous.
In practice, though, we should never underestimate the ingenuity of fools...
How does this help anything or change anything? That’s just the world we’re in now, where we have GPT-3 instead of AGI. Eventually the systems get more powerful and dangerous than GPT-3 and then the world ends. You’re just describing the way things already are.
I’m imagining that systems get much stronger without getting much more “aimable”, if that makes sense; they solve problems, but when you ask them to solve things they keep solving the wrong problem, in a way that’s obvious enough to make actually using them pointless. Instead of getting the equivalent of paperclip maximizers, you get a random mind that “wants” things so incoherent that it doesn’t do much of anything at all, and this fact forces people to give up and decide that investing further in general AI capacity without first making investments in AI control/“alignment” is useless.
Maybe that’s just my confusion or stupidity talking, though. And I did call it a “miracle” that the ability to make a seemingly useful AGI ends up bottlenecking on alignment research rather than on raw capacity research, because the default unaligned AGI is an incoherent mess that does random, ineffective things when operating “out of sample”, rather than a powerful optimization process that destroys the world.
It’s not obvious to me that this scenario concentrates net probability mass onto ‘things go awesome for humanity long-term’. Making everything harder might mean that alignment is also harder. A few extra years of chaos doesn’t buy us anything unless we’re actively nailing down useful robust AGI during that time.
(There is some extra hope in ‘For some reason, humanity has working AGIs for a little while before anyone can destroy the world, and this doesn’t make alignment much harder’, though I’d assume there are other, much larger contributors-of-hope in any world like that where things actually go well.)