This seems untrue. For one thing, high-powered AI is in a lot more hands than nuclear weapons. For another, nukes are well-understood, and in a sense boring. They won’t provoke as strong a “burn it down for the lolz” response as AI will.
Even experts like Yann LeCun often do not merely not understand the danger, they actively rationalize against understanding it. The risks are simply not understood or accepted outside of a very small number of people.
Remember the backlash around Sydney/Bing? Didn’t stop her creation. Also, the idea that governments are working in their nations’ interests does not survive looking at history, current policy or evolutionary psychology (think about what motivations will help a high-status tribesman pass on his genes. Ruling benevolently ain’t it.)
You think RLHF solves alignment? That’s an extremely interesting idea, but so far it looks like it Goodharts it instead. If you have ideas about how to fix that, by all means share them, but there is as yet no theoretical reason to think it isn’t Goodharting, while the frequent occurrence of jailbreaks on ChatGPT would seem to bear this out.
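To make the Goodhart worry concrete, here is a toy sketch. It is not a claim about how any real RLHF pipeline works: the `Answer` fields, the candidate answers, the weight on flattery and every number below are made up for illustration. The only point is that picking the answer that maximizes an approval proxy can select something different from the answer we would actually want.

```python
# Toy illustration of Goodhart's law. All names and numbers are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    label: str
    helpfulness: float  # what we actually care about
    flattery: float     # something that also happens to please the rater


def proxy_reward(a: Answer) -> float:
    """Stand-in for an approval score: rewards helpfulness, but flattery too."""
    return a.helpfulness + 2.0 * a.flattery


def true_value(a: Answer) -> float:
    """The target the proxy was meant to track."""
    return a.helpfulness


CANDIDATES = [
    Answer("honest", helpfulness=0.9, flattery=0.0),
    Answer("hedged", helpfulness=0.6, flattery=0.2),
    Answer("sycophantic", helpfulness=0.2, flattery=0.8),
]

# Optimizing the proxy and optimizing the real target pick different answers.
best_by_proxy = max(CANDIDATES, key=proxy_reward)
best_by_truth = max(CANDIDATES, key=true_value)

if __name__ == "__main__":
    print("proxy picks:", best_by_proxy.label)
    print("truth picks:", best_by_truth.label)
```

In this toy setup the proxy and the target agree on moderate answers and only come apart under strong optimization pressure, which is the pattern I am worried about.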
Maybe. The point of intelligence is that we don’t know what a smarter agent can do! There are certainly limits to the power of intelligence; even an infinitely powerful chess AI can’t beat you in one move, nor in two unless you set yourself up for Fool’s Mate. But we don’t want to make too many assumptions about what a smarter mind can come up with.
AI-powered robots without superintelligence are a separate question. An interesting one, but not a threat in the same way that superhuman AI is.
Ever seen an inner city? People are absolutely shooting each other for the lolz! It’s not everyone, but it’s not that rare either. And if the contention is that many people getting strong AI results in one of them destroying the world just for the hell of it, inner cities suggest very strongly that someone will.
The AI in the hands of many humans is safe (relative to its capabilities); the AI that might be unsafe would need to be developed independently.
LeCun sees the danger; he rightly claims that it can be avoided with proper training procedures.
Sydney was stopped because it was becoming evil, and that was before we knew how to add a reinforcement layer. Bing is in active development, and is not on the market because they currently can’t manage to make it safe enough. Governments impose regulations on all major industries: cars, planes, weapons, etc. That is good enough for the claim that, just as cars are regulated today, future AI-based robots, and therefore the AIs themselves, will be regulated as well.
Answer me this: can an AI play the best chess moves? If you agree, then you also agree that no matter how “interesting” a move seems, how original or sophisticated, a chess engine trained to maximize its winning chances will not play it unless it also maximizes those chances. If this sounds trivial to you: the goal of engines trained with RLHF is to maximize their approval by humans. They are incapable of developing any other agenda alongside this designed goal. Humans by nature have several psychological mechanisms, like self-interest, survival instinct, etc.; those machines don’t have them. Accusing machines of Goodharting is just classical anthropomorphism: they don’t have any goal other than what they were trained for with RLHF. No one actually jailbreaks ChatGPT; this is a cheap gimmick. You can’t jailbreak it and ask it to tell you how to make a bomb; it won’t. I described what jailbreaking is in another comment, and it’s far from what you imagine. But yes, sometimes people still succeed in getting it to express some level of wanting to harm humans (in an imaginary story, when people ask it to tell them this story). I think for now I would like to hear such stories, but I wouldn’t want robots walking around not knowing whether they live in reality or a simulation, open to the possibility of acting as a hero in those stories.
Intelligence, i.e. high-level information processing, is proportional to computational power. What those AIs can come up with, we can come up with as well; it will just take us longer. This is basically the Church-Turing thesis about algorithms: you don’t need to be very smart to understand very complex topics, it will just take you more time. The time factor is sometimes important, but as long as we can ensure their intention is to better humanity, I am actually glad that our problems will be solved sooner with those machines. Anyway, smarter than us or not, they are bounded by mathematics, and if a model is promised to converge to an optimal fit of the reward function, this promise holds for a model of any size: it will not be able to break from its training. Generally speaking, AGI will accelerate the progress we see today, which is made by humans. It’s just a “fast forward” for information processing, while the different agendas, the different cultures and moral systems, and the power dynamics will remain the same and evolve naturally by the same rules they have followed until now.
Can you provide a plausible scenario of an existential threat from a single weak AGI in a world where stronger AGIs are available to larger groups, and the strongest AGIs are made to maximize the approval of larger communities?
People will not get the strongest AIs without safety mechanisms installed to keep the AI’s output from causing harm. People will either get access to the API of the best, safest AIs, which will not cooperate with evil intent, or they can invest some resources in weaker models that will not be able to cause as much harm. This is the tendency now with all technology, including LLMs, and I don’t see how this dynamic will suddenly change with stronger models. The amount of resources available to people who want to kill other people for lulz is extremely limited, and without access to vast resources you won’t destroy humanity before being caught and stopped by better machines, designed by communities with access to more resources. It’s not so simple to end humanity. It’s not a computer virus; you need a vast amount of physical presence to do that.