Michael Simkin comments on Top lesson from GPT: we will probably destroy humanity “for the lulz” as soon as we are able.

Michael Simkin 21 Apr 2023 10:32 UTC
1 point
1. AI in the hands of many humans is safe (relative to its capabilities); an AI that might be unsafe would need to be developed independently.
2. LeCun sees the danger; he rightly claims that the danger can be avoided with proper training procedures.
3. Sydney was stopped because it was becoming evil, and that was before we knew how to add a reinforcement layer. Bing is in active development and is not on the market because they currently can’t manage to make it safe enough. Governments impose regulations on all major industries: cars, planes, weapons, and so on. That is good enough for the claim that, just as cars are regulated today, future AI-based robots, and therefore the AIs themselves, will be regulated as well.
4. Answer me this: can an AI play the best chess moves? If so, then you agree that no matter how “interesting”, original, or sophisticated a move seems, a chess engine trained to maximize its winning chances will not play it if it lowers those chances. If this sounds trivial to you, consider that the goal of engines trained with RLHF is to maximize their approval by humans. They are incapable of developing any other agenda alongside this designed goal. Unlike humans, who by nature have several psychological mechanisms such as self-interest and a survival instinct, these machines have none. Accusing machines of Goodharting is just classic anthropomorphism; they don’t have any goal other than the one they were trained for with RLHF. No one actually jailbreaks ChatGPT; this is a cheap gimmick. You can’t jailbreak it and ask it to tell you how to make a bomb; it won’t. I described what jailbreaking is in another comment, and it’s far from what you imagine. But yes, sometimes people still succeed in getting it to express some level of wanting to harm humans (in an imaginary story, when people ask it to tell them that story). I think for now I would like to hear such stories, but I wouldn’t want robots walking around not knowing whether they live in reality or in a simulation, open to the possibility of acting as a hero in those stories.
5. Intelligence, i.e. high-level information processing, is proportional to computational power. Whatever those AIs can come up with, we can come up with as well; it will just take us longer. This is basically the Turing thesis about algorithms: you don’t need to be very smart to understand very complex topics, it will just take you more time. The time factor is sometimes important, but as long as we can ensure their intention is to better humanity, I am actually glad that our problems will be solved sooner with those machines. Anyway, smarter than us or not, they are bounded by mathematics: if a model is guaranteed to converge to an optimal fit of the reward function, that guarantee holds for a model of any size, and it will not be able to break from its training. Generally speaking, AGI will accelerate the progress we see today being made by humans. It’s just a “fast forward” for information processing, while the different agendas, cultures, moral systems, and power dynamics will remain the same and evolve naturally by the same rules by which they have evolved until now.
6. Can you provide a plausible scenario of an existential threat from a single weak AGI in a world where stronger AGIs are available to larger groups, and the strongest AGIs are made to maximize the approval of larger communities?
7. People will not get the strongest AIs without safety mechanisms installed to keep the AI’s output from causing harm. People will either get access to the API of the best and safest AIs, which will not cooperate with evil intent, or they can invest some resources in weaker models that will not be able to cause as much harm. This is the trend now with all technology, including LLMs, and I don’t see how this dynamic will suddenly change with stronger models. The amount of resources available to people who want to kill other people for lulz is extremely limited, and without access to vast resources you won’t destroy humanity before being caught and stopped by better machines, designed by communities with access to more resources. It’s not so simple to end humanity. It’s not a computer virus; you need a vast physical presence to do that.