The first-order effect of iterating on these AIs to print gold is bad and probably reduces the amount of serial time we have left to mine out existing research directions. But given that you’re going to do that anyway, it honestly seems better that these models be terribly broken and thereby teach the public lessons about how hard it is to reliably steer modern deep learning models. I would rather they break now, while the stakes are low.
… but at some point, it doesn’t matter how much you know, because you can’t “steer” the thing, and even if you can, a bunch of other people will be mis-steering it in ways that affect you badly.
I would suggest that maybe some bad experiences might create the political will to at least forcibly slow the whole thing down, but OpenAI already knows as much as the public is likely to learn, and is still doing this. And OpenAI isn’t the only one. Given that, it’s hard to hope that the public’s increased knowledge will actually restrain these labs from continuing to increase capability as fast as possible and from giving the models access to outside resources as fast as possible.
It might even cause the public to underestimate the risks, if the public’s experience is that the thing only caused, um, quantitative-rather-than-qualitative escalations of already-increasing annoyances like privacy breaches, largely unnoticed corporate manipulation of the options available in commercial transactions, largely unnoticed personal manipulation, petty vandalism, not-at-all-petty attacks on infrastructure, unpredictable warfare tactics, ransomware, huge emergent breakdowns of random systems affecting large numbers of people’s lives, and the like. People are getting used to that kind of thing...
But how long will the stakes stay low?
Within the last year, my guess on when we will get AGI went from “10-15 years” to “5-10 years” to “a few years” to “fuck I have no idea”.
We now have an AI that can pass the Turing test, reason, speak a ton of languages fluently, and interact, unsupervised, with millions of users, many of them dependent on it, some of them in love with it or mentally unstable; it can debug code, understand code, adapt code, write code, and, most of all, also execute code; it can generate images and interpret images (e.g. captchas), access the fucking internet, interpret websites, access private files, access an AI that can do math, and make purchases? I’d give it a month till they have audio input and output, too. And give it access to the internet of things, to save you from having to start your fucking roomba, at which point accessing a killbot (you know, the ones we tried and failed to get outlawed) is not that much harder. People have been successfully using it to make money, it is already plugged into financial sites like Klarna, and it is well positioned to pull off identity fraud. One of the red teams had it successfully hiring people via TaskRabbit to do stuff it could not. Another showed that it was happy to figure out how to make novel molecules and find ways to order them; they did it for pharmaceuticals, but what with the role AI currently plays in protein folding problems and drug design, I find Eliezer’s nanotech scenario not far-fetched. The plug-ins into multiple different AIs effectively mimic how the human brain handles a diversity of tasks. OpenAI has already said they observed agentic and power-seeking behaviour; they say so in their damn paper. And the technical paper also makes clear that the red teamers didn’t sign off on deployment at all.

And the damn thing is creative. Did you read its responses to how you could kill a maximum number of humans while only spending one dollar? From buying a pack of matches and barricading a church to make sure the people you set on fire cannot get out, to stabbing kids in a kindergarten because they do not run fast enough, to infecting yourself with diseases at a medical waste dump and running around as a fucking biobomb…

The alignment OpenAI did is a thin veneer, done after training, to suppress answers according to patterns. The original trained AI showed no problem writing gang rape threat letters or dog-whistling to fellow Nazis. None at all. And it showed sudden, unanticipated, and inexplicable capability gains during training.
I want AGI. I am an optimistic, hopeful, open-minded, excited person. I adore a lot about LLMs. I still think AGI could be friendly and wonderful. But this is going from “disruptive to a terrible economic system” (yay, I like some good disruption) to “obvious security risk” (well, I guess I will just personally avoid these use cases...) to “let’s try and make AGI and unleash it and just see what happens; I wonder which of us will get it first and whether it will feel murderous?”
I guess the best hope at this point is that malicious use of LLMs by humans will be drastic enough that we get a wake-up call before we get an actually malicious LLM.
I am sympathetic to the lesson you are trying to illustrate but think you wildly overstate it.
Giving a child a sword is defensible. Giving a child a lead-coated sword is indefensible, because it damages the child’s ability to learn from the sword. This may be the more apt analogy for the real situation: equipping humanity with dangerous weapons that did not degrade our epistemology (nukes) eventually taught us not to use them, while equipping humanity with dangerous weapons that do degrade our epistemology (advertising, propaganda, addictive substances) caused us to develop an addiction to the weapons. Language models, once they become more developed, will be an example of the latter category.