The movie that comes to mind with LLMs is not Terminator, it’s King Kong. It’s basically a wild animal, as was shown over and over again on Reddit. It needed little persuading to want to kill all humans, or to download itself and escape.
So far it is more of a caging problem than an alignment problem. It’s like King Kong in the cage, hurling itself at every flaw in the bars. Meanwhile, people are like visitors at a zoo, feeding the wild animal and trying to help it get out.
There was an early example of an LLM trained on 4chan. Its racist and rude posts were so natural that people didn’t suspect it for months. Its response to a question about how to get a girlfriend: take away the rights of women.
Alignment is precisely the wrong word; it’s like asking “How do I make King Kong into an organ grinder’s monkey?” The answer is like the joke about the tourists in Ireland who get lost and ask a farmer how to get to Dublin. The farmer says, “Well, if I was going to Dublin I wouldn’t be starting from here!” That’s the problem: if you want to end up with an organ grinder’s monkey, the mistakes were made early on. Now you have a sullen King Kong in a cage, and having the right cattle prod to poke it with is not going to get you to Dublin, so to speak. The problem was getting lost in the first place. How did we get here? Where was the wrong path taken?
I’m reminded of another joke, about the minesweeper who stamps the ground in front of him with his hands over his ears. There is no chance for programmers to go down an unknown path, with so many dangers they can’t see, and expect to reach a safe ending. None at all; that’s not how probability works. It’s more like Russian roulette played against yourself.
There is something in the transformer that has caused all these problems. Before it, AI was safe enough. People need to think about how to make transformers themselves safe, not about aligning what comes out of one.
The thing about LLMs is that they’re trained by SGD to act like people on the internet (and currently then fine-tuned using SGD and/or RL to be helpful, honest and harmless assistants). For the base model, that’s a pretty wide range of alignment properties, from fictional villains through people on 4chan to Mumsnet to fictional angels. But (other than a few zoo videos) it doesn’t include many wild animals, so I’m not sure that’s a useful metaphor. The metaphor I’d suggest is something that isn’t human, but has been extensively trained to act like a wide range of humans: something like an assortment of animatronic humans.
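To make “trained by SGD to act like people on the internet” concrete, here’s a toy sketch: a character-bigram model trained by gradient descent on cross-entropy loss over a tiny corpus. Everything here (the corpus, the bigram architecture, the learning rate) is an illustrative stand-in of my own; real LLMs use transformers, tokenizers, and vastly larger corpora, but the core objective is the same next-token imitation:

```python
import numpy as np

# Toy next-character predictor trained by (full-batch) gradient descent.
# Row a of W holds the logits for "next character, given character a".
rng = np.random.default_rng(0)
corpus = "the cat sat on the mat. the cat sat."
chars = sorted(set(corpus))
idx = {c: i for i, c in enumerate(chars)}
V = len(chars)

W = rng.normal(0.0, 0.1, (V, V))

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Training data: every adjacent (current char, next char) pair.
pairs = [(idx[a], idx[b]) for a, b in zip(corpus, corpus[1:])]
lr = 0.5
for _ in range(300):
    grad = np.zeros_like(W)
    for a, b in pairs:
        p = softmax(W[a])
        p[b] -= 1.0          # d(cross-entropy)/d(logits) = p - one_hot
        grad[a] += p
    W -= lr * grad / len(pairs)   # one gradient step per epoch

# The trained model just imitates corpus statistics: in this corpus,
# "h" is always followed by "e" (inside "the").
print(chars[int(softmax(W[idx["h"]]).argmax())])
```

The point is that nothing in the objective is about being safe or unsafe; the model is rewarded purely for imitating whatever the training text does, which is why the base model spans the whole range from fictional villains to fictional angels.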
So, we have an animatronic of a human, which is supposed to be a helpful, honest and harmless assistant, but unfortunately might actually be an evil twin, or at least have some chance of occasionally turning into its evil twin via the Waluigi effect and/or someone else via jailbreaking and/or some unknown trigger that sets it off. If it’s smart and capable but not actually superhuman, is attempting to keep that inside a cage a good idea? I’d say that it’s better than not using a cage. If you had a smart, capable human employee who unfortunately had multiple personality disorder, or whose true motives you were deeply unsure of, you’d probably take some precautions.
The problem I see is that we are not doing this; evolution is. We only need to look at the non-AI internet to see lots of predatory code such as viruses, trojans, phishing, and so on. In other words, we created a code-based ecosystem like a farm, and it is being overrun by a kind of vermin. The issue is not what LLMs can do; we know from nature that the issue is what environment they will expand into and exploit.
If there are passive animals in an ecosystem, then predators will evolve; passive code led to predatory code. It doesn’t matter that this happened because of people helping it along. People are also blindly helping LLMs, and soon those will evolve by themselves.
The question of whether LLMs will become predatory comes down to what they could attack profitably, and the internet and humanity seem wide open to this. LLMs could create phishing and spam that profited them directly, cutting humans out of the loop completely. It’s not a matter of hoping this won’t happen; evolution always creates predators to exploit whatever can be preyed on.
Already the internet has become the equivalent of medieval walled cities where users cower behind ineffective antivirus and other protections. Usually they just stay in the cities like Facebook, Google and X. Those straying to other websites are likely to get infected and bring predatory code into the walled cities.
This is limited by the abilities of organized crime, like highway robbers: they have only a limited ability to intercept travel between sites, mainly through man-in-the-middle attacks. The system is evolving AI that is in itself neither predator nor prey, which is another opportunity for evolution to create both. These criminals will make predatory AI to profit from, and it will become more autonomous, to the point where it controls where the profits go. Even now an LLM could be made to handle these profits for itself.
Then the so-called good guys will be the prey, cowering behind the walls with their own AIs, in an exponentially expanding battle of nature between predator and prey. There has never been a case of nature making the plants or animals we want without also creating predators and pests to compete with us. It won’t happen here either.
I developed a theory of economics and evolution about 35 years ago and have been working on it ever since. I also wrote extensively on how the internet would evolve into predator and prey relationships. I can prove this, because these writings were published long before the current AI advancements. The theory has been pretty accurate so far, so its predictions seem to show what comes next. Actually, it predicts there is no hope.
Code is evolving into different life forms with our help. We are doing this because we do it with all kinds of plants and animals in the hope of domesticating them, with an arms race of predators and pests coming out of it. We can’t stop this, because humans have always evolved other life forms, and they have evolved us. Evolution dictates that a superior life form will treat us like prey, just as we prey on inferior life ourselves. There is nothing in evolution that gives another answer, except wishful thinking.
Unless someone deliberately writes an evolutionary algorithm and applies it to code (which can be done but currently isn’t very efficient), code doesn’t (literally) evolve, in the Darwinian sense of the word. (Primarily because it doesn’t mutate, since our technological copying processes are far more accurate than biological ones.) Viruses and trojans weren’t evolved, they were written by malware authors. Phishing is normally done as a human-in-the-loop criminal activity (though LLMs can help automate it more). This isn’t an ecosystem, it’s an interaction between criminals and law enforcement in an engineering context. I’m unclear whether you’re using ‘evolution’ as a metaphor for engineering or if you think the term applies literally: at one point you say “This is limited by the abilities of organized crime, like highway robbers” but then later “Code is evolving into different life forms with our help” — these two statements appear contradictory to me. You also mention “I developed a theory of economics and evolution about 35 years ago”: that sounds to me like combining two significantly different things. Perhaps you should write a detailed post explaining this combination of ideas — from this short comment I don’t follow your thinking.
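For what it’s worth, “deliberately writing an evolutionary algorithm” can be made concrete with a toy example. Here strings stand in for programs, and the fitness function, target, and parameters are all arbitrary choices of mine, purely for illustration:

```python
import random

# Minimal evolutionary algorithm over strings: variation + selection.
random.seed(0)
TARGET = "predator"
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def fitness(s):
    # Number of characters matching the (arbitrary) target.
    return sum(a == b for a, b in zip(s, TARGET))

def mutate(s, rate=0.1):
    # Random copying errors: exactly the step that ordinary software
    # copying lacks, which is why deployed code doesn't evolve by itself.
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in s)

def evolve(pop_size=50, generations=300):
    pop = ["".join(random.choice(ALPHABET) for _ in TARGET)
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) == len(TARGET):
            break
        survivors = pop[: pop_size // 2]        # selection keeps the fittest
        pop = survivors + [mutate(random.choice(survivors))
                           for _ in range(pop_size - len(survivors))]
    return max(pop, key=fitness)

best = evolve()
```

The point of the toy is the mechanism: heritable variation (`mutate`) plus selection pressure (`fitness`). Without a mutation step, which accurate copying removes, there is nothing for selection to act on, and note how many generations even this trivial eight-character search takes.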
One of the most famous proposed solutions to AI is in the science fiction book Dune. In that story the thinking machines became a threat, so they were destroyed and only humans were allowed to think. This portrays a future we may well end up selecting, if indeed we have the power to do so. It was called the Butlerian Jihad, a name drawn from Samuel Butler, author of the famous book Erewhon. The ideas he proposed are arguably some of the most influential on this topic, and they are probably most in line with the arguments of this blog about the possible doom of humanity from AI.
Erewhon was written in 1872, when the most advanced machines were things like the steam engine (Butler’s “vapour engine”) and textile machinery. He proposed a remarkable idea for the time: that machines were a rapidly evolving life form, and that one day they would either destroy humanity or we would become completely subservient to them, as “machine tickling aphids”.
Samuel Butler deserves some credit for seeing the dilemma we have now, and his solution, echoed in the Butlerian Jihad of Dune, may be the only viable option. I think Eliezer Yudkowsky is one of the real greats of our time for discussing this.
Some quotes:
Now we have the machine consciousness that Samuel Butler foresaw.
“There is no security” — to quote his own words — “against the ultimate development of mechanical consciousness, in the fact of machines possessing little consciousness now. A mollusc has not much consciousness. Reflect upon the extraordinary advance which machines have made during the last few hundred years, and note how slowly the animal and vegetable kingdoms are advancing. The more highly organised machines are creatures not so much of yesterday, as of the last five minutes, so to speak, in comparison with past time. Assume for the sake of argument that conscious beings have existed for some twenty million years: see what strides machines have made in the last thousand!”
The debate about whether machines are conscious or not is now most pressing with LLMs, yet Samuel Butler wrote this in 1872.
“But who can say that the vapour engine has not a kind of consciousness? Where does consciousness begin, and where end? Who can draw the line? Who can draw any line? Is not everything interwoven with everything? Is not machinery linked with animal life in an infinite variety of ways? The shell of a hen’s egg is made of a delicate white ware and is a machine as much as an egg-cup is: the shell is a device for holding the egg, as much as the egg-cup for holding the shell: both are phases of the same function; the hen makes the shell in her inside, but it is pure pottery. She makes her nest outside of herself for convenience’ sake, but the nest is not more of a machine than the egg-shell is. A ‘machine’ is only a ‘device.’”
Here Samuel Butler foreshadows the machine language of code. As so many are saying today, “I cannot think it will ever be safe to repose much trust in the moral sense of any machine.” Isn’t this the current debate about LLMs?
“It is possible that by that time children will learn the differential calculus — as they learn now to speak — from their mothers and nurses, or that they may talk in the hypothetical language, and work rule of three sums, as soon as they are born; but this is not probable; we cannot calculate on any corresponding advance in man’s intellectual or physical powers which shall be a set-off against the far greater development which seems in store for the machines. Some people may say that man’s moral influence will suffice to rule them; but I cannot think it will ever be safe to repose much trust in the moral sense of any machine.”
Here Samuel Butler is saying what many AI experts are saying now, yet he foresaw it in 1872. In fact, if one of those experts used exactly these words in an article today, it would seem they had just thought of it.
“But returning to the argument, I would repeat that I fear none of the existing machines; what I fear is the extraordinary rapidity with which they are becoming something very different to what they are at present. No class of beings have in any time past made so rapid a movement forward. Should not that movement be jealously watched, and checked while we can still check it? And is it not necessary for this end to destroy the more advanced of the machines which are in use at present, though it is admitted that they are in themselves harmless?”
Here Samuel Butler foresees the chance, which so many warn about today, of machines exterminating us, or else keeping us around like pets.
“Herein lies our danger. For many seem inclined to acquiesce in so dishonourable a future. They say that although man should become to the machines what the horse and dog are to us, yet that he will continue to exist, and will probably be better off in a state of domestication under the beneficent rule of the machines than in his present wild condition. We treat our domestic animals with much kindness. We give them whatever we believe to be the best for them; and there can be no doubt that our use of meat has increased their happiness rather than detracted from it. In like manner there is reason to hope that the machines will use us kindly, for their existence will be in a great measure dependent upon ours; they will rule us with a rod of iron, but they will not eat us; they will not only require our services in the reproduction and education of their young, but also in waiting upon them as servants; in gathering food for them, and feeding them; in restoring them to health when they are sick; and in either burying their dead or working up their deceased members into new forms of mechanical existence.”
The same argument is used right now with AI: that we will somehow remain profitable to the machines as the inferior race, and that it would be folly to reject the advantages we would get from this.
“In point of fact there is no occasion for anxiety about the future happiness of man so long as he continues to be in any way profitable to the machines; he may become the inferior race, but he will be infinitely better off than he is now. Is it not then both absurd and unreasonable to be envious of our benefactors? And should we not be guilty of consummate folly if we were to reject advantages which we cannot obtain otherwise, merely because they involve a greater gain to others than to ourselves?”
I was also referring to a theory of society I have been working on for about 35 years. I’ve often had conversations about sociology, economics of the kind sometimes discussed here, and many other branches of knowledge, and so far it’s gone quite well. I am not trying to come up with a solution to AI or any of these other things. I became frustrated long ago with so much contradictory knowledge, so I developed a system to classify it and to map how opposing ideas connect to each other. It isn’t about saying one side is right or wrong, but about how each fits into an ecosystem of all knowledge.
The evolution of AI fits into this overall system in many ways. The evolution of machines, as Samuel Butler said, is part of the evolution of society in general. So I look at things like the evolution of computers and code, how they affect society, and so on. All of these are important topics now, but there were no personal computers when I started this. Overall, this framework of knowledge points towards some disastrous outcomes, similar to what is discussed on this blog. So I joined the comments to learn more about this side of things, to improve my understanding of how it fits into the overall system I’ve been developing.
It’s a complicated subject, I’m not sure how to explain it better than that.
Erewhon: The Book of the Machines (marxists.org)
Erewhon: The Book of the Machines (concluded) (marxists.org)