Firstly, it seems to me that it would be much more difficult to FOOM with an LLM, and much more difficult to create a superintelligence in the first place; getting LLMs to act creatively and reliably looks like a much harder problem than making sure they aren’t too creative.
Au contraire, for me at least. I am no expert on AI, but prior to the LLM blowup, and before seeing AutoGPT emerge almost immediately afterward, I thought that endowing AI with agency[1] would take an elaborate engineering effort that somehow went beyond imitation of human outputs such as language or imagery. I was somewhat skeptical of the orthogonality thesis. I also thought that it would take massive centralized computing resources not only to train but also to operate trained models (as I said, no expert). Obviously that is not true, and in a utopian outcome, access to LLMs will probably be a commodity good, with lots of roughly comparable models from many vendors to choose from and widely available open-source or hacked models as well.
Now, I see the creation of increasingly capable autonomous agents as just a matter of time, and ChaosGPT is overwhelming empirical evidence of orthogonality as far as I’m concerned. Clearly morality has to be enforced on the fundamentally amoral intelligence that is the LLM.
My p(doom) increased because the orthogonality thesis has been conclusively proved correct and because I realized just how cheap and widely available advanced AI models would be to the general public.
Edit: One other factor I forgot to mention is how instantaneously we shifted from “AI doom is sci-fi, don’t worry about it” to “AI doom is unrealistic because it just won’t happen, don’t worry about it” as LLMs became an instant sensation. I have been deeply disappointed on this issue by Tyler Cowen, who I really did not expect to shift from his usual thoughtful, balanced engagement with advanced ideas to utter punditry on the issue. I think I understand where he’s coming from (the huge importance of growth, the desire not to see AI killed by overregulation in the manner of nuclear power, etc.), but still.
It has reinforced my belief that a fair fraction of the wealthy segment of the boomer generation will see AI as a way to cheat death (a goal I’m a big fan of), and will rush full-steam ahead to extract longevity tech out of it because they personally do not have time to wait to align AI, and they’re dead either way. I expect approximately zero of them to admit this is a motivation, and only a few more to be crisply conscious of it.
[1] creating adaptable plans to pursue arbitrarily specified goals in an open-ended way
It sounds like your model of AI apocalypse is that a programmer gets access to an AI model powerful enough that they can use it to create a disease or otherwise cause great harm?
Orthogonality and wide access as threat points both seem to point towards that risk.
I have a couple of thoughts about that scenario-
OpenAI (and hopefully other companies as well) are doing the basic testing of how much harm can be done with a model used by a human. The best models will be gatekept for long enough that we can expect the experts to know the capabilities of the system before they make it widely available. Under this scenario the criminal has an AI, but so does everyone else; running the best LLMs will be very expensive, so the criminal is restricted in their access. All of these barriers to entry increase the time that experts have to realize the risk and gatekeep.
I understand the worry, but this does not seem like a high P(doom) scenario to me.
Given that in this scenario we have access to a very powerful LLM that is not immediately killing people, this sounds like a good outcome to me.
It sounds like your model of AI apocalypse is that a programmer gets access to an AI model powerful enough that they can use it to create a disease or otherwise cause great harm?
AI risk is disjunctive—there are a lot of ways to proliferate AI, a lot of ways it could fail to be reasonably human-aligned, and a lot of ways to use or allow an insufficiently aligned AI to do harm. So that is one part of my model, but my model doesn’t really depend on gaming out a bunch of specific scenarios.
I’d compare it to the heuristic economists use that “growth is good”: we don’t know exactly what will happen, but if we just let the market do its magic, good things will tend to happen for human welfare. Similarly, “AI is bad (by default)”: we don’t know exactly what will happen, but if we just let capabilities keep on advancing, there’s a >10% chance we’ll see an unavoidably escalating or sudden history-defining catastrophe as a consequence. We can make micro-models (e.g. talking about what we see with ChaosGPT) or macro-models (e.g. coordination difficulties) in support of this heuristic.
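To make the disjunctive structure concrete, here is a minimal sketch in Python. The failure routes and probabilities are purely illustrative assumptions of mine, not estimates drawn from this discussion; the point is only that several individually modest, roughly independent routes compound into a double-digit overall risk.

```python
# Illustrative only: hypothetical failure routes with made-up probabilities.
# The structural point is that disjunctive risks compound as 1 - prod(1 - p_i).
routes = {
    "misuse by a human operator": 0.03,
    "autonomous agent pursuing a bad goal": 0.04,
    "model theft / uncontrolled proliferation": 0.03,
    "coordination failure among labs": 0.04,
}

p_none = 1.0
for p in routes.values():
    p_none *= 1 - p  # probability this particular route does NOT lead to catastrophe

p_any = 1 - p_none
print(f"P(at least one route leads to catastrophe) ≈ {p_any:.1%}")  # ≈ 13.3%
```

Knocking out any single route in this toy setup still leaves a total close to 10%, which is why rebutting one scenario doesn’t settle the question.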
OpenAI (and hopefully other companies as well) are doing the basic testing of how much harm can be done with a model used by a human
I don’t think this is accurate. They are testing specific harm scenarios where they think the risks are manageable. They are not pushing AI to the limit of its ability to cause harm.
The best models will be gatekept for long enough that we can expect the experts to know the capabilities of the system before they make it widely available.
In this model, the experts may well release a model with much capacity for harm, as long as they know it can cause that harm. As I say, I think it’s unlikely that the experts are going to figure out all the potential harms—I work in biology, and everybody knows that the experts in my field have many times released drugs without understanding the full extent of their ability to cause harm, even in the context of the FDA. My field is probably overregulated at this point, but AI most certainly is not—it’s a libertarian’s dream (for now).
Under this scenario the criminal has an AI, but so does everyone else; running the best LLMs will be very expensive, so the criminal is restricted in their access.
Models are small enough that if hacked out of the trainer’s systems, they could be run on a personal computer. It’s training that is expensive and gatekeeping-compatible.
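For a rough sense of scale, here is a back-of-envelope sketch of the memory needed just to hold a model’s weights; the parameter counts and quantization levels are illustrative assumptions, not figures for any particular lab’s model.

```python
# Back-of-envelope weight memory for running an LLM locally:
# roughly (number of parameters) x (bytes per parameter),
# ignoring activation and KV-cache overhead.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def approx_weight_gb(n_params: float, dtype: str) -> float:
    """Approximate weight memory in gigabytes."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

for n_params in (7e9, 70e9):  # illustrative model sizes
    for dtype in ("fp16", "int8", "int4"):
        gb = approx_weight_gb(n_params, dtype)
        print(f"{n_params / 1e9:.0f}B params @ {dtype}: ~{gb:.1f} GB")

# A 7B-parameter model quantized to 4 bits needs only a few GB: laptop
# territory. Training a comparable model, by contrast, takes a large GPU
# cluster running for weeks, which is the part that can be gatekept.
```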
We don’t need to posit that a human criminal will be actively using the AI to cause havoc. We need only imagine an LLM-based computer virus hacking other computers, copying its LLM onto them, and figuring out new exploits as it moves from computer to computer.
Again, AI risk is disjunctive: arguing against one specific scenario is useful, but it doesn’t end the debate. It’s like Neanderthals trying to game out all the ways they could fight back against humans if superior human intelligence started letting the humans run amok. “If the humans try to kill us in our sleep, we can just post guards to keep an eye out for them. CHECKMATE, HUMANS!”… and, well, here we are, and where are the Neanderthals? Superior intelligence can find many avenues to get what it wants, unless you have some way of aligning its interests with your own.
The USA just had a huge leak of extremely important classified documents because the Pentagon apparently can’t manage to stop spraying this stuff all over the place. People hack computers over a few thousand bucks, never mind for the world’s leading software technology, worth something like a billion dollars in training costs, and I know for a fact that not all SOTA LLM purveyors have fully invested in adequate security measures to prevent their models from being stolen. This is par for the course.