> It sounds like your model of AI apocalypse is that a programmer gets access to a powerful enough AI model that they can make the AI create a disease or otherwise cause great harm?
AI risk is disjunctive—there are a lot of ways to proliferate AI, a lot of ways it could fail to be reasonably human-aligned, and a lot of ways to use or allow an insufficiently aligned AI to do harm. So that is one part of my model, but my model doesn’t really depend on gaming out a bunch of specific scenarios.
I’d compare it to the heuristic economists use that “growth is good”: we don’t know exactly what will happen, but if we just let the market do its magic, good things will tend to happen for human welfare. Similarly, “AI is bad (by default)”: we don’t know exactly what will happen, but if we just let capabilities keep advancing, there’s a >10% chance we’ll see an unavoidably escalating or sudden history-defining catastrophe as a consequence. We can make micro-models (e.g. what we see with ChaosGPT) or macro-models (e.g. coordination difficulties) in support of this heuristic.
> OpenAI (and hopefully other companies as well) are doing the basic testing of how much harm can be done with a model used by a human
I don’t think this is accurate. They are testing specific harm scenarios where they think the risks are manageable. They are not pushing AI to the limit of its ability to cause harm.
> the best models will be gate kept for long enough that we can expect the experts will know the capabilities of the system before they make it widely available
In this model, the experts may well release a model with substantial capacity for harm, as long as they know it can cause that harm. And as I said, I think it’s unlikely that the experts will figure out all the potential harms. I work in biology, and everybody knows that the experts in my field have many times released drugs without understanding the full extent of their capacity to cause harm, even under FDA oversight. My field is probably overregulated at this point, but AI most certainly is not; it’s a libertarian’s dream (for now).
> under this scenario the criminal has an AI, but so does everyone else, running the best LLMs will be very expensive, so the criminal is restricted in their access
Model weights are small enough that, if hacked out of the trainer’s systems, they could be run on a personal computer. It’s training that is expensive and compatible with gatekeeping.
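For concreteness, here’s a rough back-of-envelope sketch of why stolen weights are runnable on ordinary hardware. The parameter counts, precisions, and hardware sizes below are illustrative assumptions of mine, not figures about any particular lab’s model: the memory needed for inference is roughly parameters × bytes per parameter, which lands in the tens of gigabytes once the weights are quantized.

```python
# Back-of-envelope sketch: could leaked model weights run on consumer hardware?
# All model sizes, precisions, and hardware limits below are illustrative
# assumptions, not figures about any specific lab or model.

GIB = 1024**3  # bytes per GiB

def weight_footprint_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold the weights."""
    return n_params * bytes_per_param / GIB

# Hypothetical model sizes (parameters) and storage precisions.
models = {"7B": 7e9, "70B": 70e9}
precisions = {"fp16": 2.0, "int4 (quantized)": 0.5}  # bytes per parameter

consumer_gpu_vram_gib = 24   # assumed high-end consumer GPU
workstation_ram_gib = 128    # assumed beefy desktop doing CPU inference

for name, n in models.items():
    for prec, bpp in precisions.items():
        gib = weight_footprint_gib(n, bpp)
        if gib <= consumer_gpu_vram_gib:
            verdict = "fits on one consumer GPU"
        elif gib <= workstation_ram_gib:
            verdict = "fits in desktop RAM (CPU inference)"
        else:
            verdict = "needs server-class hardware"
        print(f"{name} @ {prec:>16}: ~{gib:6.1f} GiB of weights -> {verdict}")
```

The training run that produced those weights, by contrast, costs orders of magnitude more compute than any single inference setup, which is why training rather than inference is the natural place for gatekeeping.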
We don’t need to posit that a human criminal will be actively using the AI to cause havoc. We need only imagine an LLM-based computer virus hacking other computers, importing its LLM onto them, and figuring out new exploits as it moves from computer to computer.
Again, AI risk is disjunctive: arguing against one specific scenario is useful, but it doesn’t end the debate. It’s like Neanderthals trying to game out all the ways they could fight back against humans if superior human intelligence started letting the humans run amok. “If the humans try to kill us in our sleep, we can just post guards to keep an eye out for them. CHECKMATE, HUMANS!”… and, well, here we are, and where are the Neanderthals? Superior intelligence can find many avenues to get what it wants, unless you have some way of aligning its interests with your own.
The USA just had a huge leak of extremely important classified documents because the Pentagon apparently can’t get its act together to stop spraying this stuff all over the place. People hack computers over payoffs of a few thousand bucks, let alone for the world’s leading software technology, worth something like a billion dollars in training costs; and I know for a fact that not all SOTA LLM purveyors have fully invested in adequate security measures to prevent their models from being stolen. This is par for the course.