Why I’m not doing PauseAI

I’m a daily user of ChatGPT, sometimes supplementing it with Claude, and the occasional local model for some experiments. I try to squeeze LLMs into agent-shaped bodies, but it doesn’t really work. I also have a PhD, which would typically make me an expert in the field of AI, but the field is so busy and dynamic that it’s hard to say what an “expert” even is.

AI can bring—already brings—lots of value, and general improvements to human lives, so my default stance is that its continued progress—one might say, acceleration—is in our best interest. The flip-side is, of course, a whole host of dangers associated with AI.

Boring dangers

There are a few “boring” dangers. Humans can use AI for evil—this applies to every single technology in the history of technology, so if that were enough reason to stop AI, it should also make us return to monke. Speaking a bit more formally—if the benefits outweigh the dangers, we should go ahead with AI development. Speaking a bit less formally—sure, we need to stop the bad guys, but we’re stopping all sorts of bad guys already, so I’ll just omit this whole section of dangers, because I’m utterly unconvinced that the “boring” bad applications of AI are uniquely worse than the bad applications of other technologies.

Dangerous dangers

The more interesting part of AI danger is, as people here will likely agree, the risk of a superintelligent AI being misaligned with our human values, to the point of becoming a threat to humanity’s continued existence. I absolutely acknowledge that a sufficiently powerful AI could pose such a threat. A machine that perfectly executes the literal meaning of someone’s words can neglect a lot of the implicit assumptions (“Don’t run over the baby”, “Don’t turn people into paperclips”) that we humans know intuitively. A superintelligence might develop its own goals, be they instrumental or terminal, and we might have little chance to stop it. I agree with all of this.

And yet, I don’t see any reason to stop, or even slow down.

Dangerous capabilities

By far, the most dangerous—and useful—capability for an AI to have is agency. The moment an AI can just go and do things (as opposed to outputting information for a human to read), its capabilities go up a notch. And yet, every AI agent that I’ve seen so far is either an AGI that’s been achieved in a gridworld, or a scam startup that’s wasting millions of VC dollars. Really, PauseAI people should be propping up AutoGPT et al. if they want to slow down real progress.

It’s not that easy for an unassisted[1] AI to do harm—especially existentially significant harm. I like to think that I’m a fairly smart human, and I have no idea how I would bring about the end of humanity if I so desired. You’d need a lot of knowledge about the world as it is, a lot of persuasive power to get people to do what you want them to, and a lot of adaptability to keep up as the world changes. And we’re simply not even remotely close to a level of capabilities, or even autonomy, where this would be possible.

Auto-regressive models, typically LLMs, do one thing really well—they predict the next most likely token. There are a lot of interesting things you can do by predicting the next token, but agency, in my opinion, is not one of them. The moment you need to construct multi-step plans, react to changing conditions, communicate with others, and even know what is important enough to remember/observe—LLMs start failing completely.
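To make “predict the next most likely token” concrete, here is a minimal sketch of the loop. The specific model (GPT-2 via the transformers library) and the greedy decoding are my own illustration, not anything the argument depends on; the point is that the model’s entire “action space” is emitting one token at a time.

```python
# Minimal sketch of autoregressive next-token prediction.
# GPT-2 is used purely as a small, convenient stand-in.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(5):
        logits = model(input_ids).logits            # scores over the whole vocabulary
        next_id = logits[:, -1, :].argmax(dim=-1)   # greedily pick the most likely token
        input_ids = torch.cat([input_ids, next_id.unsqueeze(-1)], dim=-1)

print(tokenizer.decode(input_ids[0]))
```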

But they’ll get better!

Sure, they might—I hope they do. There’s so much good that better models can bring. But the LLM-based paradigm is definitely hitting a wall. Maybe we can keep scaling to GPT-5 and GPT-6, GPT-N. They’ll get smarter, they’ll hallucinate less, they’ll put some more people out of jobs, they’ll pick more goats in the Goat-Enthusiast Monty Hall variant. They won’t get agency. They won’t get scary.

But we should pause, figure out safety, and then resume

That would be nice, except for two details—opportunity cost, and not having any reference AI to base our safety work on. Opportunity cost is self-explanatory—by stopping AI progress, we lose all the good stuff that AI would lead to. Pausing should be the last resort. Without discussing the specific threat levels at this point in history, you could have made an argument for PauseCompute back in the ’90s, or whenever really. If we start approaching dangerous levels of capabilities, and our safety knowledge isn’t up to snuff, we should probably stop or pause. But we’re still so, so far away.

As for the second part—if we stop AI research right now and only focus on alignment, we will never figure out alignment. We’d be forced into purely philosophical “research” entirely detached from reality. It would have to be general enough to cover any conceivable type of AI, any conceivable threat model, and thus—impossible. Instead, we should keep building our current silly lil’ AIs, to start building an understanding of what the capable-and-dangerous models might be in the future, and work on making those specific models safe.

If you did alignment research in the ’90s, you wouldn’t have focused on LLMs.

A small test

Hey MacGPT-10, make a sphere in Blender.

It’s a relatively complex task—figure out a browser, find the right website, open it, download Blender, install Blender, open Blender, figure out how to use it, make a sphere.

No existing model can do it. I’m pretty sure GPT-N won’t be able to do it, assuming they follow the same paradigm. And this is maybe 1% of a potentially existentially-threatening level of capabilities.
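For contrast, the one genuinely easy piece of the task is the final script. Something like the sketch below (using Blender’s bpy Python API, and assuming it runs inside an already-installed Blender) is well within the reach of today’s models; the test is about everything that has to happen before that.

```python
# The "make a sphere" step itself, as a script run inside an installed Blender.
# Emitting this text is easy; getting to the point where it can run is the hard part.
import bpy

# Add a UV sphere at the origin of the current scene.
bpy.ops.mesh.primitive_uv_sphere_add(radius=1.0, location=(0.0, 0.0, 0.0))
```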

The end...

...is not near. If AI kills us all, it will be something so different from current models that we won’t even think “Of course, I should have predicted this”. So for now, I will keep advancing AI capabilities and/or safety (depending on my research/job at any given moment), because both are valuable, and I’ll try to help humanity thrive and tackle its other, much more immediate threats.

  1. ^

    I’m explicitly excluding scenarios of “Human uses AI to deliberately do harm” or “Human uses AI incorrectly and accidentally causes harm”.