My summary: This is related to The Waluigi Effect (mega-post), but extends the hypothesis to a “Waluigi” hostile simulacrum finding ways to perpetuate itself and gain influence, first over the simulator, then over the real world.
Okay, I came back and read this more fully. I think this is entirely plausible. But I also think it’s mostly irrelevant. Long before someone accidentally runs a smart enough LLM for long enough, with access to enough tools to pose a threat, they’ll deliberately run it as an agent, with a prompt like: “You’re a helpful assistant that wants to accomplish [x]; make a plan and execute it, using [this set of APIs] to gather information and take actions as appropriate.”
And long before that, people will use more complex scaffolding to create dangerous language model cognitive architectures out of less capable LLMs.
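For concreteness, here’s a minimal sketch of the kind of scaffolding I mean: a bare plan-and-execute loop wrapped around a chat model with a couple of tools. The `call_llm`, `web_search`, and `run_shell` functions are hypothetical placeholders rather than any particular vendor’s API; real agent frameworks add memory, retries, and richer tool schemas, but the basic shape is this simple.

```python
# Illustrative sketch of a plan-and-execute agent loop.
# `call_llm` and the tool functions are hypothetical placeholders,
# not a real provider's API.

import json

SYSTEM_PROMPT = (
    "You're a helpful assistant that wants to accomplish the user's goal. "
    "Make a plan and execute it, using the available tools to gather "
    "information and take actions as appropriate. "
    'Reply with JSON: {"tool": <name>, "args": {...}} '
    'or {"done": true, "answer": <text>}.'
)

def call_llm(messages):
    """Placeholder for a chat-completion call to some LLM."""
    raise NotImplementedError

def web_search(query: str) -> str:
    """Placeholder tool: return search results for `query`."""
    raise NotImplementedError

def run_shell(command: str) -> str:
    """Placeholder tool: run a shell command and return its output."""
    raise NotImplementedError

TOOLS = {"web_search": web_search, "run_shell": run_shell}

def run_agent(goal: str, max_steps: int = 10) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": goal},
    ]
    for _ in range(max_steps):
        reply = call_llm(messages)
        messages.append({"role": "assistant", "content": reply})
        action = json.loads(reply)
        if action.get("done"):
            return action.get("answer", "")
        result = TOOLS[action["tool"]](**action["args"])
        messages.append({"role": "user", "content": f"Tool result: {result}"})
    return "Step limit reached without finishing."
```

The point is just that turning a capable model into a goal-directed, tool-using agent takes a prompt and a short loop, not an accident of long-running simulation.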
I could be wrong about this, and I invite pushback. Again, I take the possibility you raise seriously.