I agree with you about LLMs!
If MIRI-adjacent pessimists think that, I think they should stop saying things like this, which—if you don’t think LLMs have instrumental motives—is the actual opposite of good communication:
@Pradyumna: “I’m struggling to understand why LLMs are existential risks. So let’s say you did have a highly capable large language model. How could RLHF + scalable oversight fail in the training that could lead to every single person on this earth dying?”
@ESYudkowsky: “Suppose you captured an extremely intelligent alien species that thought 1000 times faster than you, locked their whole civilization in a spatial box, and dropped bombs on them from the sky whenever their output didn’t match a desired target—as your own intelligence tried to measure that.
What could they do to you, if when the ‘training’ phase was done, you tried using them the same way as current LLMs—eg, connecting them directly to the Internet?”
(To the reader, lest you be concerned by this: the process of RLHF bears no resemblance to this scenario.)
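For readers who want the concrete contrast, here is a minimal sketch (my own illustration, not anything from the exchange above) of the quantity RLHF-style fine-tuning typically optimizes: a reward-model score minus a KL penalty that keeps the fine-tuned model close to the original. The function name, the beta value, and the toy numbers are all placeholders.

```python
# Minimal illustrative sketch, not a faithful reproduction of any lab's pipeline:
# the core RLHF objective is "reward minus a KL penalty to the reference model",
# applied through ordinary gradient updates to output probabilities.
import torch

def rlhf_objective(policy_logprobs, ref_logprobs, reward, beta=0.1):
    """Per-sequence objective: reward - beta * KL(policy || reference),
    with the KL estimated from the sampled tokens' log-probabilities."""
    kl_estimate = (policy_logprobs - ref_logprobs).sum()  # sampled-token KL estimate
    return reward - beta * kl_estimate                     # maximized via PPO or similar

# Toy numbers: log-probs of the sampled tokens under the fine-tuned and frozen models.
policy_lp = torch.tensor([-1.2, -0.7, -2.0], requires_grad=True)
ref_lp = torch.tensor([-1.0, -0.9, -1.8])
obj = rlhf_objective(policy_lp, ref_lp, reward=torch.tensor(0.6))
(-obj).backward()  # gradient ascent on the objective = descent on its negation
print(obj.item(), policy_lp.grad)
```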