I do think there are important dissimilarities between AI and flight.
For instance: people disagree massively over what is safe for AI in ways they do not over flight. Are LLMs going to plateau and provide us with a harmless and useful platform for exploring interpretability, while maybe robustifying the world somewhat, or are they going to literally kill everyone?
I think pushing for regulations under such circumstances is likely to promote the views of an accidental winner of a political struggle; or to freeze in amber currently-accepted views that everyone agrees were totally wrong two years later; or to result in a Frankenstein-esque mishmash of regulation that serves a miscellaneous set of incumbents and no one else.
My chief purpose here, though, isn’t to provide a totally comprehensive “AI regs will be bad” point-of-view.
It’s more that, well, everyone has an LLM / babbler inside themselves that helps them imaginatively project forward into the future. It’s the thing that makes you autocomplete the world; the implicit world-model driving thought, the actual iceberg of real calculation behind the iceberg’s tip of conscious thought.
When you read a story of AI things going wrong, you train the LLM / babbler. If you read only stories of AI stuff going wrong—and no stories of AI policy going wrong—then the babbler becomes weirdly sensitive to the former, and learns to ignore the latter. And there are many, many stories on LW and elsewhere now about how AI stuff goes wrong—without comparable stories about AI policy going wrong.
If you want to pass good AI regs—or, at least, be in a position not to pass AI regs with bad effects—the babbler needs to be trained to see all these problems. Without that training, you can still be confident that AI regs will be good, but that confidence will just correspond to a hole in your world model.
This is… intended to be a small fraction of the training data one would want, to get one’s intuition in a place where confidence doesn’t just correspond to a hole in one’s world model.