While it’s obviously true that there is a lot of stuff operating in brains besides LLM-like prediction, such as mechanisms that promote specific predictive models over other ones, that seems to me to only establish that “the human brain is not just LLM-like prediction”, while you seem to be saying that “the human brain does not do LLM-like prediction at all”. (Of course, “LLM-like prediction” is a vague concept and maybe we’re just using it differently and ultimately agree.)
I disagree about whether that distinction matters:
I think technical discussions of AI safety depend on the AI-algorithm-as-a-whole; I think “does the algorithm have such-and-such component” is not that helpful a question.
So for example, here’s a nightmare scenario that I think about often:
(step 1) Someone reads a bunch of discussions about LLM x-risk.
(step 2) They come down on the side of “LLM x-risk is low”, and therefore (they think) it would be great if TAI were an LLM as opposed to some other type of AI.
(step 3) So then they think to themselves: Gee, how do we make LLMs more powerful? Aha, they find a clever way to build an AI that combines LLMs with open-ended real-world online reinforcement learning or whatever.
Even if (step 2) is OK (which I don’t want to argue about here), I am very opposed to (step 3), particularly the omission of the essential part where they should have said “Hey wait a minute, I had reasons for thinking that LLM x-risk is low, but do those reasons apply to this AI, which is not an LLM of the sort that I’m used to, but rather it’s a combination of LLM + open-ended real-world online reinforcement learning or whatever?” I want that person to step back and take a fresh look at every aspect of their preexisting beliefs about AI safety / control / alignment from the ground up, as soon as any aspect of the AI architecture and training approach changes, even if there’s still an LLM involved. :)
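To make (step 3) slightly more concrete, here’s a deliberately toy sketch of what “LLM + open-ended real-world online reinforcement learning” might look like. Every name in it (the classes, the update rule, the reward source) is made up purely for illustration, not a description of any actual system; the point is just that the deployed thing’s behavior keeps being reshaped by whatever its environment rewards, which is a different object from the frozen predictor the original safety arguments were about.

```python
# Toy sketch of "LLM + open-ended real-world online RL".
# All names and details are hypothetical, for illustration only.

import random


class PretrainedLLM:
    """Stand-in for a frozen next-token predictor (the 'LLM of the sort
    I'm used to'): given an observation, it proposes an action.
    Here it just picks randomly, as a stub."""

    def propose_action(self, observation: str) -> str:
        return random.choice(["explore", "query_user", "call_api"])


class OnlineRLWrapper:
    """Stand-in for the step-3 system: the same base model, but its
    behavior is continually updated from real-world reward."""

    def __init__(self, base_model: PretrainedLLM):
        self.base_model = base_model
        self.action_bonus = {}  # crude learned preferences over actions

    def act(self, observation: str) -> str:
        proposal = self.base_model.propose_action(observation)
        # Prefer whichever action has accumulated the most reward so far
        # (a toy 'policy update'); otherwise fall back on the base model.
        best = max(self.action_bonus, key=self.action_bonus.get, default=proposal)
        return best if self.action_bonus.get(best, 0.0) > 0 else proposal

    def update(self, action: str, reward: float) -> None:
        # Online update from deployment feedback: the open-ended part.
        self.action_bonus[action] = self.action_bonus.get(action, 0.0) + reward


# Toy deployment loop: the policy drifts toward whatever the environment
# happens to reward, independent of the original pretraining.
agent = OnlineRLWrapper(PretrainedLLM())
for step in range(5):
    action = agent.act(observation="some real-world state")
    reward = random.uniform(-1.0, 1.0)  # placeholder for real-world feedback
    agent.update(action, reward)
```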