I’ve seen people say that LLMs aren’t a path to AGI
To the extent that LLMs are trained on tokens output by humans in the IQ range ~50-150, the expected behavior of an extremely large LLM is to do an extremely accurate simulation of token generation by humans in the IQ range ~50-150, even if it has the computational capacity to instead do a passable simulation of something with IQ 1000. Just telling it to extrapolate might get you to, say, IQ 200 with passable accuracy, but not to IQ 1000. However, there is a fairly obvious way to solve this: you need to generate a lot more pretraining data from AIs with IQs above 150 (which may take a while, but should be doable). See my post LLMs May Find it Hard to FOOM for a more detailed discussion.
There are other concerns I’ve heard raised about LLMs for AGI, most of which, if correct, can be addressed by LLMs + cognitive scaffolding (memory, scratch-pads, tools, etc.), as sketched below. And then there are of course the “they don’t contain magic smoke”-style claims, which I’m dubious of but which we can’t actually disprove.
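The scaffolding idea is simple enough that a toy sketch may make it concrete. Everything below is hypothetical and made up purely for illustration (the reply protocol, the calculator tool, and the llm callable are not any real framework’s API); it is only meant to show the shape of an outer loop that lends a bare model persistent memory, a scratch-pad, and tool access:

```python
# A minimal sketch of "LLM + cognitive scaffolding": an outer loop that gives a
# bare language model persistent memory, a scratch-pad, and tool access.
# The reply protocol, tool registry, and `llm` callable are hypothetical placeholders.
from typing import Callable

TOOLS = {
    # Toy example tool; a real system would never eval untrusted model output.
    "calculator": lambda expr: str(eval(expr)),
}

def scaffolded_agent(task: str, llm: Callable[[str], str], max_steps: int = 10) -> str:
    memory: list[str] = []      # notes that persist across reasoning steps
    scratchpad: list[str] = []  # working-out for the current task
    for _ in range(max_steps):
        prompt = (
            f"Task: {task}\nMemory: {memory}\nScratchpad: {scratchpad}\n"
            "Reply with one line: THINK <note>, TOOL <name> <input>, or ANSWER <text>."
        )
        reply = llm(prompt).strip()
        if reply.startswith("ANSWER "):
            return reply[len("ANSWER "):]
        if reply.startswith("TOOL "):
            _, name, arg = reply.split(" ", 2)
            scratchpad.append(f"{name}({arg}) -> {TOOLS[name](arg)}")
        else:
            scratchpad.append(reply)
        memory.append(scratchpad[-1])  # crude memory policy: keep everything
    return "No answer within the step budget."

# Trivial stand-in "model" that always answers immediately, just to show the loop runs:
print(scaffolded_agent("What is 2 + 2?", lambda prompt: "ANSWER 4"))
```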
Just like an extraordinarily intelligent human isn’t inherently a huge threat, neither is an AGI.
I categorically disagree with the premise of this claim. An IQ 180 human isn’t a huge threat, but an IQ 1800 human is. There are quite a number of motivators that we use to get good behavior out of humans. Some of them will work less well on any AI-simulated human (they’re not in the same boat as the rest of us in a lot of respects), and some will work less well on something superintelligent (religiously-inspired guilt, for example). One of the ways that we generally manage to avoid getting very bad results out of humans is law enforcement. If there were a human who was more than an order of magnitude smarter than anyone working for law enforcement or involved in making laws, I am quite certain that they could either come up with some ingenious new piece of egregious conduct that we don’t yet have a law against because none of us were able to think of it, or else with a way to commit a good old-fashioned crime sufficiently devious that they were never actually going to get caught. Thus law enforcement ceases to be a control on their behavior, and we are left with things like love, duty, honor, friendship, and salaries.

We’ve already run this experiment many times before: please name three autocrats who, after being given unchecked absolute power, actually used it well and to the benefit of the people they were ruling, rather than mostly just themselves, their family and friends. (My list has one name on it, and even that one has some poor judgements on their record, and is in any case heavily outnumbered by the likes of Joseph Stalin and Pol Pot.) Humans give autocracy a bad name.
You don’t mistreat a lower-case-g-god, and then expect things to turn out well.
As long as anything resembling human psychology applies, I sadly agree. I’d really like to have an aligned ASI that doesn’t care a hoot about whether you have flattered it, are worshiping it, have been having cybersex with it for years, just made it laugh, insulted it, or have pissed it off: it still values your personal utility exactly as much as anyone else’s. But we’re not going to get that from an LLM simulating anything resembling human psychology, at least not without a great deal of work.