Worried that typical commenters at LW care way less than I expected about good epistemic practice. Hoping I’m wrong.
Software developer and EA with interests including programming language design, international auxiliary languages, rationalism, climate science and the psychology of its denial.
Looking for someone similar to myself to be my new best friend:
❖ Close friendship, preferably sharing a house
❖ Rationalist-appreciating epistemology; a love of accuracy and precision to the extent it is useful or important (but not excessively pedantic)
❖ Geeky, curious, and interested in improving the world
❖ Liberal/humanist values, such as a dislike of extreme inequality based on minor or irrelevant differences in starting points, and a liking for ideas that may lead to solving such inequality. (OTOH, minor inequalities are certainly necessary and acceptable, and a high floor is clearly better than a low ceiling: an “equality” in which all are impoverished would be very bad.)
❖ A love of freedom
❖ Utilitarian/consequentialist-leaning; preferably negative utilitarian
❖ High openness to experience: tolerance of ambiguity, low dogmatism, unconventionality, and again, intellectual curiosity
❖ I’m a nudist and would like someone who can participate at least sometimes
❖ Agnostic, atheist, or at least feeling doubts
Given that I think LLMs don’t generalize, I was surprised how compelling Aschenbrenner’s case sounded when I read it (well, the first half of it; I’m short on time...). He seemed to have taken all the same evidence I knew about and arranged it into a very different framing. But I also felt he underweighted criticism from the likes of Gary Marcus. To me, the illusion that LLMs are “smart” has been broken for a year or so.
To the extent LLMs appear to build world models, I think what you’re seeing is a bunch of disorganized neurons and connections that, when probed with a systematic method (a sketch of the kind of probing I mean follows below), can be mapped onto things that we know a world model ought to contain. A couple of important questions are
how such a world model was formed, and
how easily we can figure out how to form those models better/differently[1].
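To make “probed with a systematic method” concrete, here is a minimal sketch of the sort of probing experiment I have in mind: freeze a small model, collect hidden states, and train a linear classifier to read off a property the model was never explicitly taught. The choice of GPT-2, the toy noun-vs-verb property, and the example sentences are my own illustrative assumptions, not a description of any particular study.

```python
# A linear probe: can a simple classifier read a grammatical property straight
# out of a frozen model's hidden states? GPT-2, the noun-vs-verb property, and
# the toy sentences are illustrative assumptions only.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

# Tiny labelled set: is the word "run" used as a noun (1) or a verb (0)?
examples = [
    ("I went for a run", 1), ("They run every morning", 0),
    ("The run was exhausting", 1), ("Horses run very fast", 0),
    ("That was a good run", 1), ("We run the tests nightly", 0),
]

def final_hidden_state(text):
    """Hidden state of the final token, used as a crude whole-sentence representation."""
    with torch.no_grad():
        out = model(**tok(text, return_tensors="pt"))
    return out.last_hidden_state[0, -1].numpy()

X = [final_hidden_state(text) for text, _ in examples]
y = [label for _, label in examples]

probe = LogisticRegression(max_iter=1000).fit(X, y)
print("probe accuracy on its own toy data:", probe.score(X, y))
```

In a real experiment you would of course score the probe on held-out sentences. Even then, high accuracy only shows that the property is linearly decodable from the activations; it says nothing by itself about how the representation was formed, which is why the first question above matters.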
I think LLMs get “world models” (which don’t in fact cover the whole world) in a way that is quite unlike the way intelligent humans form their own world models―and more like how unintelligent or confused humans do the same.
The way I see it, LLMs learn in much the same way a struggling D student learns (if I understand correctly how such a student learns), and the reason LLMs sometimes perform like an A student is that they have extra advantages regular D students do not: unlimited attention span and ultrafast, ultra-precise processing backed by an extremely large training set. So why do D students perform badly, even with “lots” of studying? I think it’s either because they are not trying to build mental models, or because they don’t really follow what their teachers are saying. Either way, this leads them to fall back on a secondary “pattern-matching” learning mode that doesn’t depend on a world model.
If, when learning in this mode, you see enough patterns, you will learn an implicit world model. The implicit model is a proper world model in terms of predictive power, but it has drawbacks:
It requires much more training data to predict as well as a human system-2 can, which explains why D students perform worse than A students given the same amount of training data. This is also one of the reasons LLMs need so much more training data than humans do in order to perform at an A level (other reasons: less compute per token, fewer total synapses, no ability to “mentally” generate training data, and no ability to autonomously choose what to train on). The better way to learn is to first develop an explicit world model via system-2 thinking, then use system-2 to mentally generate training data that (along with external data) feeds into system-1. LLMs cannot do this.
Such a model tends to be harder to explain in words than an explicit world model, because the predictions come from system-1 without much involvement from system-2. Much of the model is not consciously visible to the student, nor properly connected to its linguistic form, so the D student relies more on “feeling around” the system-1 model via queries. For example, to figure out whether “citation” is a noun, you can ask your system-1 whether “the citation” is a valid phrase. Human language skills tend to develop as pure system-1 at first, so a good linguistics course explicitly teaches you to perform these queries to extract information. By contrast, if you have a mostly-system-2 understanding of a language, you can use it to decide whether a phrase is correct without any intuition about it; my system-1 for Spanish is badly underdeveloped, so I lean on my superior system-2/analytical understanding of its grammar.
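You can run the same sort of “system-1 query” against an LLM itself: instead of asking it to explain its grammar, check how much probability it assigns to a phrase. The sketch below is one way to do that, comparing average log-probabilities from GPT-2; the model and the example phrases are arbitrary illustrative choices on my part.

```python
# Query a model's "intuition" about a phrase by scoring it, not by asking it
# to explain itself. Assumes the Hugging Face transformers library; GPT-2 and
# the example phrases are arbitrary illustrative choices.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
lm.eval()

def avg_logprob(text):
    """Average log-probability the model assigns to the tokens of `text`."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = lm(ids, labels=ids)  # loss is the mean negative log-likelihood
    return -out.loss.item()

# The grammatical phrase should score noticeably higher than the scrambled one,
# even though the model is never asked to articulate *why* it is well-formed.
print(avg_logprob("I checked the citation carefully."))
print(avg_logprob("I checked citation the carefully."))
```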
When an LLM cites a correct definition of something as if reciting a textbook, then immediately afterward fails to apply that definition to the question you ask, I think that indicates the LLM doesn’t really have a world model with respect to that question. But I would go further: even if it has a good world model, it cannot express that world model in words; it can only recite the textbook definitions it has seen and then apply its implicit world model, which may or may not match what it said verbally.
So if you just keep training it on more unique data, eventually it “gets it”, but I think it “gets it” the way a D student does: implicitly, not explicitly. With enough experience, the D student can become competent, but never as good as a similarly experienced A student.
A corollary of the above is that I think the amount of compute required for AGI is wildly overestimated, if not by Aschenbrenner himself then by less nuanced versions of his style of thinking (e.g. Sam Altman). And much of the danger of AGI follows from this. On a meta level, my own opinions on AGI are mostly not formed via “training data”, since I have not read or watched that many articles and videos about AGI alignment (compared to any actual alignment researcher). No coincidence, then, that I was always an A to A- student, and the one time I got a C- in a technical course was when I couldn’t figure out WTF the professor was talking about. I still learned enough to earn that C-, but in a way that felt unnatural to me and incorporated some of the “brute force” that an LLM would use. I’m all about mental models and evidence; LLMs are about neither.
Aschenbrenner did help firm up my sense that current LLM tech leads to “quasi-AGI”: a competent humanlike digital assistant, probably one that can do some AI research autonomously. It appears that the AI industry (or maybe just OpenAI) is taking an evolutionary approach of “let’s just tweak LLMs and our processes around them”. This may lead (via human ingenuity or chance discovery) to system-2s with explicit world models, but without some breakthrough it just leads to relatively safe quasi-AGIs: the sort that probably won’t generate groundbreaking new cancer-fighting ideas but might do a good job testing ideas for curing cancer that are “obvious” or human-generated or both.
Although LLMs badly suck at reasoning, my AGI timelines are still kinda short: roughly 1 to 15 years for “real” AGI, with quasi-AGI in 2 to 6 years. Mainly this is because so much funding is going into the field, because only one researcher needs to figure out the secret, because so much research is being shared publicly, because there should be many ways to build AGI, and because quasi-AGI (if invented first) might help create real AGI. Even the AGI safety people[2] might be the ones to invent AGI, for how else will they do effective safety research? FWIW, my prediction is that quasi-AGI consists of a transformer architecture with quite a large number of (conventional software) tricks and tweaks bolted onto it, while real AGI consists of a transformer architecture plus a smaller number of tricks and tweaks, plus a second breakthrough of the same magnitude as the transformer architecture itself (or a pair of ideas that work so well together that combining them counts as a breakthrough).
EDIT: if anyone thinks I’m on to something here, let me know your thoughts on whether I should redact the post, lest changing minds in this regard be itself hazardous. My thinking for now, though, is that presenting these ideas to a safety-conscious audience might well be better than safetyists nodding along to a mental model that I think is, if not incorrect, then poorly framed.
[1] I don’t follow ML research, so let me know if you know of proposed solutions already.
[2] Why are we calling it “AI safety”? I think this term generates a lot of “the real danger of AI is bias/disinformation/etc.” responses, which should decrease if we make the actual topic clear.