When chatting with an LLM, do you wonder what its purpose is in the responses it gives? I’m pretty sure it’s “predict a plausible next token”, but I don’t know what would make me change that belief.
I think “it has a simulated purpose, but whether it has an actual purpose is not relevant for interacting with it”.
My intuition is that the token predictor doesn’t have a purpose, that it’s just answering, “what would this chatbot that I am simulating respond with?”
For the chatbot character (the Simulacrum), it’s “What would a helpful chatbot want in this situation?” It behaves as if its purpose is to be helpful and harmless (or whatever personality it was instilled with).
(I’m assuming that as part of making a prediction, it is building (and/or using) models of things, which is a strong statement and idk how to argue for it)
I think framing it as “predicting the next token” is similar to explaining a rolling rock’s behavior as “obeying the laws of physics”. Like, it’s a lower-than-useful level of abstraction. It’s easier to predict the rock’s behavior via the things it’s bouncing off of, its direction, speed, mass, etc.
Or put another way: “sure, it’s predicting the next token, but how is it actually doing that? What does that mean?” A straightforward way to predict the next token is to actually understand what it means to be a helpful chatbot in this conversation (which includes understanding the world, human psychology, etc.) and to complete whatever sentence is currently being written, given that understanding.
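To make “predicting the next token” concrete, here’s a minimal sketch of the generation loop, assuming the Hugging Face transformers library and GPT-2 as a stand-in model (the prompt and generation length are made up). The point is that the loop itself is as trivially describable as the rock obeying physics; whatever “understanding” the model has is hidden inside the single forward pass.

```python
# Minimal autoregressive generation sketch (assumes Hugging Face transformers + GPT-2).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "User: How do I boil an egg?\nAssistant:"  # hypothetical prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(30):
        logits = model(input_ids).logits           # one forward pass over the whole context
        next_token_logits = logits[0, -1]          # distribution over just the next token
        next_id = torch.argmax(next_token_logits)  # greedy pick; sampling is also common
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=-1)

print(tokenizer.decode(input_ids[0]))
```

The outer loop is the “predicting the next token” description; everything the essay is actually asking about happens inside `model(input_ids)`.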
(There’s another angle that makes this very confusing: whether RLHF fundamentally changes the model or not. Does it turn it from a single-token predictor into a multi-token response predictor? Also, is it possible that the base model already has goals beyond predicting one token? Maybe the way it’s trained somehow makes it useful for it to have goals.)
There have been a number of debates (which I can’t easily search for, which is sad) about whether speech is an action (intended to bring about a consequence) or a truth-communicating or truth-seeking (both imperfect, of course) mechanism.
The immediate reason we speak is that we just enjoy talking to people. Similar to “we eat because we enjoy food/dislike being hungry”. Biologically, hunger developed to serve many low-level processes, like our muscles needing glycogen, but subjectively, the cause of eating is some emotion.
I think asking what speech really is at some deeper level doesn’t make sense. Or at least it should recognize that why individual people speak, and why speech developed in humans, are separate questions, with (I’m guessing) very little overlap.
Twitter doesn’t incentivize truth-seeking
Twitter is designed for writing things off the top of your head, and things that others will share or reply to. There are almost no mechanisms to reward good ideas or punish bad ones, to incentivize consistency in your views, or even to see whether someone updates their beliefs or whether a comment pointed out that they’re wrong.
(The fact that there are comments is really really good, and it’s part of what makes Twitter so much better than mainstream media. Community Notes is great too.)
The solution to Twitter sucking is not to follow different people, and DEFINITELY not to correct every wrong statement (oops); it’s to just leave. Even smart people, people who are way smarter and more interesting and knowledgeable and funny than me, simply don’t care that much about their posts. If a post is thought-provoking, you can’t even do anything with that fact, because nothing about the website is designed for deeper conversations. Though I’ve had a couple of nice moments where I went deep into a topic with someone in the replies.
Shortforms are better
The above is also a danger with Shortforms, but to a lesser extent, because things are easier to find, and it’s much more likely that I’ll see something I’ve written, see that I’m wrong, and delete or edit it. Posts on Twitter are not editable, they’re harder to find, there’s no preview-on-hover, and there’s no hyperlinked text.