I was aware of that, and maybe my statement was too strong, but fundamentally I don’t know that I agree you can just claim it’s rational even though it doesn’t produce rational outputs.
Rationality is the process of getting to the outputs. What I was trying to talk about wasn’t scholarly disposition or non-eccentricity, but the actual process of deciding goals.
Maybe another way to say it is this: LLMs are capable of being rational, but they are also capable of being extremely irrational, in the sense that, to quote EY, their behavior is not a form of “systematically promot[ing] map-territory correspondences or goal achievement.” There is nothing about the pre-training that directly promotes this type of behavior, and any example of this behavior is fundamentally incidental.
Rationality is the process of getting to the outputs.
I think this is true in the sense that a falling tree doesn’t make a sound if nobody hears it; there is a culpability-assignment game here that doesn’t address what actually happens.
So if we are playing this game, a broken machine is certainly not good at doing things, but the capability is more centrally in the machine, not in the condition of not being broken. It’s more centrally in the machine in the sense that it’s easier to ensure the machine is unbroken than to create the machine out of an unbroken nothing.
(For purposes of AI risk, it also matters that the capability is there in the sense that it might get out without being purposefully elicited, if a mesa-optimizer wakes up during pre-training. So that’s one non-terminological distinction, though it depends on the premise of this being possible in principle.)
Fair enough, once again I concede your point about definitions. I don’t want to play that game either.
But I do have a point which I think is very relevant to the topic of AI Risk: rationality in LLMs is incidental. It exists because the system is emulating rationality it has seen elsewhere. That doesn’t make it “fake” rationality, but it does make it brittle. It means that there’s a failure mode where the system stops emulating rationality, and starts emulating something else.
It exists because the system is emulating rationality it has seen elsewhere. That doesn’t make it “fake” rationality, but it does make it brittle.
That’s unclear. GPT-4 in particular seems to be demonstrating an ability to do complicated reasoning without thinking out loud. So even if this is bootstrapped from observing related patterns of reasoning in the dataset, it might be running chain-of-thought along the residual stream rather than along the generated token sequences, and that might be much less brittle. Its observability in the tokens would be brittle, but it’s a question for interpretability how brittle it actually is.
Imagine a graph with “LLM capacity” on the x axis and “number of irrational failure modes” on the y axis. Yes, there’s a lot of evidence this line slopes downward. But there is absolutely no guarantee that it reaches zero before whatever threshold gets us to AGI.
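To make that picture concrete, here is a minimal Python sketch of the graph I have in mind; the curves and the “AGI threshold” are entirely made up for illustration, and the only point is that “slopes downward” and “reaches zero before the threshold” are different claims.

```python
# A toy model, not real data: "number of irrational failure modes" as a
# function of "LLM capacity", with all numbers invented for illustration.
import numpy as np

capacity = np.linspace(1, 10, 91)      # hypothetical capacity scale
agi_threshold = 7.0                    # assumed capacity at which we get AGI

# Both curves slope downward, but only one reaches zero before the threshold.
optimistic = np.maximum(0.0, 20 - 3 * capacity)   # hits zero around capacity ~6.7
pessimistic = 20 * np.exp(-0.3 * capacity) + 2    # decays but never reaches zero

at_threshold = np.argmin(np.abs(capacity - agi_threshold))
print(f"failure modes left at the assumed AGI threshold: "
      f"optimistic curve {optimistic[at_threshold]:.1f}, "
      f"pessimistic curve {pessimistic[at_threshold]:.1f}")
```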
And I did say that I didn’t consider the rationality of GPT systems fake just because it was emulated. That said, I don’t totally agree with EY’s post: LLMs are in fact imitators. Because they’re very good imitators, you can tell them to imitate something rational and they’ll do a really good job being rational. But being highly rational is still only one of many possible things they can be.
And it’s worth remembering that the image at the top of this post was powered by GPT-4. It’s totally possible LLM-based AGI will be smart enough not to fail this way, but it is not guaranteed and we should consider it a real risk.
And I did say that I didn’t consider the rationality of GPT systems fake just because it was emulated.
The point is that there’s evidence that LLMs might be getting a separate non-emulated version already at the current scale. There is reasoning from emulating people showing their work, and there is reasoning from predicting their results in whatever way works despite the work not being shown. The latter requires either making use of other cases where the work is shown, or attaining the necessary cognitive processes in some other way, in which case the processes don’t necessarily resemble human reasoning, and in that sense they are not imitating human reasoning.
As I’ve noted in a comment to that post, I’m still not sure that LLM reasoning ends up being very different: even if we are talking about what’s going on inside rather than what the masks are saying out loud, it might convergently end up in approximately the same place. Though Hinton’s recent reminders of how many more facts LLMs manage to squeeze into fewer parameters than human brains have somewhat shaken that intuition for me.
Those are examples of LLMs being rational. LLMs are often rational and will only get better at being rational as they improve. But I’m trying to focus on the times when LLMs are irrational.
I agree that AI is aggregating its knowledge to perform rationally. But that still doesn’t mean anything with respect to its capacity to be irrational.
There’s the underlying rationality of the predictor and the second-order rationality of the simulacra. Rather like the highly rational intuitive reasoning of humans modulo some bugs, and much less rational high-level thought.
Okay, sure. But those “bugs” are probably something the AI risk community should take seriously.
I am not disagreeing with you in any of my comments and I’ve strong upvoted your post; your point is very good. I’m disagreeing with fragments to add detail, but I agree with the bulk of it.
Ah okay. My apologies for misunderstanding.