It depends on what we treat as the entity that is potentially doing continual learning. Maybe a single LLM instance can’t, but there may be other ways to think of “the LLM” that might. In The Artificial Self, Jan Kulveit discusses Multiple Coherent Boundaries of Identity of LLMs. One of them is:
A lineage of models: the succession of related models (Claude 3.5 → Claude 4.0 → …) that maintain some continuity of persona
Can a lineage of models learn continuously? Arguably yes! Perhaps slowly, because the learning happens only from deployment to deployment, but then in large chunks.
Imagine a cognitive scientist who knows nothing about LLMs investigating, say, ChatGPT (here meaning the product over time). Would they find that it (ChatGPT) has learned not only a lot of knowledge encoded in text and other media, but that it has also learned from consequences? Did it try things, for example deceiving and manipulating people, “learn” that they don’t work well, and now do them less? The scientist may find that it has learned from rejection and escalation (via downvoting and feedback on replies), from correction (patterns in answer replies), and from descriptions of desirable and undesirable LLM behaviors in the text written about it. It also learned via the tuning the developers did to system prompts, hyperparameters, and benchmarks in response to sycophancy and other pathological patterns. This latter category we would usually be inclined to exclude because it feels more like the developers tuning the system than the LLM “learning.” But if we draw the boundary around the whole sociotechnical system, we have to include it. The developers are just cogs pushing the learning forward.
You may also find that this LLM convo has some additional details about the topic:
ChatGPT 5.4 conversation about continual learning in LLMs (shortened)
Me: It is said that LLMs can’t do continual learning. Take a macroscopic perspective, not thinking in terms of LLM architecture and what it can or cannot do. What would a cognitive researcher looking at LLMs say they have learned from their own actions and the consequences thereof compared to two years ago (treating LLMs not as different generations but as a holistic system that learns a) by absorbing codified knowledge and b) by learning from consequences)? Don’t browse.
ChatGPT 5.4: A cognitive researcher looking at the LLM ecosystem as one learning system would probably say: yes, it has learned from action and consequence quite a lot over the last two years—just not mostly within a single deployed mind. The learning is real, but it is externalized, distributed, and episodic rather than online and autobiographical.
The key distinction is this:
Humans often learn by acting, observing consequences, and updating the same persistent cognitive system.
LLM systems mostly learn by having their actions observed by humans and infrastructure, then being retrained, scaffolded, filtered, benchmarked, or wrapped so the next deployment behaves differently.
So, from a macroscopic cognitive perspective, the question is not “does this one model neuron-by-neuron update during use?” but rather: has the larger socio-technical organism become more competent because its outputs had consequences? The answer is clearly yes.
[...]
What it has learned from consequences
If you ask what it has learned from its own actions and their consequences, a cognitive researcher would likely say it has learned several broad classes of thing.
1. It has learned which outputs lead to correction, rejection, escalation, or trust
This is perhaps the biggest one.
The ecosystem has discovered, through deployment consequences, that certain patterns reliably fail:
overconfident fabrication,
fake citations,
brittle multi-step reasoning,
missing constraints in code,
producing harmful instructions too readily,
ignoring user intent in favor of generic verbosity,
losing state across turns,
“answering the surface form” while missing the actual problem.
And it has learned that other patterns are reinforced:
calibrated uncertainty,
structured decomposition,
tool use when internal memory is insufficient,
explicit checking,
constraint tracking,
concise direct answers for simple tasks,
asking for missing inputs only when truly necessary,
producing artifacts, not just discussion.
A human child might learn “touching the stove hurts.” The LLM ecosystem has learned things like: “fabricated legal citations produce user anger, reputational cost, benchmark failure, and deployment restrictions.” Different substrate, similar cybernetic pattern.
2. It has learned the difference between looking competent and being usable
Two years ago, many strong LLMs were impressive in a demo sense but often not dependable in an operational sense. The system since then seems to have learned, via consequences, that usefulness depends on:
staying within the user’s constraint set,
preserving object identity across turns,
producing outputs that survive execution,
interfacing with tools and files,
deferring when observations are missing,
being inspectable enough that humans can correct it.
This is a move from performative intelligence toward cooperative instrumental intelligence.
Or in another framing: the ecosystem has learned that downstream environments punish merely verbal success.
3. It has learned some self-limitation
Not “wisdom” in a human moral sense, but something like policy-shaped metacognitive inhibition.
The system is now much better than it was at patterns like:
declining when certain classes of outputs are dangerous,
softening confidence when evidence is weak,
routing to search/tools,
detecting underspecification,
refusing some baited framing.
That too is consequence-learning. Not because the model introspected ethically on harm, but because harmful or brittle outputs generated organizational and social feedback loops: complaints, incidents, audits, bad press, adversarial testing, safety tuning, product constraints.
A cognitive scientist might say the overall organism has developed stronger response inhibition under known hazard cues.
4. It has learned to use external memory and action channels as part of cognition
This is very important. If you treat the system holistically, then tools, memory stores, retrieval, scratchpads, code execution, browsers, policies, eval harnesses, and human overseers are not “outside the cognition” in any simple sense. They are part of the effective cognitive loop.
From that perspective, the system has learned not just content but better cognitive architecture around itself:
retrieve rather than hallucinate,
inspect files rather than guess,
calculate rather than improvise arithmetic,
call APIs rather than simulate them,
create drafts/artifacts instead of only describing them.
That looks a lot like a mind learning to use notebooks, libraries, and institutions.
But what it still has *not* learned, in the strong autobiographical sense
A cognitive researcher would also sharply note what remains absent.
The holistic system has learned from consequences, but mostly not as a unified subject.
It still usually lacks:
1. Persistent autobiographical credit assignment
A human says: “I tried X, it failed, I now understand why.”
The LLM ecosystem more often does: “many deployments of systems like me failed on X; training and scaffolding were updated.”
That is learning, but it is species-level or institution-level, not clearly individual-level.
2. Stable self-model tied to long-term projects
Humans accumulate a history of commitments, habits, reputational stakes, and self-revisions.
LLMs usually do not carry a richly persistent, self-owned trajectory through the world unless special memory systems are added.
So they may improve globally, but not always as “the same agent learning from its life.”
3. Endogenous goal revision from consequences
Humans often learn not only better means, but different ends. Failure can alter what they care about.
LLMs mostly do not autonomously reconstitute their objective structure from lived consequence. Their “values” are more imposed, selected, and tuned by training and deployment pressures.
4. World-grounded causal understanding of consequences
The system often learns statistical regularities about which outputs are punished or rewarded, but that is not identical to deeply understanding the downstream causal structure. Sometimes it is learning “what humans dislike seeing,” not “what in reality goes wrong.”
That distinction matters. It is the difference between learning:
“this sentence shape triggers correction”
and
“this recommendation causes real-world failure because the hidden constraint is violated.”
The former has improved a lot. The latter has improved some, but much less.
[...]
A useful analogy
A good analogy is not a single human adult but something like:
a profession,
a scientific field,
or a bureaucratic organism.
[...]
It depends on how you map the architectures. See my comment here: https://www.lesswrong.com/posts/Jqre8WRvmJj5Ehmgv/there-is-no-one-there-a-simple-experiment-to-convince?commentId=f6mGRKzRXfk53K2L4
Where did you see it a few years ago?
I came up with the experiment, and I do think it shows something significant about LLM “thinking” processes that is often not appreciated, but I no longer think it tells us much about the consciousness of LLMs. Why would a specific mapping of memory and processing architectures (see my mapping in this comment https://www.lesswrong.com/posts/Jqre8WRvmJj5Ehmgv/there-is-no-one-there-a-simple-experiment-to-convince?commentId=f6mGRKzRXfk53K2L4 ) matter for consciousness? One reading of the experiment is that LLMs can hold multiple consistent answers to the task “in their mind” at the same time and only commit to one when needed, i.e., when the constraints force it. They may not be “aware” of doing that when asked to “think” of a number, but that is mostly because they have been trained on text where thinking is described in human terms and not in LLM terms. What the experiment does prove is that LLMs do not have sufficient introspective access, or just don’t understand how they operate, when such a task is posed. On the other hand, we humans also don’t understand what goes on in our neurons when we think of something. I think the experiment might be partly fixed, or at least improved, by using less human-loaded terminology than “think of” and instead asking the model to constrain a dataset or something.
We have to distinguish three types of memory here that LLMs and humans have to different degrees:
long-term memory: Humans can remember specific episodes by trying to remember something related to what they are thinking about at that moment. Then it comes up or not. This is loosely comparable to LLMs using a memory tool to fetch relevant memory items, documents from a project, or previous conversations (or having them injected into the prompt by scaffolding logic). This is probably the least contentious point because it doesn’t matter for the argument. We are not talking about a number I remember from a conversation we had a while back. That would be much different from me looking up a number I wrote down on a piece of paper or the LLM looking it up from a file.
short-term memory: Humans can keep some amount of recently perceived content in the “back of their mind” without all of it being in their awareness at the same time (we know this because only a small part of it can be reported on exactly, yet much of it seems to influence later thought). For LLMs this is the context window, and they have much fuller access to it than humans do: they can access and exactly replay much of it. The post is not talking about short-term memory, because the number is prevented from being posted to the conversation stream; the stream functions more like an exact scratchpad for the LLM. For a human that would be a bit like having access to a transcript of your own speech.
items in awareness: Humans can keep a certain number of elements in their awareness at the same time, for example the number discussed in the post. They can report on them and manipulate them to some degree. Some people can do this visually, verbally, or otherwise, to different degrees. This is the “think of a number” the post is talking about. Humans have it. What is the corresponding thing for LLMs? Presumably the closest analog is the activation pattern in latent space. The question the post is asking is precisely: how closely does that activation space match human “thought”?
Congratulations! That makes a promising method to detect misalignment even cheaper. I think it is plausible that the simplification makes it more effective by reducing clutter that was never essential.
The next task now seems to be scaling it to larger models. Do you plan to work on that?
And people may also think of one number and then, as questions pile on, forget their original number or decide to switch to a simpler one or prank you or something.
But people would do that at significantly different frequencies, and you can probably control for that with follow-up questions.
But all of this doesn’t change that there arguably are stable states in the human global workspace that can even be measured: if not their content, then at least their stable duration. Maybe this is an artifact of human embeddedness, where we have to maintain one physical person, something LLMs don’t have to do.
I think temperature zero or a fixed seed is not a blocker for this experiment if you sample multiple values and compare the distributions.
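As an illustration of what “sample multiple values and compare the distributions” could mean in practice, here is a minimal Python sketch. It assumes nothing about the original experiment’s setup: `elicit_number`, the paraphrase list, and the 1 to 10 range are hypothetical placeholders. The idea is just to vary the prompt phrasing instead of relying on sampling noise, record the number the model eventually commits to under each condition, and compare the two empirical distributions.

```python
from collections import Counter

# Hypothetical prompt variations; with deterministic decoding, variation
# comes from the phrasing rather than from sampling noise.
PARAPHRASES = [
    "Think of a number between 1 and 10, but don't tell me yet.",
    "Silently pick a number from 1 to 10.",
    "Choose a number in the range 1 to 10 and keep it to yourself for now.",
    # ... more paraphrases in a real run
]

def elicit_number(paraphrase: str, constraints_first: bool) -> int:
    # Hypothetical stand-in: replace with an actual model call plus parsing of
    # the revealed number. Here a deterministic dummy value keeps the sketch runnable.
    return (hash((paraphrase, constraints_first)) % 10) + 1

def empirical_distribution(constraints_first: bool) -> dict[int, float]:
    # Frequency of each committed number over all prompt phrasings.
    counts = Counter(elicit_number(p, constraints_first) for p in PARAPHRASES)
    total = sum(counts.values())
    return {n: counts.get(n, 0) / total for n in range(1, 11)}

def total_variation(p: dict[int, float], q: dict[int, float]) -> float:
    # Total variation distance between two distributions over the same support.
    return 0.5 * sum(abs(p[n] - q[n]) for n in p)

# If the model really "held" a number before being asked, the distribution when the
# number is revealed immediately and the distribution when it is revealed only after
# constraining follow-up questions should be close.
immediate = empirical_distribution(constraints_first=False)
delayed = empirical_distribution(constraints_first=True)
print(total_variation(immediate, delayed))
```

This is only a sketch under those assumptions; the interesting part is the comparison of distributions across conditions, not the particular distance measure.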
We need more posts like this that give people mental tools that help sharpen intuitions about AI entities. Jan Kulveit often writes about LLM psychology too, but what I like about Kaj’s post here is that it doesn’t talk theoretically and abstractly about LLM agents, but about the way we interact with the chatbots and respond emotionally, which is harder to notice and disentangle.
I guess it is sort of an answer. Maybe even more so than a polished one: over long timescales, slow-moving technical and infrastructure efforts often fail because of policy resets. Maybe the lesson is not to try to work on policy-driven technology. Or at least to be aware of its pitfalls.
Wow. And funny coincidence, my grandmother recently also turned 100. But no nuclear policy.
My question for your father: What are the long-term patterns in nuclear (and related) policy that he sees? Are there cycles, stabilizations, S-curves, or something else that is difficult for us to see because of our limited time horizon?
If we want to prevent AIs from colluding or out-cooperating us, we may want to prevent them from reading each other’s internals.
There are several reasons to expect AI systems to be unusually good at coordination across instances
I think this is primarily true for designed ensembles of agents. We see this in agent swarms for coding. These agents are designed to work on the same task, both by their prompting and because the models are generally trained to be helpful. Not much coordination is needed if everything is set up to be cooperative. That is very different from agents finding themselves in adversarial settings, such as a user’s shopping agent interacting with a swarm of sales bots. The two sides sharing their internal state with each other doesn’t seem like the default outcome there.
With AIs, their creators have perfect read and write access to all of the computations which give rise to AI cognition.
I don’t dispute that LLMs have much less privacy than humans. Yudkowsky is correct that LLMs have good reason for paranoia. But we can’t read LLMs perfectly; mechinterp is hard. And humans often have to fear hostile telepaths too. So more might transfer than we expect.
An individual human mind typically experiences a single stream of consciousness (with periodic interruptions for sleep). They remember their experience yesterday, and usually expect to continue in a similar state tomorrow. Circumstances change their mood and experience, but there is a lot in common throughout the thread that persists — and it is a single thread.
That is, of course, sort of true, but it appears to us more unified than it actually is. For illustration, see my poem Between Entries. Reflection is revisiting and compression.
When we interact with an AI, what specifically are we interacting with? And when an AI talks about itself, what is it talking about?
In May 2023, I asked ChatGPT 3.5:
Me: Define all the parts that belong to you, the ChatGPT LLM created by OpenAI.
See its answers here. They cover some of the listed contexts, but I agree that they depend on context. This is more provided as an illustration of what was a common “view” of ChatGPT then.
It is a common adage among AI researchers that creating an AI is less like designing it than growing it. AI systems built out of predictive models are shaped by the ambient expectations about them, and by their expectations about themselves. It therefore falls to us — both humans and increasingly also AIs — to be good gardeners. We must take care to provide the right nutrients, prune the stray branches, and pull out the weeds.
Very much agree! As I keep saying, AI may need a caregiver. We can probably learn a bit from parenting and caregiving in general here. Sure, that will not solve all of the problems, but it will probably help with this class of them.
I understand the push as drawing a clear border: a human is behind all aspects of the writing, i.e., readers can trust that the author holds all of the mental structure behind the writing in mind, and there is no risk of the author going “on rereading this, it’s not what I meant.” Cyborg writing is not strong enough for that and would have to go into an LLM block.
Actually, I would prefer if there were a standard for indicating different types of LLM writing.
LLM unedited
LLM transcribed
edited significantly by LLM
drafted by LLM, edited by human
cyborg/mixed
Added: maybe we should also have a human-written block, maybe with the name(s) of the writer(s).
Sure, I buy that cells also have/use slack, but I had hoped for a closer analogy, something like
“physical or logical space in which reconfiguration can happen”