There are multiple possible agents in the system, with multiple different implied goals
Such an ontology demands mechanistic evidence and explanation: for example, evidence that LLMs run multiple threads of counterfactual planning across successive Transformer blocks, using distinct circuits (even if those circuits are at least partially superposed with one another, since it is hard to see how or why they would cleanly segregate within the residual stream during training).
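To make the parenthetical concrete, here is a minimal toy sketch of what "superposition in the residual stream" means: packing more feature directions than the stream has dimensions, so the directions that different circuits read and write necessarily overlap rather than cleanly segregating. This is plain NumPy with arbitrary sizes (d_model=64, n_features=512 are illustrative assumptions, not measurements of any real model), and it is not evidence for or against the multiple-agents claim itself.

```python
# Toy illustration of superposition, NOT evidence about real LLMs:
# pack more feature directions than dimensions into a "residual stream",
# so any two circuits reading/writing those features inevitably overlap.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_features = 64, 512            # far more features than dimensions (arbitrary)

# Random unit vectors stand in for learned feature directions.
features = rng.normal(size=(n_features, d_model))
features /= np.linalg.norm(features, axis=1, keepdims=True)

# Pairwise overlaps: 512 directions cannot all be orthogonal in 64 dimensions.
overlaps = features @ features.T
off_diag = overlaps[~np.eye(n_features, dtype=bool)]
print(f"mean |overlap| between distinct features: {np.abs(off_diag).mean():.3f}")

# Write two "active" features into the residual stream, then read one back
# along its own direction: the readout picks up interference from the other.
resid = features[0] + features[1]
print(f"readout of feature 0: {features[0] @ resid:.3f}  (ideal: 1.0)")
print(f"interference term from feature 1: {features[0] @ features[1]:.3f}")
```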
[...] some of them busy computing decisions and behavior of others
One of these entities being more agentic than others means that it gets to determine the eventual outcome.
These are even more extraordinary statements. I cannot even easily imagine a mechanistic model of what is happening inside an LLM (a feed-forward Transformer) that would support them. Can you explain?