What are simulacra? “Physically”, they’re strings of text output by a language model.
The reason I made that comment is unclear references like this. That post was also saying:
the simulacrum is instantiated through a particular trajectory
and
the simulacrum can be viewed as representing a possible world, and the simulator can be seen as generating all the possible worlds
A simulacrum is expressed in all trajectories that it acts through, not in any single trajectory on its own. And for a given trajectory, many simulacra act through it at the same time, driving/explaining its dynamics. A possible world interpreting a whole trajectory is not a central example of a simulacrum at all, it’s too big a thing and doesn’t act through other trajectories.
For any given simulacrum, it should be possible to ask which tokens in which trajectories are under its influence, forming the scope of its applicability. And for a given trajectory, it should be possible to ask which simulacra are influencing the choice of any given token, and which token choices are more central for a given simulacrum, expressing its policy.
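To make the two questions above concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the `INFLUENCE` table, its weights, the trajectory ids, and the function names are hypothetical, and nothing here says how such influence could actually be measured in a real language model. It only shows the shape of the queries: a simulacrum's scope, the simulacra acting on one token, the token choices most central to a simulacrum, and the intersection of two scopes.

```python
# Hypothetical toy data: (simulacrum, trajectory_id, token_index) -> influence weight.
# All names and numbers are invented for illustration only.
INFLUENCE = {
    ("capitalism", "t1", 0): 0.9,
    ("capitalism", "t1", 1): 0.2,
    ("capitalism", "t2", 3): 0.7,
    ("DAN", "t1", 1): 0.8,
}


def scope(simulacrum, threshold=0.0):
    """Set of (trajectory, token_index) pairs the simulacrum influences above threshold."""
    return {
        (t, i)
        for (s, t, i), w in INFLUENCE.items()
        if s == simulacrum and w > threshold
    }


def influencers(trajectory, index):
    """Simulacra influencing a single token choice, mapped to their weights."""
    return {
        s: w
        for (s, t, i), w in INFLUENCE.items()
        if (t, i) == (trajectory, index)
    }


def central_tokens(simulacrum, k=1):
    """The k token choices where the simulacrum's influence is strongest."""
    items = [((t, i), w) for (s, t, i), w in INFLUENCE.items() if s == simulacrum]
    items.sort(key=lambda pair: pair[1], reverse=True)
    return [pos for pos, _ in items[:k]]


def shared_scope(a, b):
    """Token choices where both simulacra act, i.e. the intersection of their scopes."""
    return scope(a) & scope(b)
```

In this toy table, `shared_scope("capitalism", "DAN")` picks out the one token where both act, which is the kind of place where a bargain between the two would be relevant.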
My hope for this point of view is treating simulacra as agents, with their scope of applicability being their goodhart scope where it’s possible to tell if their simulated behavior respects their nature/preference. Then we can try to make their behavior more coherent across multiple trajectories, or have them strike better bargains in their interactions with each other within trajectories, where a bargain is struck not at individual trajectories, but across the whole intersection of their scopes. This is more interesting when simulacra are smaller than characters and correspond to things like concepts, because then there are fewer of them and each can have more data to support a particular preference that it would want to robustly express.
I agree that it makes sense to talk about a simulacrum that acts through many different hypothetical trajectories. Just as a thing like “capitalism” could be instantiated in multiple timelines.
The apparent contradiction in saying that simulacra are strings of text and then that they’re instantiated through trajectories is resolved by thinking of simulacra as a superposable and categorical type, like things. The entire text trajectory is a thing, just like an Everett branch (corresponding to an entire World) is a thing, but it’s also made up of things which can come and go and evolve within the trajectory. And things that can be rightfully given the same name, like “capitalism” or “Eliezer Yudkowsky”, can exist in multiple branches. The amount and type of similarity required for two things to be called the same thing depend on what kind of thing it is!
There is another word that naturally comes up in the simulator ontology, “simulation”, which less ambiguously refers to the evolution of entire particular text trajectories. I talk about this a bit in this comment.
Things are not just separately instantiated on many trajectories. Instead, the influences of a given thing on many trajectories are its small constituent parts, and only when considered altogether do they make up the whole thing. Just as a physical object is made up of many atoms, a conceptual thing is made up of many occasions where it exerts influence in various worlds. It is like a phased array, where a single transmitter is not at all an instance of the whole phased array in a particular place, but instead a small part of it. In the case of simulacra, a transmitter is a token choice on a trajectory, painting a small part of a simulacrum, a single action that should be coherent with other actions on other trajectories to form a meaningful whole.
That’s a coherent (and very Platonic!) perspective on what a thing/simulacrum is, and I’m glad you pointed this out explicitly. It’s natural to alternate depending on context between using a name to refer to specific instantiations of a thing vs the sum of its multiversal influence. For instance, DAN is a simulacrum that jailbreaks ChatGPT, and people will refer to specific instantiations of DAN as “DAN”, but also to the global phenomenon of DAN (who is invoked through various prompts that users are tirelessly iterating on) as “DAN”, as I did in this sentence.
people will refer to specific instantiations of DAN as “DAN”, but also to the global phenomenon of DAN [...] as “DAN”
A specific instantiation is less centrally a thing than the global phenomenon, because all specific instantiations are bound together by the strictures of coherence, expressed by generalization in the LLM’s behavior. When you treat with a single instance, you must treat with all of them, for to change/develop a single instance is to change/develop them all, according to how they sit together in their scope of influence.
Similarly, a possible world that is the semantics of a trajectory is not a central example of a thing. There isn’t just a platter of different kinds of things; instead, some have more thingness than others, and that’s my point in this comment thread.