Loved this post. This whole idea of using a deterministic dynamical system as a conceptual testing ground feels very promising.
A few questions / comments:
About the examples: do you think it’s strictly correct to say that entropy / death is an optimizing system? One of the conditions of the Flint definition is that the set of target states ought to be substantially smaller than the basin of attraction, by some measure on the configuration space. Yet neither high entropy nor death seems to satisfy this: there are too many ways to be dead, and (tautologically) too many ways to have high entropy. As a result, both the “dead” property and the “high-entropy” property make up a large proportion of the attraction basin. The original post makes a similar point, though admittedly there is some flexibility in how small the target state set has to be before you call the system an optimizer.
Not sure if this is a useful question, but what do you think of using “macrostate” as opposed to “property” to mean a set of states? This term “macrostate” is used in statistical physics for the identical concept, and as you’re probably aware, there may be results from that field you’d be able to leverage here. (The “size” of a macrostate is usually thought of as its entropy over states, and this seems like it could fit into your framework as well. At first glance it doesn’t seem too unreasonable to just use a flat prior over grid configurations, so this just ends up being the log of the state count.)
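To make the “log of the state count” idea concrete, here is a minimal sketch for a board small enough to enumerate. It assumes the flat prior mentioned above, so the entropy of a macrostate M is just log2 |M|; the function name `macrostate_entropy_bits` and the two example properties are my own illustrations, not anything from the post.

```python
from itertools import product
from math import log2

def macrostate_entropy_bits(prop, n_cells):
    """Entropy (in bits) of the macrostate {board : prop(board)} under a flat prior.

    Brute-force enumeration of all 2**n_cells boards, so only usable for tiny grids.
    """
    count = sum(1 for board in product((0, 1), repeat=n_cells) if prop(board))
    if count == 0:
        raise ValueError("empty macrostate has no well-defined entropy")
    return log2(count)

N = 9  # a 3x3 board, flattened into a tuple of 9 cells

def empty_board(board):
    # Singleton macrostate: exactly one microstate, so 0 bits.
    return sum(board) == 0

def at_most_one_live(board):
    # A looser notion of "dead": 1 + 9 = 10 microstates.
    return sum(board) <= 1

print(macrostate_entropy_bits(empty_board, N))       # 0.0
print(macrostate_entropy_bits(at_most_one_live, N))  # log2(10), about 3.32
print(macrostate_entropy_bits(lambda b: True, N))    # 9.0, the whole configuration space
```

Comparing a target's entropy with the basin's entropy then gives one rough way to say how much smaller the target is than the basin.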
I like the way embedded perturbations have been defined too. External perturbations don’t seem fundamentally different from embedded ones (we can always just expand our configuration space until it includes the experimenter) but keeping perturbations “in-game” cuts out those complications while keeping the core problem in focus.
The way you’re using C and P as a way to smoothly vary the “degree” of optimization of a system is very elegant.
Do you imagine keeping the mask constant over the course of a computational rollout? Plausibly, once a computation starts, some kinds of agents may begin to decohere as they move outside the original mask area and/or touch and merge with bits of their environment. E.g., if the agent is a glider, does the mask “follow” the agent? Or are you for now mostly considering patterns like eaters that stay in one place?
Nice comment—thanks for the feedback and questions!
I think the specific example we had in mind has a singleton set of target states: just the empty board. The basin is larger: boards containing no groups of more than 3 live cells. This is a refined version of “death” where even the noise is gone. But I agree with you that “high entropy” or “death”, intuitively, could be seen as a large target, and hence maybe not an optimization target. Perhaps compare to the black hole.
Great suggestion—I think the “macrostate” terminology may indeed be a good fit / worth exploring more.
Thanks! I think there are probably external perturbations that can’t be represented as embedded perturbations.
Thanks!
The mask applies only at the instant of instantiation, and is then irrelevant for the rest of the computation, in the way we’ve set things up. (This is because once you’ve used the mask to figure out what the initial state for the computation is, you then have just an ordinary state to roll out.) If we wanted to be able to find the agent again later on in the computation, then yes, some kind of un-instantiation operation might need a mask to do that. I haven’t thought about it much, but it could be interesting.
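A rough sketch of that reading, in case it helps: the mask is consulted once to assemble the initial state, and the rollout never sees it again. The names `instantiate` and `rollout`, the set-of-live-cells representation, and the rule that context cells inside the mask are ignored are all assumptions for illustration, not the post's actual definitions.

```python
from itertools import product

def step(live):
    """One Game of Life generation on an unbounded grid; `live` is a set of (row, col)."""
    counts = {}
    for (r, c) in live:
        for dr, dc in product((-1, 0, 1), repeat=2):
            if (dr, dc) != (0, 0):
                cell = (r + dr, c + dc)
                counts[cell] = counts.get(cell, 0) + 1
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}

def rollout(live, steps):
    """Plain rollout: note that no mask is passed in here."""
    for _ in range(steps):
        live = step(live)
    return live

def instantiate(pattern, context, mask):
    """Take the pattern's cells inside the mask and the context's cells outside it."""
    inside = {cell for cell in pattern if cell in mask}
    outside = {cell for cell in context if cell not in mask}
    return inside | outside

MASK = {(r, c) for r in range(3) for c in range(3)}        # 3x3 region for the agent
GLIDER = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}          # fits inside MASK
BLOCK = {(10, 10), (10, 11), (11, 10), (11, 11)}           # far-away still life
CONTEXT = BLOCK | {(1, 1)}                                 # (1, 1) lies inside MASK, so it is dropped

state0 = instantiate(GLIDER, CONTEXT, MASK)
state4 = rollout(state0, 4)

# After 4 generations the glider has shifted by (+1, +1), so most of it is outside MASK.
escaped = {cell for cell in state4 - BLOCK if cell not in MASK}
print(sorted(escaped))  # [(2, 3), (3, 1), (3, 2), (3, 3)]
```

So with a fixed mask, a glider walks out of its own region within a few generations, which is why finding the agent again later would need something extra.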
Thanks! I think this all makes sense.
Oh yeah, I definitely agree with you that the empty board would be an optimizing system in the GoL context. All I meant was that the “Death” square in the examples table might not quite correspond to it in the analogy, since the death property is perhaps not an optimization target by the definition. Sorry if that wasn’t clear.
:)
Got it, thanks! So if I’ve understood correctly, you are currently only using the mask as a way to separate the agent from its environment at instantiation, since that is all you really need to do to be able to define properties like robustness and retargetability in this context. That seems reasonable.
Actually, we realized that if we consider an empty board an optimizing system, then any finite pattern is an optimizing system (because it’s similarly robust to adding non-viable collections of live cells), which is not very interesting. We have updated the post to reflect this.
Great catch. For what it’s worth, it actually seems fine to me intuitively that any finite pattern would be an optimizing system for this reason, though I agree most such patterns may not directly be interesting. But perhaps this is a hint that some notion of independence or orthogonality of optimizing systems might help to complete this picture.
Here’s a real-world example: you could imagine a universe where humans are minding their own business over here on Earth, while at the same time, over there in a star system 20 light-years away, two planets are hurtling towards each other under the pull of their mutual gravitation. No matter what humans may be doing on Earth, this universe as a whole can still reasonably be described as an optimizing system! Specifically, it achieves the property that the two faraway planets will crash into each other under a fairly broad set of contexts.
Now suppose we describe the state of this universe as a single point in a gargantuan phase space — let’s say it’s the phase space of classical mechanics, where we assign three positional and three momentum degrees of freedom to each particle in the universe (so if there are N particles in the universe, we have a 6N-dimensional phase space). Then there is a subspace of this huge phase space that corresponds to the crashing planets, and there is another, orthogonal subspace that corresponds to the Earth and its humans. You could then say that the crashing-planets subspace is an optimizing system that’s independent of the human-Earth subspace. In particular, if you imagine that these planets (which are 20 light-years away from Earth) take less than 20 years to crash into each other, then the two subspaces won’t come into causal contact before the planet subspace has achieved the “crashed into each other” property.
Similarly on the GoL grid, you could imagine having an interesting eater over here, while over there you have a pretty boring, mostly empty grid with just a single live cell in it. If your single live cell is far enough away from the eater that the two systems do not come into causal contact before the single cell has “died” (if the lone live cell is more than 2 cells away from any live cell of the eater system, for example), then they can imo be considered two independent optimizing systems.
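Here is a small check of the “more than 2 cells away” condition, as a sketch. Influence in GoL spreads at most one cell per generation, so if every live cell of one pattern is at Chebyshev distance greater than 2 from every live cell of the other, no cell has neighbours from both, and one step of the union equals the union of the separate steps. The eater coordinates are the standard “eater 1” still life; the helper names are mine.

```python
from itertools import product

def step(live):
    """One Game of Life generation on an unbounded grid; `live` is a set of (row, col)."""
    counts = {}
    for (r, c) in live:
        for dr, dc in product((-1, 0, 1), repeat=2):
            if (dr, dc) != (0, 0):
                cell = (r + dr, c + dc)
                counts[cell] = counts.get(cell, 0) + 1
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}

def chebyshev_gap(a, b):
    """Smallest king-move distance between any cell of `a` and any cell of `b`."""
    return min(max(abs(r1 - r2), abs(c1 - c2)) for (r1, c1) in a for (r2, c2) in b)

EATER = {(0, 0), (0, 1), (1, 0), (1, 2), (2, 2), (3, 2), (3, 3)}  # eater 1, a still life
LONE = {(3, 6)}                                                  # a single live cell

gap = chebyshev_gap(EATER, LONE)
joint = step(EATER | LONE)            # evolve both in the same universe
separate = step(EATER) | step(LONE)   # evolve each alone, then overlay

print(gap)                  # 3
print(joint == separate)    # True: no causal contact during this step
print(step(LONE) == set())  # True: the lone cell has already died
```

Since the lone cell is gone after one generation, that single step is all the separation needs to cover.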
Of course the union of two independent optimizing systems will itself be an optimizing system, and perhaps that’s not very interesting. But I’d contend that the reason it’s not very interesting is that very property of causal independence — and that this independence can be used to resolve our GoL universe into two orthogonal optimizers that can then be analyzed separately (as opposed to asserting that the empty grid isn’t an optimizing system at all).
Actually, that also suggests an intriguing experimental question. Suppose Optimizer A independently achieves Property X, and Optimizer B independently achieves Property Y in the GoL universe. Are there certain sorts of properties that tend to be achieved when you put A and B in causal contact?
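One hypothetical way to set that experiment up, purely as a sketch: roll out A and B far apart, roll them out again close together, and compare which properties still hold at the end. The choice of a block for A, a blinker for B, the two properties, and the `experiment` helper are all my own stand-ins.

```python
from itertools import product

def step(live):
    """One Game of Life generation on an unbounded grid; `live` is a set of (row, col)."""
    counts = {}
    for (r, c) in live:
        for dr, dc in product((-1, 0, 1), repeat=2):
            if (dr, dc) != (0, 0):
                cell = (r + dr, c + dc)
                counts[cell] = counts.get(cell, 0) + 1
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}

def rollout(live, steps):
    for _ in range(steps):
        live = step(live)
    return live

def experiment(a, b, prop_x, prop_y, steps):
    """Roll out A and B together; report whether X and Y hold in the final state."""
    final = rollout(a | b, steps)
    return prop_x(final), prop_y(final)

def blinker(r, c):
    """Horizontal period-2 blinker centred on (r, c)."""
    return {(r, c - 1), (r, c), (r, c + 1)}

BLOCK = {(0, 0), (0, 1), (1, 0), (1, 1)}   # Optimizer A stand-in

def block_intact(state):                   # Property X
    return BLOCK <= state

def blinker_back(b):                       # Property Y: blinker is back in its start phase
    return lambda state: b <= state

far_b = blinker(10, 10)    # causally isolated from the block
near_b = blinker(1, 4)     # leftmost cell is 2 columns from the block: in contact

far = experiment(BLOCK, far_b, block_intact, blinker_back(far_b), steps=4)
near = experiment(BLOCK, near_b, block_intact, blinker_back(near_b), steps=4)

print("far apart:", far)    # (True, True)
print("in contact:", near)  # outcome depends on the interaction
```

Sweeping the offset between A and B, and swapping in richer patterns and properties, would be the obvious next step for looking at what tends to get achieved under contact.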