Jonathan Gorard is a mathematician at Wolfram Research and one of the co-founders of the Wolfram Physics Project. I recently came across this Twitter thread of his and found it particularly insightful:
Jonathan Gorard: The territory is isomorphic to an equivalence class of its maps.
As this is pretty much the only statement of my personal philosophical outlook on metaphysics/ontology that I’ve ever made on here, I should probably provide a little further clarification. It starts from a central idea from philosophy of science: theory-ladenness.
As argued by Hanson, Kuhn, etc., raw sense data is filtered through many layers of perception and analysis before it may be said to constitute “an observation”. So making a truly “bare metal” observation of “reality” (uninfluenced by theoretical models) is impossible.
Hence, if we see nothing as it “truly is”, then it only ever makes sense to discuss reality *relative* to a particular theoretical model (or set of models). Each such model captures certain essential features of reality, and abstracts away certain others.
There are typically many possible models consistent with a given collection of sense data. E.g. we may choose to decompose an image into objects vs. colors; to idealize a physical system in terms of particles vs. fields; to describe a given quale in English vs. Spanish.
Each model captures and abstracts a different set of features, so that no single one may be said to encompass a complete description of raw sense data. But now consider the set of all possible such models, capturing and abstracting all possible combinations of features.
If we accept the premise that observations are theory-laden, then my contention is that there cannot exist any more information present within “objective reality” than the information encoded in that collection of consistent models, plus the relationships between them.
This is analogous to the Yoneda lemma in category theory: any abstract category can be “modeled” by representing its objects as sets and its arrows as set-valued functions. Each such “model” is lossy, in that it may not capture the full richness of the category on its own.
Yet the collection of all such models, plus the relationships between them (i.e. the functor category) *does* encode all of the relevant information. I suspect that something quite similar is true in the case of ontology and the philosophy of science.
One advantage of this philosophical perspective is that it is testable (and thus falsifiable): it suggests, for instance, that within the collection of all possible words (across all possible languages) for “apple”, and the linguistic relationships between them, is encoded the abstract concept of “apple” itself, and that all relevant properties of this concept are reconstructable from this pattern of linguistic relationships alone. Distributional semantics potentially gives one a way to test this hypothesis systematically.
So no map alone constitutes the territory, but the territory does not exist without its maps, and the collection of all such maps (and the interrelationships between them) is perhaps all that the “territory” really was in the first place.
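As a toy illustration of the distributional-semantics test proposed in the thread, here is a minimal sketch; all vectors and context dimensions below are invented for illustration, not real corpus data. The hypothesis predicts that the *relationships* between concepts (e.g. cosine similarities) should come out roughly the same no matter which language’s word vectors we compute them from.

```python
# Toy sketch of the distributional-semantics test (all numbers invented).
import numpy as np

# Hypothetical co-occurrence vectors over shared context dimensions
# (fruit, tree, red, company).
vectors = {
    ("en", "apple"):   np.array([0.9, 0.7, 0.8, 0.3]),
    ("es", "manzana"): np.array([0.9, 0.6, 0.8, 0.0]),
    ("en", "pear"):    np.array([0.9, 0.7, 0.1, 0.0]),
    ("es", "pera"):    np.array([0.9, 0.6, 0.1, 0.0]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# The *relations* should be roughly language-invariant even though the
# individual vectors differ:
print(cosine(vectors[("en", "apple")], vectors[("en", "pear")]))
print(cosine(vectors[("es", "manzana")], vectors[("es", "pera")]))
```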
Some related ideas that this thread brings to mind:
In general relativity, we define tensors and vectors not by their numerical components in a particular coordinate system, but by how those components change as we perform a coordinate transformation. We let our descriptions get “closer” to the territory by allowing them to translate across a wide range of possible maps. Similarly, we capture properties of the territory by capturing information that remains invariant as we translate across maps.
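For concreteness, the standard transformation law (textbook general relativity, nothing specific to this post): under a change of coordinates $x \to x'$, the components of a vector transform as

$$v'^{\mu} = \frac{\partial x'^{\mu}}{\partial x^{\nu}}\, v^{\nu},$$

while a scalar assembled from tensors, such as $g_{\mu\nu} v^{\mu} v^{\nu}$, takes the same value in every coordinate system; it is the invariant content shared by all the “maps”.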
Abram Demski has a related idea where a perspective becomes more “objective” by being easily translatable across many different perspectives, so that it removes the privilege from any particular perspective.
In univalent foundations, we treat isomorphic mathematical objects as essentially the same. We can think of the “territory of math” as the equivalence classes of all isomorphic descriptions, and we get “closer” to that territory by ignoring differences between instances within an equivalence class (ignoring implementation details).
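For reference, the univalence axiom makes this precise (standard statement from homotopy type theory): for types $A$ and $B$ in a universe $\mathcal{U}$, the canonical map

$$(A =_{\mathcal{U}} B) \longrightarrow (A \simeq B)$$

sending identifications to equivalences is itself an equivalence, so identifications between types are exactly the equivalences between them.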
Due to embedded agency, no embedded map can contain the entire territory. We can think of natural latents as low-dimensional summaries of the territory that are isomorphic to the equivalence class of all embedded maps, singling out features of the territory that are invariant under translation across those maps.
Some comments on this post:
Hm, this depends on how large a territory you are claiming. While I think this is probably true in a whole lot of practical cases, in special cases you can have a map that is both embedded and yet contains the entire territory, albeit the natural examples I’m thinking of only really work in the infinite case, like infinite-state Turing machines, or something like this:
https://arxiv.org/abs/1806.08747
I think the basic reason is that in the infinite case, you can often exploit the fact that having a smaller region to do computations in still leaves you with the same infinity of computational power, whereas you really can’t do that with only finitely powerful computation.
That said, I agree with this:
In practice, I think this statement is actually very true.
Interesting, I’ll check it out!
Note that it isn’t intended to be in any way a realistic program we could ever run, but rather an interesting ideal case where we could compute every well-founded set.
Quines print the whole territory from a smaller template they hold as a map. There is a practical limitation, and a philosophical difficulty (the meaning of a map considered by itself is a different kind of thing from the territory). But it’s not about embedded agency.
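For concreteness, here is the classic two-line Python quine: a smaller template (the string s) regenerates the entire program text, the way a compact map can print its whole territory.

```python
# A classic Python quine (the comment lines are not part of the quine):
# running the two code lines below prints exactly those two lines.
s = 's = %r\nprint(s %% s)'
print(s % s)
```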
Presumably there can be a piece of paper somewhere with the laws of physics & initial conditions of our universe written on it. That piece of paper can “fully capture” the entire territory in that sense.
But no agents within our universe can compute all consequences of the territory using that piece of paper (given computational irreducibility), because that computation would be part of the time evolution of the universe itself.
I think that bit is about embedded agency.
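To make the computational-irreducibility point concrete, here is a minimal sketch using Wolfram’s standard example, the rule 30 cellular automaton: as far as anyone knows, the only way to learn the state at step n is to run all n steps, so a simulator embedded inside the system cannot outpace the system itself.

```python
# Rule 30: new cell = left XOR (center OR right), on a cyclic row.
def rule30_step(cells):
    n = len(cells)
    return [cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n])
            for i in range(n)]

state = [0] * 31
state[15] = 1  # single live cell in the middle
for _ in range(16):
    print("".join("#" if c else "." for c in state))
    state = rule30_step(state)  # no known shortcut past this loop
```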
A map is something that can answer queries; it doesn’t need to be specifically a giant lookup table. If a map can perfectly describe any specific event when queried about it, it’s already centrally a perfect map, even if it didn’t write down all answers to all possible questions on stone tablets in advance. But even then, in a computational territory we could place a smaller map that is infinite in time, and it will be able to write down all that happens in that territory at all times, with explicit representations of events in the territory being located either in the past or in the future of the events themselves.
I agree, just changed the wording of that part.
Embedded agency & computational irreducibility imply that the smaller map cannot outpace the full time evolution of the territory, because it is a part of it, which may or may not be important for real-world agents.
In cases where the response time of the map does matter to some extent, embedded maps often need to coarse-grain over the territory to “locally” outpace it.
We may think of natural latents as coarse-grainings that are convergent for a wide variety of embedded maps.
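As a toy illustration of that convergence (my own sketch, with made-up numbers): several different lossy summaries of the same territory, each discarding different information, can all retain the same low-dimensional latent.

```python
# Different "embedded maps" of the same data converge on one latent:
# the bias of a coin, here ~0.7.
import random

random.seed(0)
flips = [random.random() < 0.7 for _ in range(10_000)]  # the territory

# Three different lossy maps, each throwing away different information:
map_a = sum(flips) / len(flips)            # overall frequency
map_b = sum(flips[::2]) / len(flips[::2])  # even-indexed flips only
map_c = sum(flips[:1000]) / 1000           # first thousand flips only

# All three retain (approximately) the same latent: the bias.
print(map_a, map_b, map_c)
```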
A program can predict another program regardless of when either of them is instantiated in the territory (neither needs to be instantiated for this to work, or they could be instantiated at many times simultaneously). Statistical difficulties need to be set up more explicitly, there are many ways of escaping them in principle (by changing the kind of territory we are talking about), or even in practice (by focusing on abstract behavior of computers).
The claim was that a subprogram (map) embedded within a program (territory) cannot predict the *entire execution trace* of that program faster than the program itself, given computational irreducibility.
“there are many ways of escaping them in principle, or even in practice (by focusing on abstract behavior of computers).”
Yes, I think this is the same point as my point about coarse-graining (outpacing the territory “locally” by throwing away some info).
My point about computers-in-practice is that this is no longer an issue within the computers, indefinitely. You can outpace the territory within a computer using a smaller map from within the computer. Whatever “computational irreducibility” is, the argument doesn’t apply to many computations that can be set up in practice; that is, they can be predicted by smaller parts of themselves. (Solar flares from the distant future can’t be predicted, but even that is not necessarily an important kind of practical question in the real world, after the universe is overwritten with computronium and all the stars are dismantled to improve energy efficiency.)
I don’t think we disagree?
The point was exactly that although we can’t outpace the territory globally, we can still do it locally (by throwing out info we don’t care about, like solar flares).
That by itself is not that interesting. The interesting part is this: given that different embedded maps throw out different info & retain some info, is there any info that’s convergently retained by a wide variety of maps? (aka natural latents)
The rest of the disagreement seems to boil down to terminology.
The locally/globally distinction is suspicious, since “locally” here can persist at an arbitrary scale. If all the different embedded maps live within the same large legible computation, statistical arguments that apply to the present-day physical world will fail to clarify the dynamics of their interaction.
Yes, by “locally outpace” I simply meant outpace at some non-global scale; there will of course be some tighter upper bound on that scale when it comes to real-world agents.
What I’m saying is that there is no upper bound for real-world agents; the scale of “locally” in this weird sense can be measured in eons and galaxies.
Yes, there’s no upper bound for what counts as “local” (except global), but there is an upper bound on the scale at which agents’ predictions can outpace the territory (e.g. humans can’t predict everything in the galaxy).
I meant upper bound in the second sense
The relevance of extracting/formulating something “local” is that prediction by smaller maps within it remains possible, ignoring the “global” solar flares and such. So a situation could be set up where a smaller agent predicts everything eons in the future at galaxy scale. Perhaps a superintelligence predicts the human process of reflection; that is, it’s capable of perfectly answering specific queries before the specific referenced event would take place in actuality, while the computer is used to run many independent possibilities in parallel. So the superintelligence couldn’t enumerate them all in advance, but it could quickly chase and overtake any given one of them.
Even a human would be capable of answering such questions if nothing at all is happening within this galaxy-scale computer, and the human is paused for eons after making the prediction that nothing will be happening. (I don’t see what further “first sense” of locality or upper bound, distinct from this, could be relevant.)
I intended ‘local’ (aka not global) to be a necessary but not sufficient condition for predictions made by smaller maps within it to be possible (because global predictions run into problems of embedded agency).
I’m mostly agnostic about what the other necessary conditions are & what the sufficient conditions are.
See “nerve” and “nerve and realization” for a clearer description of what the premise of the claim is probably gesturing at. (This seems comically useless for someone who can’t read the language, though I’ll try to briefly explain.)
That is, if there is a category C where an interesting object c (such as “the territory”) lives, we can observe that object by first setting up abstract measuring apparatus that takes the form of a category S, and then setting up a “realization” process F that “concretely instantiates” objects and morphisms of S in C; that is, F is a functor from S to C. Then, for any object s of S (a measuring device), we can observe the ways in which F(s) (the realization of s in C) maps to the object of study c, and capture the whole collection [F(s), c] of ways in which it does so in C. In the straightforward case this is some set of maps, but more generally it could include more data than only a set (maybe C is an enriched category). Then we can take all the observations [F(s), c] made in C using each abstract measuring device s, together with the ways in which they are related in C, and assemble them into an “abstract map” (“nerve”) of c that’s now written in the language of S and not in the language of C.
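In symbols, a compact sketch of the construction just described (standard definitions, as I read the comment): given the realization functor $F : S \to C$, the “abstract map” of the object of study $c$ is the presheaf

$$N_F(c) : S^{\mathrm{op}} \to \mathrm{Set}, \qquad N_F(c)(s) = \mathrm{Hom}_C(F(s),\, c),$$

which records, for each measuring device $s$, all the ways its realization maps into $c$.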
The data structure that encodes this kind of “abstract map” is called a presheaf (on S). Presheaves can be understood as being built out of many abstract measuring devices (objects of S) by “gluing” them together into larger objects (taking coproducts). The Yoneda embedding indicates where the original primitive measuring devices from S exist as presheaves in the category of presheaves PSh(S), with no gluing applied. If understood as a “realization” process, the Yoneda embedding can be used in an example of the same methodology for building “abstract maps”. That is, each object (measuring device) from S can be “instantiated” in PSh(S), and then these instances can be collectively used to observe arbitrary objects of PSh(S), creating their “abstract maps”. The Yoneda lemma then says that an “abstract map” of an “abstract map” (some object of PSh(S)) obtained through this method is that same original “abstract map” itself.
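Stated compactly (the standard form of the lemma): writing $y : S \to \mathrm{PSh}(S)$, $y(s) = \mathrm{Hom}_S(-, s)$ for the Yoneda embedding, for any presheaf $P$ there is a natural isomorphism

$$\mathrm{Nat}(y(s),\, P) \;\cong\; P(s),$$

so the “abstract map” of $P$ built by observing it with the instantiated measuring devices $y(s)$ is just $P$ itself.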