Referential Containment
This is an idea I am toying around with for understanding resolutionally adjusted causal modeling—this is just a bunch of intuitions and pointing towards a somewhat clear framing of a fundamental thing. I am sure there are already plenty of accounts for how to approach this kind of task, but I like to figure stuff out by myself, at least initially.
This could definitely benefit from some visual aids, but I prioritized publishing.
“Referential containment” relates to how one might sensibly divide a system into parts or how a system with many parts might be understood as a system with fewer parts—it relates to the translation between different resolutions.
The mathematical intuition here is that if we think of a physical system as a complex causal graph with many edges and nodes, the nodes being information units and the edges representing the relationships of that information, then if we are tasked with dividing this graph into two areas, the dividing line should cross as few edges as possible.
The two resulting pieces will then have the fewest possible edges between them, so the explanation to account for these edges will also be the shortest. Most “references” are “contained” within the respective objects. This also works for hyperedges, and we might want to assign weights to different kinds of edges to indicate how “costly” they are to cross.
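To make the edge-counting intuition concrete, here is a minimal sketch. The toy graph, the edge weights, and the function names are all made up for illustration, and the brute-force search only works for very small graphs. It splits a weighted graph into two parts so that the total weight of crossed edges is as small as possible.

```python
from itertools import combinations

def cut_weight(edges, part):
    """Total weight of edges with exactly one endpoint inside `part`."""
    return sum(w for u, v, w in edges if (u in part) != (v in part))

def best_bipartition(nodes, edges):
    """Brute-force search over all two-way splits (tiny graphs only)."""
    nodes = sorted(nodes)
    anchor, rest = nodes[0], nodes[1:]
    best = None
    # Keep `anchor` on one side so that each split is examined once.
    # k < len(rest) guarantees the other side is never empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            part = frozenset((anchor,) + extra)
            weight = cut_weight(edges, part)
            if best is None or weight < best[0]:
                best = (weight, part)
    weight, part = best
    return weight, set(part), set(nodes) - part

# Hypothetical toy system: two tightly coupled triangles joined by one weak edge.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [
    ("a", "b", 3.0), ("b", "c", 3.0), ("a", "c", 3.0),
    ("d", "e", 3.0), ("e", "f", 3.0), ("d", "f", 3.0),
    ("c", "d", 1.0),  # the only "reference" that is not contained
]

weight, side_a, side_b = best_bipartition(nodes, edges)
print(weight, sorted(side_a), sorted(side_b))
```

In this toy case the cheapest dividing line crosses only the weak edge between the two triangles, so each triangle comes out as one "object". Note that a plain minimum cut can also just slice off a single weakly connected node, so a more serious version would probably need some balance or size term.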
To expand this a bit towards more practical examples we could think of material objects and their surface and permanence. Solid objects are usually referentially contained in their locality, which partially guides our understanding of how to differentiate such objects from their surroundings. If I make a cut through an apple, the pieces are no longer connected as strongly in their locality so it becomes easier to think of them as two objects.
However, this is contextual. The locations of the apple halves are not entirely statistically independent: two apple halves are rarely moved more than a few meters apart before being eaten. Also, I chose the context of locality here, but one could choose among a great number of contexts, each of which provides different referential boundaries according to which I can detect objects/phenomena/patterns.
In practice, there will likely be too many dimensions of references to track to begin with (as they can be almost arbitrarily constructed), and my selection of which references/relationships to pay attention to will be functional and subjective.
It is interesting to consider the interplay of objectivity and subjectivity here:
If understanding the space before me is subjectively important to me, I can devote a certain amount of processing capacity to it, in terms of storage, attention, etc. But maybe there are just 4 very referentially contained clusters/categories. In that case, even if I want to devote more resources to understanding this space, forming a fifth object is a far less efficient/useful step than forming the initial four.
Practically, it makes more sense to get more precise about the nature of these objects and their relationships with each other, perhaps dividing them further on the next layer of resolution. Our world seems to supply us with hierarchies of efficient categorization into separate things as very functional/efficient models.
Fundamentally, there may be more salient layers of resolution than can be expressed by one hierarchy.
One mental image I have is to apply some “zooming out” pressure to a causal graph, until certain subgraphs “snap” together into nodes, combining their edges to be less precise/specific as well. The order in which subgraphs “snap together” or “bubble apart” in the reverse operation, and learning processes over that, are the main subjects of study here.
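One way to picture a single "snap" step is the following minimal sketch. It assumes the clusters are already known; the cluster labels, the toy graph, and the function name are made up for illustration. Each cluster collapses into one node, and the edges between clusters are merged into one coarser edge whose weight is the sum of the originals.

```python
from collections import defaultdict

def coarsen(edges, cluster_of):
    """Collapse each cluster into a single node.

    Returns (coarse_edges, internal_weight):
    - coarse_edges maps a pair of cluster labels to the summed weight of all
      original edges between those clusters (the detail of which node
      connected to which is lost);
    - internal_weight maps a cluster label to the summed weight of the edges
      that were absorbed into it.
    """
    coarse_edges = defaultdict(float)
    internal_weight = defaultdict(float)
    for u, v, w in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu == cv:
            internal_weight[cu] += w
        else:
            coarse_edges[tuple(sorted((cu, cv)))] += w
    return dict(coarse_edges), dict(internal_weight)

# Hypothetical toy graph: two triangles with two weak links between them.
edges = [
    ("a", "b", 3.0), ("b", "c", 3.0), ("a", "c", 3.0),
    ("d", "e", 3.0), ("e", "f", 3.0), ("d", "f", 3.0),
    ("c", "d", 1.0), ("b", "e", 0.5),
]
cluster_of = {"a": "X", "b": "X", "c": "X", "d": "Y", "e": "Y", "f": "Y"}

coarse_edges, internal_weight = coarsen(edges, cluster_of)
print(coarse_edges)      # one merged X-Y edge
print(internal_weight)   # weight that is now "contained" in each node
```

The reverse operation ("bubbling apart") would need the discarded detail back, which is one reason to keep the fine-grained graph around rather than overwrite it.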
Filtering Information
Basically, I think there are two principles by which to sensibly reduce/filter the amount of information that an agent has to process/store:
The first principle is “computational reducibility” (the counterpart of Stephen Wolfram’s “computational irreducibility”). This is about finding a subset/smaller representation of some data, such that it contains all the information that is contained in the data (about some other system).
A basic example of this would be to strip away redundancies in the data. A more advanced example would be that if we can treat a complex system as reliably simulating a simpler system, we can just represent/track the simpler system and retain our ability to make predictions/statements about the complex system.
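The redundancy-stripping case can be illustrated with an ordinary lossless compressor. This is only a rough stand-in for the idea, not a measure of computational reducibility, and the example data is made up. It uses Python's standard zlib module.

```python
import os
import zlib

redundant = b"apple " * 1000   # 6000 bytes, highly repetitive
noisy = os.urandom(6000)       # 6000 random bytes, no structure to exploit

packed_redundant = zlib.compress(redundant)
packed_noisy = zlib.compress(noisy)

# Lossless: the full original is recoverable from the smaller representation.
restored = zlib.decompress(packed_redundant)

print(len(redundant), "->", len(packed_redundant))
print(len(noisy), "->", len(packed_noisy))
```

The repetitive data shrinks to a small fraction of its size while the random data does not shrink at all, which matches the intuition that only data with internal regularities admits a smaller equivalent representation.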
For a useful application of this, one would have to distinguish between efficiency in storing information vs efficiency in processing information, and also consider a more probabilistic notion of computational reducibility, e.g. such that our simplification retains 95% of the information contained in the original data, rather than insisting on perfect accuracy. This is obviously contextual.
The second principle is sometimes called “relevance realization” (introduced by John Vervaeke afaik) and refers to the ability to distinguish relevant information from irrelevant information. This is highly agent-, environment- and goal-dependent and also ends up being related to the previous concept of computational reducibility. Datapoints that have little predictive value when computing the future states of a given system tend to be less relevant than those that have a larger sway over the future.
So, for a given agent tasked with filtering/representing the information arriving through its sensory interface, one could say that it is tasked with figuring out the computational reducibility and relevance of the data. I am not using either of these terms precisely as their authors might, but I am not sure if I should just make up new terms in that case.
Anyway, the agent is processing- and storage-constrained (relative to the environment), and is running on a computational substrate that favors some mathematical operations over others, all of which makes certain representations more useful.
Can you figure out how referential containment is(/should be) related to these tasks and constraints?
I didn’t understand everything completely; however, when you mentioned “relevance realization” it reminded me of a recent post which gave labels to data of different levels of usefulness. The exact labels he used aren’t particularly important, but the categories that they represent are extremely useful. He outlines, more or less, that there are three types of information:
1. Unsorted, unfiltered data. Without a way to discriminate the signal from the noise, this data is basically useless. One example he gives is unsorted error logs for a given computer.
2. Highly relevant, well processed data. When the raw data is filtered, modified, and processed into a highly usable form, that’s when it is at its most useful. The ways to slice and dice the data are subjective in the sense that they depend on the goals of the user.
3. Misleading or incorrect data. Some data may be correct, but misleading. The example given is a ticket created because a “website loaded slowly”. Because this ticket was submitted, a lot of time may be taken to determine what may be wrong with a server. However, it turns out that the page loaded slowly because it was accessed via an old machine!
I wonder if the two of you might be interested in exploring these concepts together. Or apologies if I misunderstood!
Hm, I’ll give some thought to how to integrate different types of data with this picture, but I think that the “useful” classification of data ultimately depends on whether the agent possesses the right “key” to interpret it, and by extension, how difficult that “key” is to produce from concepts that the agent is already proficient with.
At the end of the day, the agent can only “understand” any data in terms of internalized concepts, so there will often be some uncertainty whether the difficulty is in translating sensible data into that internal representation or the difficulty is in the data being about phenomena somewhat (or far) outside of conceptual bounds. Having a “key” with respect to some data means that this data can be reliably translated.
Referential containment is about how to structure internal representations of concepts such as to make them most useful (which could mean flexible, or it could mean efficient, etc, depending on the problem domain and constraints).
If humanity forgot all of its medical knowledge tomorrow, would we discover the same categories and sub-categories of medicine, structuring our knowledge and drawing the distinctions into the different areas of expertise similarly?
We could gather a bunch of observational data about “the art of preventing people from dying” and find clusters in interventions or treatment strategies that have an appropriate conceptual size to teach to different groups of people. Note that this changes depending on how many people we can allocate in total, how much we believe reliably fits into a single human mind, and how many common features there are in curative or preventative measures (commonalities here roughly referring to useful information that is referentially contained with respect to medicine, but cannot be further contained in a single sub-field or small cluster thereof).
Let me know if that makes sense.
I think I don’t have the correct background to understand fully. However, I think it makes a little more sense than when I originally read it.
An analogue to what you’re talking about (referential containment) with the medical knowledge would be something like PCA (principal component analysis) in genomics, right? Just at a much higher, autonomous level.
Yeah, I am not super familiar with PCA, but my understanding is that while both PCA and referential containment can be used to extract lower-dimensional or more compact representations, they operate on different types of data structures (feature vectors vs. graphs/hypergraphs) and have different objectives (capturing maximum variance vs. identifying self-contained conceptual chunks). Referential containment is more focused on finding semantically meaningful and contextually relevant substructures within a causal or relational knowledge representation. It also tries to address the opposite direction, basically how to break existing concepts apart when zooming into the representations, and I am not sure if something like that is done with PCA.
I had Claude 3 read this post and compare the two. Here it is, if you are interested (keep in mind that Claude tends to be very friendly and encouraging, so it might be valuing referential containment too highly):
Similarities:
Both aim to identify patterns and structure in high-dimensional data.
Both can be used to reduce the dimensionality of data by finding lower-dimensional representations.
Both rely on analyzing statistical dependencies or correlations between variables/features.
Differences:
PCA is a linear dimensionality reduction technique that projects data onto a lower-dimensional subspace spanned by the principal components (eigenvectors of the covariance matrix). Referential containment is not inherently a dimensionality reduction technique but rather a way to identify self-contained or tightly coupled substructures within a causal graph or hypergraph representation.
PCA operates on numerical feature vectors, while referential containment deals with graph/hypergraph structures representing conceptual relationships and causal dependencies.
PCA finds orthogonal linear combinations of features that capture maximum variance. Referential containment aims to identify subgraphs with strong internal connections and minimal external connections, which may not necessarily correspond to maximum variance directions.
PCA is an unsupervised technique that does not consider semantics or labeled data. Referential containment, as proposed in the context of the PSCA architecture, can potentially leverage semantic information and learned representations to guide the identification of meaningful conceptual chunks.
PCA produces a fixed set of principal components for a given dataset. Referential containment, as described, is more flexible and can adapt the level of granularity or resolution based on context, objectives, and resource constraints.
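For contrast with the graph-based picture, here is a minimal sketch of what PCA does to numerical feature vectors. It is a pure-Python toy that uses power iteration on the covariance matrix. The data and the function name are made up for illustration, and a real analysis would use a numerical library instead.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (direction, explained_variance_ratio) for the top component.

    The direction is the leading eigenvector of the sample covariance matrix,
    found by power iteration. Assumes the start vector is not orthogonal to
    that eigenvector and that the data has nonzero variance.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    vec = [1.0] * d
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Variance captured along `vec`, as a fraction of the total variance.
    captured = sum(vec[i] * cov[i][j] * vec[j]
                   for i in range(d) for j in range(d))
    total = sum(cov[i][i] for i in range(d))
    return vec, captured / total

# Hypothetical data: the second feature is roughly twice the first.
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
direction, ratio = first_principal_component(rows)
print(direction, ratio)
```

Here one direction (roughly "feature 2 is twice feature 1") captures nearly all of the variance, so the two features can be summarized by one number per row. Nothing in this procedure looks at edges or at which variables refer to which, which is the structural difference from referential containment described above.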