Abstraction sacrifices causal clarity
In my previous post I listed some considerations for a theory of narratives. The smallest building blocks of narratives are abstractions over empirically observed things and events; these make up the ontology of the language the narrative uses. In this post I want to start laying out a framework for showing how, even if you initially have observational access to a graph with clear causality between events, abstracting over its vertices and edges naively (or even as best you can?) loses that causal clarity and leaves you with mere correlation. The end goal is a set of considerations on how to abstract well while preserving as much causal clarity as possible.
My observational model liberally affords you observations of the universe in the form of a directed acyclic graph, which consists of perceptions in two embedding spaces: vertices, which we’ll call entities, and directed edges, which we’ll call actions. Each of these observation sets is strictly partially ordered (irreflexive, asymmetric, transitive) by time. Both entities and actions are encodings produced by your sensory pre-processing into some perception space with a topology that allows for grouping/clustering/classification. Note that “entities” in this model do not yet persist across time; they are mere instantaneous observations that may at most correspond to more permanent entity entries you might keep track of in some separate dynamical model of the universe.
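To make this concrete, here is a minimal sketch of that structure in Python. The names (Entity, Action, ObservationGraph) and all the details are my own placeholder encoding, not anything canonical; the embeddings are just vectors standing in for the output of sensory pre-processing.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Entity:
    """An instantaneous observation: a point in entity perception space."""
    time: float
    embedding: np.ndarray


@dataclass
class Action:
    """A directed edge: the source observation is perceived to cause the target."""
    source: Entity
    target: Entity
    time: float
    embedding: np.ndarray
    confidence: float = 1.0  # perception is noisy (point 2 below)


class ObservationGraph:
    """A DAG of instantaneous entity observations connected by action edges."""

    def __init__(self) -> None:
        self.entities: list[Entity] = []
        self.actions: list[Action] = []

    def observe_entity(self, entity: Entity) -> Entity:
        self.entities.append(entity)
        return entity

    def observe_action(self, action: Action) -> Action:
        # The strict partial order by time already rules out some candidate
        # causal connections: an effect cannot be observed before its cause.
        if action.target.time <= action.source.time:
            raise ValueError("effect must be observed after its cause")
        self.actions.append(action)
        return action
```

Because edges only ever point forward in time, acyclicity comes for free.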
The above observational structure is supposed to directly represent your best possible model of causation between things; the action edges are observed causations. This model is already limited in predicting the world in four major ways:
1. Entities and actions are already abstractions (from perceptual preprocessing) over incredibly complex systems, such as a human, with layers of theoretically analyzable causal interactions going down many orders of magnitude and academic branches, all the way to the quantum level. Even the perceived entities and actions are thus already abstracted, intelligible patterns in some spatiotemporal substrate, so there is already a loss of causal clarity in going from the true (quantum?) lowest-level causal interactions to interactions at any reasonably perceptible level of abstraction/grouping above it, both spatially and temporally. I do not directly take this loss into account in this post; instead I want to show that even if the embeddings of entity instances and the causal direction of their interactions are assumed to match relatively well across multiple communicating agents, there will still be a loss once the agents try to agree on a language with which to abstract over and talk about these causal interactions.
2. Since the observational model is downstream of noisy perception and preprocessing, every vertex and edge already carries a confidence in its accuracy. Thankfully, the aspect of your model you can usually be most confident about is the temporal ordering, which at least lets you rule out some candidate causal connections: an effect cannot precede its cause.
3. You cannot reason about your confidence in which kinds of entities cause what, through which actions, in which contexts, without grouping/clustering entities together. That is, in order to attach probabilities to the relations in your separate dynamical and, most importantly, predictive model of the world that your observations seem to be coming from, you need to compute statistics over similar entities/actions/contexts (a sketch of this follows after this list). Luckily, the same grouping that lets you decide by comparing probabilities also makes for a more compressed model.
4. The above point, which is about internal reasoning, also holds for communication with other agents.
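As a rough illustration of point 3, here is a sketch of how grouping observations lets you attach probabilities to relations between kinds. The noun_of and verb_of classifiers are placeholders for whatever clustering/classification of the perception spaces you actually use.

```python
from collections import Counter


def relation_statistics(actions, noun_of, verb_of):
    """Estimate P(verb | source noun, target noun) by counting grouped observations."""
    counts: Counter = Counter()
    totals: Counter = Counter()
    for a in actions:
        src, tgt = noun_of(a.source), noun_of(a.target)
        counts[(src, verb_of(a), tgt)] += 1
        totals[(src, tgt)] += 1
    # The conditional frequencies double as a compressed predictive model:
    # you no longer keep every edge, only statistics over kinds of edges.
    return {
        (src, verb, tgt): n / totals[(src, tgt)]
        for (src, verb, tgt), n in counts.items()
    }


# Example usage with deliberately crude classifiers (purely illustrative):
# probs = relation_statistics(
#     graph.actions,
#     noun_of=lambda e: "A" if e.embedding[0] > 0 else "B",
#     verb_of=lambda a: "pushes" if a.embedding[0] > 0 else "pulls",
# )
```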
A tool used by a group of communicating agents to address points 3 and 4 is a vocabulary, which is built up from two ontologies. The first is the ontology of nouns: a system of subsets of the entity embedding space. Each noun thus has an associated binary classifier on entity perception space, namely the characteristic function of its subset. Since it is a system of subsets, it is partially ordered by the subset relation, which we could call “is-a”. The second is the ontology of verbs: a system of subsets of the action embedding space. Note that these two structures are expressive enough to construct higher-order types from them, as in type theory, which the full vocabulary and its associated grammar could use. I do not want to go that deep in this post, though.
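A minimal sketch of the noun side of this, again with purely illustrative names and regions: a noun is identified with its characteristic function on entity perception space, and “is-a” is the subset relation (checked here only approximately, over observed instances). Verbs would be encoded the same way over action perception space.

```python
from typing import Callable, Sequence

# A noun is its characteristic function on entity perception space:
# it answers "is this perception an instance of me?".
Noun = Callable[[Sequence[float]], bool]

# Illustrative regions of a two-dimensional perception space.
animal: Noun = lambda e: e[0] > 0.0
dog: Noun = lambda e: e[0] > 0.0 and e[1] > 0.5


def is_a(sub: Noun, sup: Noun, observed: Sequence[Sequence[float]]) -> bool:
    """Approximate the "is-a" (subset) relation by checking it on observed
    instances; the true relation is over the whole perception space."""
    return all(sup(e) for e in observed if sub(e))
```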
Okay, so now that I have some reasonable data structures, here is the crux of the problem: the larger the number of observed instances of interactions between instances of two nouns (entity sets), the more likely it is to be “difficult” to separate these entity sets in a way that preserves the causal directionality within the interacting groups. Said another way: given two clusters of things of kinds A and B that interact with each other, it may be easy to classify instances of either cluster as lying at one end of some interaction, but hard to separate each cluster according to the observed directionality of the interaction.
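A toy illustration of that crux, with made-up counts: each individual edge has a clear direction, but once instances are pooled into two nouns the directions mix, and the class-level summary can only report an association.

```python
from collections import Counter


def class_level_direction(actions, noun_of):
    """Tally edge directions between noun classes.

    If the counts for (A -> B) and (B -> A) come out comparable, the
    abstraction has collapsed clear instance-level causation into mere
    class-level correlation.
    """
    directions: Counter = Counter()
    for a in actions:
        directions[(noun_of(a.source), noun_of(a.target))] += 1
    return directions


# Hypothetical tally over some observation graph:
#   Counter({("A", "B"): 53, ("B", "A"): 47})
# Edge by edge the causal arrow was unambiguous, but at the level of the
# nouns A and B the directionality is nearly washed out, and a narrative
# phrased in terms of A and B can only report that they are associated.
```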
I believe that when we talk about correlation vs. causation, we could often get a lot of mileage out of trying to structure our ontologies better.
I will follow up with some examples and try to justify this belief more, but for now, that’s the post.
Feedback of any kind is much appreciated.