Draft of a future post, any feedback is welcome. Continuation of a thought from this shortform post.
(picture: https://en.wikipedia.org/wiki/Drawing_Hands)
The problem
There’s an alignment-related problem: how do we make an AI care about the causes of a particular sensory pattern? What are the “causes” of a particular sensory pattern in the first place? You want the AI to differentiate between “putting a real strawberry on a plate” and “creating a perfect illusion of a strawberry on a plate”, but what’s the difference between doing real things and creating perfect illusions, in general?
(Relevant topics: environmental goals; identifying causal goal concepts from sensory data; “look where I’m pointing, not at my finger”; Pointers Problem; Eliciting Latent Knowledge; symbol grounding problem; ontology identification problem.)
I have a general answer to those questions. My answer is very unfinished, and it isn’t mathematical but philosophical in nature. I believe it’s important anyway, because there aren’t many ideas, philosophical or otherwise, about the questions above. With questions like these you don’t even know where to start thinking, so it’s hard to imagine even a bad answer.
Obvious observations
Observation 1. Imagine you come up with a model which perfectly predicts your sensory experience (a “Predictor”). Just having this model is not enough to understand the causes of a particular sensory pattern, i.e. to differentiate between things like “putting a real strawberry on a plate” and “creating a perfect illusion of a strawberry on a plate”.
Observation 2. Not every Predictor has variables which correspond to the causes of a particular sensory pattern, and not every Predictor can be used to easily derive something corresponding to those causes. For example, some Predictors might make predictions by simulating a large universe containing a superintelligent civilization that predicts your sensory experiences. See “Transparent priors”.
The solution
So, what are the causes of a particular sensory pattern?
“Recursive Sensory Models” (RSMs).
I’ll explain what an RSM is and provide various examples.
What is a Recursive Sensory Model?
An RSM is a sequence of N models (Model 1, Model 2, …, Model N) for which the following two conditions hold true:
- Model (K + 1) is good at predicting more aspects of sensory experience than Model (K); Model (K + 2) is good at predicting more aspects than Model (K + 1); and so on.
- Model 1 can be transformed into any of the other models according to special transformation rules. Those rules are supposed to be simple, but I can’t give a fully general description of them; that’s one of the biggest unfinished parts of my idea.
The second bullet point is arguably the most important one, but it’s very underspecified, so for now you can only get a feel for it by looking at specific examples. (A toy sketch of both conditions follows below.)
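Since the conditions are abstract, here is a minimal sketch, in Python, of what they could look like if we had a measure of predictive coverage and concrete transformation rules. Everything in it (`Model`, `predicted_aspects`, `is_rsm`, `toy_transform`) is a hypothetical placeholder I’m introducing for illustration, not an established formalism.

```python
# A minimal sketch of the two RSM conditions, under the assumption that we
# can enumerate which "aspects" of sensory experience a model predicts well.
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Model:
    name: str
    predicted_aspects: frozenset  # aspects of experience this model predicts well


def is_rsm(models: List[Model], transform: Callable[[Model, int], Model]) -> bool:
    """Check the two conditions for a sequence Model 1, ..., Model N."""
    # Condition 1: each model predicts strictly more aspects than the last.
    coverage_grows = all(
        models[k].predicted_aspects < models[k + 1].predicted_aspects
        for k in range(len(models) - 1)
    )
    # Condition 2: simple transformation rules turn Model 1 into every other
    # model (with k = 0 being the identity transformation).
    reachable = all(transform(models[0], k) == m for k, m in enumerate(models))
    return coverage_grows and reachable


# Toy usage, anticipating the object-permanence example below: the
# "transformation" re-applies the same persistence law in a richer space.
m1 = Model("2D visual field", frozenset({"open-eyes vision"}))
m2 = Model("3D world", frozenset({"open-eyes vision", "reopening eyes"}))


def toy_transform(base: Model, k: int) -> Model:
    return (m1, m2)[k]  # a lookup table standing in for the real rules


assert is_rsm([m1, m2], toy_transform)
```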
Core claim: when the two conditions hold true, the RSM contains easily identifiable “causes” of particular sensory patterns. The two conditions are necessary and sufficient for the existence of such “causes”. The universe contains “causes” of particular sensory patterns to the extent that statistical laws describing the patterns also describe deeper laws of the universe.
Example: object permanence
Imagine you’re looking at a landscape with trees, lakes and mountains. You notice that none of those objects disappear.
This seems like a good model: “most objects in the 2D space of my vision don’t disappear”. (Model 1)
But it’s not perfect. When you close your eyes, the landscape does disappear. When you look at your feet, the landscape does disappear.
So you come up with a new model: “there is some 3D space with objects; the space and the objects are independent of my sensory experience; most of the objects don’t disappear”. (Model 2)
Model 2 is better at predicting the whole of your sensory experience.
However, note that the “mathematical ontology” of both models is almost identical. (Both models describe spaces whose points can be occupied by something.) They’re just applied to slightly different things. That’s why “recursion” is in the name of Recursive Sensory Models: an RSM reveals similarities between different layers of reality. As if reality is a fractal.
Intuitively, Model 2 describes “causes” (real trees, lakes and mountains) of sensory patterns (visions of trees, lakes and mountains).
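To make the “almost identical mathematical ontology” point concrete, here is a toy sketch of my own (the names and data are assumptions for illustration): one persistence law, applied first to points of the 2D visual field and then to points of a hypothesized 3D space.

```python
# One shared law for both models: an object occupying a point keeps occupying it.
def persists(space: dict, obj: str, t: int) -> bool:
    now, later = space[t].get(obj), space[t + 1].get(obj)
    return now is not None and now == later


# Model 1: points in the 2D space of vision. It breaks when you close your
# eyes at t = 1, because the tree leaves the visual field.
vision = {0: {"tree": (4, 7)}, 1: {}}
# Model 2: the same law, re-applied to a 3D space that is independent of
# your sensory experience.
world = {0: {"tree": (4, 7, 12)}, 1: {"tree": (4, 7, 12)}}

assert not persists(vision, "tree", 0) and persists(world, "tree", 0)
```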
Example: reductionism
You notice that most visible objects move smoothly (don’t disappear, don’t teleport).
“Most visible objects move smoothly in a 2D/3D space” is a good model for predicting sensory experience. (Model 1)
But there’s a model which is even better: “visible objects consist of smaller and invisible/less visible objects (cells, molecules, atoms) which move smoothly in a 2D/3D space”. (Model 2)
However, note that the mathematical ontology of both models is almost identical.
Intuitively, Model 2 describes “causes” (atoms) of sensory patterns (visible objects).
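The same trick in code form (again a toy of mine, not a formalism from the post): the smooth-motion law is unchanged; only the objects it ranges over change.

```python
# The shared law: no disappearing, no teleporting; consecutive positions
# stay within one small step of each other.
def moves_smoothly(track: list, max_step: float = 1.0) -> bool:
    return all(abs(b - a) <= max_step for a, b in zip(track, track[1:]))


visible_object_track = [0.0, 0.5, 1.1, 1.6]  # Model 1: what you actually see
atom_track = [0.00, 0.01, 0.02, 0.03]        # Model 2: hypothesized invisible parts

assert moves_smoothly(visible_object_track) and moves_smoothly(atom_track)
```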
Example: a scale model
Imagine you’re alone in a field with rocks of different sizes and a scale model of the whole environment. You’ve already learned object permanence.
“Objects don’t move in space unless I push them” is a good model for predicting sensory experience. (Model 1)
But it has a little flaw: when you push a rock, the corresponding rock in the scale model moves too, and vice versa.
“Objects don’t move in space unless I push them; there’s a simple correspondence between objects in the field and objects in the scale model” is a better model for predicting sensory experience. (Model 2)
However, note that the mathematical ontology of both models is identical.
Intuitively, Model 2 describes a “cause” (the scale model) of sensory patterns (rocks of different sizes sitting at certain positions). Though you can reverse the cause and effect here.
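A toy sketch of Model 2’s extra clause (the scale factor and names are assumptions for illustration): the “objects stay put unless pushed” law is unchanged; the only addition is a simple correspondence between the two spaces.

```python
import math

SCALE = 0.01  # hypothetical scale factor of the scale model


def push(field: dict, scale_model: dict, obj: str, delta: float) -> None:
    """Pushing an object in the field moves its scale-model twin too."""
    field[obj] += delta
    scale_model[obj] = field[obj] * SCALE


field = {"rock_a": 10.0}
scale_model = {"rock_a": 0.10}
push(field, scale_model, "rock_a", 2.0)
assert math.isclose(scale_model["rock_a"], 0.12)
```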
Example: empathy
If you put your hand on a hot stove, you quickly move the hand away, because it’s painful and you don’t like pain. This is a great model (Model 1) for predicting your own movements near a hot stove.
But why do other people avoid hot stoves? If another person touches a hot stove, pain isn’t instantiated in your sensory experience.
The behavior of other people can be predicted with this model: “people have similar sensory experiences and preferences, inaccessible to each other”. (Model 2)
However, note that the mathematical ontology of both models is identical.
Intuitively, Model 2 describes a “cause” (inaccessible sensory experience) of sensory patterns (other people avoiding hot stoves).
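In code-sketch form (an assumed illustration of mine), Model 2 literally reuses the function that predicts my own behavior, positing an inaccessible copy of its input inside each other person.

```python
# The shared law: if it hurts, withdraw.
def avoid_pain_policy(percept: str) -> str:
    return "withdraw hand" if percept == "burning pain" else "stay"


# Model 1: my behavior, driven by my own accessible experience.
my_action = avoid_pain_policy("burning pain")

# Model 2: the same function, applied to a hypothesized experience of yours
# that never appears in my sensory data.
your_hypothesized_percept = "burning pain"  # never directly observed by me
your_predicted_action = avoid_pain_policy(your_hypothesized_percept)

assert my_action == your_predicted_action == "withdraw hand"
```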
Counterexample: a chaotic universe
Imagine yourself in a universe where your sensory experience is produced by very simple, but very chaotic laws. Despite the chaos, your sensory experience contains some simple, relatively stable patterns, purely by accident.
In such a universe, RSMs might not find any “causes” underlying particular sensory patterns (other than the simple chaotic laws themselves). But in that case, there probably are no “causes” to find.
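A toy version of such a universe (my own sketch; the logistic map stands in for the “very simple, but very chaotic laws”):

```python
# The logistic map at r = 4 is about as simple and as chaotic as laws get.
def logistic_step(x: float, r: float = 4.0) -> float:
    return r * x * (1.0 - x)


x, history = 0.2, []
for _ in range(50):
    history.append(x)
    x = logistic_step(x)

# Short stretches of `history` can look like stable, persisting patterns
# purely by accident, but the resemblance never repeats at a deeper layer:
# the only "cause" an RSM could point to is logistic_step itself.
```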