Abstraction As Symmetry and Other Thoughts
Epistemic status: slightly pruned brainstorming.
I’ve been reading John Wentworth’s posts on the natural abstraction hypothesis recently, and wanted to add some of my own thoughts. These are very much “directions to look in” rather than completed research, but they include some of the insights central to my model of the NAH, which might be useful for others trying to expand on or simply grok Wentworth’s work.
The Telephone Theorem says that over large causal distances, information not conserved perfectly by each interaction is eventually destroyed. Wentworth provides two arguments, plus empirical evidence, that the limiting distribution should have maximum entropy. (I find this intuitively obvious, even if proving it rigorously and in the most general case is difficult.)
If we know in advance what information will be conserved (e.g. total energy of a system) and we know how this relates to the quantity we want to know the distribution of, then we can just find the maxent distribution subject to that constraint.
Since maxent distributions subject to constraints on expected values are (under mild conditions) exponential families, these constraints should correspond to the sufficient statistics of that family.
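To spell that out (a standard derivation, nothing specific to Wentworth’s setup): maximizing entropy subject to a constraint on the expected value of some function $T$ yields a distribution of the form

$$p(x) \;=\; \frac{1}{Z(\lambda)} \exp\big(\lambda \cdot T(x)\big),$$

which is exactly an exponential family with sufficient statistic $T$. Taking $T$ to be total energy recovers the Boltzmann distribution $p(x) \propto e^{-\beta E(x)}$.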
We can probably say that every sufficient statistic for a distribution corresponds to a set of conserved quantities in the system that gives rise to that distribution.
By Noether’s theorem, conserved quantities correspond to symmetries. (The theorem proper applies to systems with a differentiable action, as in classical mechanics and field theory, but a similar principle seems to hold more generally.)
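The textbook instance: if a system’s Lagrangian $L(q, \dot q)$ has no explicit time dependence (i.e. it is symmetric under time translation), then the energy

$$E \;=\; \sum_i \dot q_i \frac{\partial L}{\partial \dot q_i} - L$$

is conserved along trajectories; likewise spatial translation symmetry gives conservation of momentum, and rotational symmetry gives angular momentum.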
Therefore, sufficient statistics = conserved quantities = symmetries.
The symmetry model is more natural for dealing with many kinds of abstractions. If we’re pointing to a flower, it’s not obvious how to label information as “conserved” or “not conserved”. We can, however, say what we want to be able to do to the flower and have our algorithm still be able to recognize it: move it, view it from a different angle, let it bloom or wilt, replace all the molecules, &c &c.
This seems isomorphic to the view of abstraction as redundant information, but focused more on different views of the same object rather than different objects from the same class.
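One way to write the symmetry view down (my gloss, not Wentworth’s formalism): an abstraction like “flower” is a function $f$ from observations to concepts that is invariant under some set $G$ of transformations,

$$f(g \cdot x) \;=\; f(x) \quad \text{for all } g \in G,$$

where $G$ contains moving the flower, changing viewpoint, blooming or wilting, swapping out molecules, and so on. The set $G$ is then exactly what “the symmetries of the concept” would mean.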
Machine learning already makes use of this idea. Many unsupervised representation learning algorithms follow a process like: take an input, distort it in some way that leaves it still recognizable (such as rotating or cropping an image), then train the model to output similar embeddings for the original and distorted version.
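A minimal sketch of that recipe (illustrative only: the encoder, augmentations, and loss below are stand-ins, and real methods like SimCLR also use negative pairs rather than just pulling positive pairs together):

```python
import torch
import torch.nn.functional as F
from torchvision import transforms

# Distortions that leave an image recognizable to a human.
augment = transforms.Compose([
    transforms.RandomResizedCrop(32),
    transforms.RandomRotation(degrees=30),
])

# Stand-in for a real vision backbone.
encoder = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 32 * 32, 128),
)

def similarity_loss(images: torch.Tensor) -> torch.Tensor:
    """Push embeddings of two augmented views of the same images together."""
    z1 = F.normalize(encoder(augment(images)), dim=-1)
    z2 = F.normalize(encoder(augment(images)), dim=-1)
    # Negative mean cosine similarity between paired views.
    return -(z1 * z2).sum(dim=-1).mean()

images = torch.rand(8, 3, 32, 32)  # a fake batch of RGB images
loss = similarity_loss(images)
loss.backward()
```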
If the embeddings have high mutual information when humans would perceive the data points as having high mutual information, then the embedding should contain approximately the information humans find relevant? That hypothesis seems worth trying to formalize.
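Written out very loosely (just the sentence above in symbols, with $f$ the learned encoder and $h(X)$ standing for whatever information a human extracts from $X$): the hoped-for property is something like

$$I\big(f(X);\, f(X')\big) \text{ large} \;\iff\; I\big(h(X);\, h(X')\big) \text{ large}$$

over naturally occurring pairs $(X, X')$, from which one would want to conclude that $f(X)$ captures roughly the same information as $h(X)$.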
However, if the NAH is true, maybe an AGI wouldn’t have to see a bunch of examples of (e.g.) rotated images and be told that they’re the same to grasp the concept that things might be rotated? I’m not sure of this, but it seems like, to a truly general algorithm, two inputs related by a low-dimensional invertible transformation would just be obviously similar.
I’m not entirely sure humans can do this (recognize a pattern they’ve never encountered before given only one example, even if it’s “simple”).
Possibly adult humans have just seen too many different kinds of patterns for there to even be a “simple” one we’ve never seen before.
“Low-dimensional invertible transformation” might be too general to even be computable; in the case of a rotated image, it’s a simple linear transformation, but a linear transformation in coordinate space rather than pixel space. It seems like the model’s inductive biases might have to include the possibility of such transformations; they would be non-obvious to a model used to viewing images as flattened vectors, or even through convolutions.
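A toy illustration of that last point (numpy only, nearest-neighbour resampling, nothing clever): the rotation is generated by one angle and a $2 \times 2$ matrix acting on coordinates, but as an operation on the flattened pixel vector it looks like a large map with no obvious structure.

```python
import numpy as np

theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],   # 2x2 rotation matrix:
              [np.sin(theta),  np.cos(theta)]])  # generated by a single angle

side = 32
image = np.random.rand(side, side)               # a toy grayscale image

# Rotate by pulling each output pixel back through R^-1 (nearest neighbour).
ys, xs = np.meshgrid(np.arange(side), np.arange(side), indexing="ij")
coords = np.stack([xs - side / 2, ys - side / 2]).reshape(2, -1)  # centred (x, y)
src = np.linalg.inv(R) @ coords
src = np.rint(src + side / 2).astype(int).clip(0, side - 1)
rotated = image[src[1], src[0]].reshape(side, side)

# Viewed as a map from image.ravel() to rotated.ravel(), the same operation is a
# 1024x1024 matrix in pixel space: simple in coordinate space, opaque to a model
# that only ever sees flattened vectors.
```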
“What the heck is a ‘low-dimensional’ transformation anyway?” seems like a good question for further research.
Wentworth’s research mostly deals in sample space, that is, with the probability distribution over entire data points; but finding the true probability distribution of the data is most of the work in machine learning. He talks about latents that render different samples conditionally independent, whereas ML typically looks for latents that render the features of a single sample (e.g. image pixels) conditionally independent.
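Concretely (my paraphrase of the two setups): in the first, a latent $\Lambda$ renders whole samples conditionally independent,

$$P(X_1, \dots, X_n \mid \Lambda) \;=\; \prod_{i=1}^{n} P(X_i \mid \Lambda),$$

while in the second, a latent $Z$ renders the features of a single sample, e.g. its pixels $x^{(1)}, \dots, x^{(d)}$, conditionally independent:

$$P\big(x^{(1)}, \dots, x^{(d)} \mid Z\big) \;=\; \prod_{j=1}^{d} P\big(x^{(j)} \mid Z\big).$$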
There might be a simple correspondence between these two. Another interesting direction for research.