I think this line of research is interesting. I really like the core concept of abstraction as summarizing the information that’s relevant ‘far away’.
A few thoughts:
- For a common human abstraction to be mostly recoverable as a ‘natural’ abstraction, it must depend mostly on the thing it is trying to abstract, and not e.g. evolutionary or cultural history, or biological implementation. This seems more plausible for ‘trees’ than it does for ‘justice’. There may be natural game-theoretic abstractions related to justice, but I’d expect human concepts and behaviors around justice to depend also in important ways on e.g. our innate social instincts. Innate instincts and drives seem likely to a) be complex (high-information) and b) depend on our whole evolutionary history, which is itself presumably path-dependent and chaotic, so I wouldn’t expect this to be just a choice among a small number of natural alternatives.
An (imperfect) way of reframing this project is as an attempt to learn human concepts mostly from the things that are causally upstream of their existence, minimizing the need for things that are downstream (e.g. human feedback), and making the assumption that the only important/high-information thing upstream is the (natural concept of the) human concept’s referent.
- If an otherwise unnatural abstraction is used by sufficiently influential agents, this can cause the abstraction to become ‘natural’, in the sense of being important to predict things ‘far away’.
- What happens when a low-dimensional summary is still too high-dimensional for the human / agent to reason about? I conjecture that values might be most important here. An analogy: optimal lossless compression doesn’t depend on your utility function, but optimal lossy compression does. Concepts that are operating in this regime may be less unique. (For that matter, from a more continuous perspective: given `n` bits to summarize a system, how much of the relevance ‘far away’ can we capture as a function of `n`? What is the shape of this curve: is it self-similar, does it have discrete regimes, or something else? If there are indeed different discrete regimes, what happens in each of them?)
- I think there is a connection to instrumental convergence, roughly along the lines of ‘most utility functions care about the same aspects of most systems’.
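To make the compression analogy in the list above concrete, here is a minimal toy sketch. Everything in it is an assumption chosen for illustration rather than part of the proposal: a system with two binary features, a weighted Hamming loss standing in for a “utility function”, and the hypothetical helper names `expected_loss` and `best_encoders`. It brute-forces every possible summary and shows that the best 1-bit (lossy) summary changes with the weights, while a 2-bit (lossless) summary is optimal under either.

```python
from itertools import product

# Toy system: two binary features (a, b), all four states equally likely.
STATES = list(product([0, 1], repeat=2))

def expected_loss(encoder, weights):
    """Average weighted Hamming loss of an encoder, using its best decoder.

    encoder: dict mapping each state to a code (the summary).
    weights: (w_a, w_b), how much the agent cares about errors in each feature.
    """
    total = 0
    for code in set(encoder.values()):
        members = [s for s in STATES if encoder[s] == code]
        # The decoder picks the single reconstruction that minimizes the
        # summed loss over all states sharing this code.
        total += min(
            sum(w * (x != r) for s in members for w, x, r in zip(weights, s, recon))
            for recon in STATES
        )
    return total / len(STATES)

def best_encoders(n_codes, weights):
    """Brute-force every encoder with n_codes symbols; return (min loss, argmins)."""
    encoders = [
        dict(zip(STATES, codes))
        for codes in product(range(n_codes), repeat=len(STATES))
    ]
    losses = [expected_loss(e, weights) for e in encoders]
    lowest = min(losses)
    return lowest, [e for e, loss in zip(encoders, losses) if loss == lowest]

# Lossy regime: one bit to summarize two bits of state.
loss_a, keep_a = best_encoders(2, (1, 0))  # agent cares only about feature a
loss_b, keep_b = best_encoders(2, (0, 1))  # agent cares only about feature b
print(loss_a, loss_b)
print(any(e in keep_b for e in keep_a))    # do the two agents share an optimal summary?

# Lossless regime: two bits, so the identity code loses nothing for either agent.
identity = {s: i for i, s in enumerate(STATES)}
print(expected_loss(identity, (1, 0)), expected_loss(identity, (0, 1)))
```

In this toy case the two agents’ optimal 1-bit summaries do not overlap at all, which is the sense in which concepts operating in the lossy regime may be less unique. It says nothing about how the curve behaves for realistic systems or intermediate values of `n`.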
Overall, I’m more optimistic about approaches that rely on some human concepts being natural, vs. all of them. Intuitively, I do feel like there should be some amount of naturalness that can help with the ‘put a strawberry on a plate’ problem (and maybe even the ‘without wrecking anything else’ part).
Great comment, you’re hitting a bunch of interesting points.
> For a common human abstraction to be mostly recoverable as a ‘natural’ abstraction, it must depend mostly on the thing it is trying to abstract, and not e.g. evolutionary or cultural history, or biological implementation. …
A few notes on this.
First, what natural abstractions we use will clearly depend at least somewhat on the specific needs of humans. A prehistoric tribe of humans living on an island near the equator will probably never encounter snow, and never use that natural abstraction.
My claim, for these cases, is that the space of natural abstractions is (approximately) discrete. Discreteness says that there is no natural abstraction “arbitrarily close” to another natural abstraction—so, if we can “point to” a particular natural abstraction in a close-enough way, then there’s no ambiguity about which abstraction we’re pointing to. This does not mean that all minds use all abstractions. But it means that if a mind does use a natural abstraction, then there’s no ambiguity about which abstraction they’re using.
One concrete consequence of this: one human can figure out what another human means by a particular word without an exponentially massive number of examples. The only way that’s possible is if the space of potential-word-meanings is much smaller than e.g. the space of configurations of a mole of atoms. Natural abstractions give a natural way for that to work.
Of course, in order for that to work, both humans must already be using the relevant abstraction—e.g. if one of them has no concept of snow, then it won’t work for the word “snow”. But the claim is that we won’t have a situation where two people have intuitive notions of snow which are arbitrarily close, yet different. (People could still give arbitrarily-close-but-different verbal definitions of snow, but definitions are not how our brain actually represents word-meanings at the intuitive level. People could also use more-or-less fine-grained abstractions, like eskimos having 17 notions of snow, but those finer-grained abstractions will still be unambiguous.)
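The counting argument about word meanings can be put in rough numbers. The sketch below is a back-of-the-envelope illustration under loudly stated assumptions: each example conveys at most one bit about which meaning is intended, the discrete space holds a made-up one million candidate concepts, and each atom in a mole contributes just one binary degree of freedom (a large underestimate). The helper name `min_examples` is hypothetical.

```python
import math

def min_examples(log2_num_hypotheses, bits_per_example=1.0):
    """Lower bound on the examples needed to single out one hypothesis.

    Picking one option out of N requires log2(N) bits; if each example
    carries at most bits_per_example bits, at least this many are needed.
    """
    return math.ceil(log2_num_hypotheses / bits_per_example)

# Assumption: a discrete space of one million candidate natural abstractions.
discrete_concepts = min_examples(math.log2(10**6))

# Assumption: one binary degree of freedom per atom in a mole, so
# log2(number of configurations) is about Avogadro's number.
AVOGADRO = 6.022e23
raw_configurations = min_examples(AVOGADRO)

print(discrete_concepts)   # a couple dozen examples at most
print(raw_configurations)  # on the order of 10**23 examples
```

The specific numbers are not meaningful. The point is the gap: a small discrete hypothesis space makes word learning from a handful of examples possible, while searching over raw physical configurations does not.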
> If an otherwise unnatural abstraction is used by sufficiently influential agents, this can cause the abstraction to become ‘natural’, in the sense of being important to predict things ‘far away’.
Yes! This can also happen even without agents: if the earth were destroyed and all that remained were one tree, much of the tree’s genetic sequence would not be predictive of anything far away, and therefore not a natural abstraction. But so long as there are lots of genetically-similar trees, “tree-like DNA sequence” could be a natural abstraction.
This is also an example of a summary too large for the human brain. Key thing to notice: we can recognize that a low-dimensional summary exists, talk about it as a concept, and even reason about its properties (e.g. what could we predict from that tree-DNA-sequence-distribution, or how could we estimate the distribution), without actually computing the summary. We get an unambiguous “pointer”, even if we don’t actually “follow the pointer”.
Another consequence of this idea that we don’t need to represent the abstraction explicitly: we can learn things about abstractions. For instance, at some point people looked at wood under a microscope and learned that it’s made of cells. They did not respond to this by saying “ah, this is not a tree because trees are not made of cells; I will call it a cell-tree and infer that most of the things I thought were trees were in fact cell-trees”.
> I think there is a connection to instrumental convergence, roughly along the lines of ‘most utility functions care about the same aspects of most systems’.
Exactly right. The intuitive idea is: natural abstractions are exactly the information which is relevant to many different things in many different places. Therefore, that’s exactly the information which is likely to be relevant to whatever any particular agent cares about.
Figuring out the classes of systems which learn roughly-the-same natural abstractions is one leg of this project.