The number of possible patterns in an information cluster is superexponential with the size of the information cluster
Firstly, you are misquoting EY’s post: the number of possible patterns in a string grows exponentially with the number of bits, as expected. It is the number of ‘concepts’ which grows super-exponentially, where EY defines ‘concept’ very loosely as any program which classifies patterns. The super-exponential growth in concepts is combinatoric, and just stems from naive specific classifiers which recognize combinations of specific patterns.
Secondly, this doesn’t really relate to universal pattern recognition, which is concerned only with optimal data classifications according to a criterion such as entropy maximization.
As a simple example, consider the set of binary strings of length N. There are 2^N possible observable strings, and a super-exponential combinatoric set of naive classifiers. But consider an observed data sequence of the form 10010 10010 10010, repeated ad infinitum. Any form of optimal extropy maximization will reduce this to something of the form “repeat ‘10010’ indefinitely”.
In general, any given sequence of observations has a single unique compressed (extropy reduced) representation, which corresponds to its fundamental optimal ‘pattern’ representation.
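As a minimal sketch of that reduction (a toy for the purely periodic case above, not a general compressor), the smallest repeating unit of a sequence can be recovered directly:

```python
def shortest_pattern(s: str) -> str:
    """Return the smallest unit u such that s is u repeated, else s itself."""
    # Classic trick: the first index (past 0) where s re-occurs inside s+s
    # is the smallest period of s.
    period = (s + s).find(s, 1)
    return s[:period] if len(s) % period == 0 else s

print(shortest_pattern("10010" * 3))  # -> "10010", i.e. repeat "10010"
```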
Can you demonstrate that the patterns you’re recognizing are non-arbitrary?
Depends on what you mean. It’s rather trivial to construct simple universal extropy maximizers/optimizers—just survey the basic building blocks of unsupervised learning algorithms. The cortical circuit performs similar computations.
For example the 2D edge patterns that cortical tissue (and any good unsupervised learning algorithm) learns to represent when exposed to real world video are absolutely not arbitrary in the slightest. This should be obvious.
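To make “trivial to construct” concrete, here is a minimal sketch (my own construction, not a claim about the actual cortical algorithm) of one such building block: Oja’s rule, a normalized Hebbian update that converges to the first principal component of its input. Fed synthetic patches whose dominant statistical structure is a vertical edge, the learned filter converges to that edge, so the feature is dictated by the data, not by the learner:

```python
import numpy as np

rng = np.random.default_rng(0)

P = 8                                      # patch size (8x8), arbitrary choice
edge = np.zeros((P, P))
edge[:, : P // 2] = -1.0                   # dark left half
edge[:, P // 2 :] = 1.0                    # bright right half
edge = (edge / np.linalg.norm(edge)).ravel()

w = rng.normal(size=P * P)                 # random initial filter
w /= np.linalg.norm(w)

for _ in range(5000):
    # patch = random amount of edge structure + pixel noise
    x = rng.normal() * edge + 0.1 * rng.normal(size=P * P)
    y = w @ x                              # the unit's response
    w += 0.01 * y * (x - y * w)            # Oja's rule: Hebbian term + decay

# cosine similarity between the learned filter and the true edge pattern
print(abs(w @ edge) / np.linalg.norm(w))   # close to 1: the edge was recovered
```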
If you mean higher level thought abstractions by “the patterns you’re recognizing”, then the issue becomes more complex. Certainly the patterns we currently recognize at the highest level are not optimal extractions, if that’s what you mean. But nor are they arbitrary. If they were arbitrary our cortex would have no purpose, would confer no selection advantage, and would not exist.
We don’t have a fully general absolute pattern recognizing system;
We do have a fully general pattern recognition system. I’m not sure what you mean by “general absolute”.
that would be an evolutionary hindrance even if it were something that could practically be developed.
Such general systems are trivial to construct, and require far less genetic information to specify than specific pattern recognition systems.
Specific recognition systems have the tremendous advantage that they work instantly without any optimization time. A general recognition system has to be slowly trained on the patterns of data present in the observations—this requires time and lots of computation.
Simpler, short-lived organisms rely more on specific recognition systems and circuitry for this reason, as they allow newborn creatures to start with initial ‘pre-programmed’ intelligence. This actually requires considerably more genetic complexity than general learning systems.
Mammals grew larger brains with increasing reliance on general learning/recognition systems because this provides a tremendous flexibility advantage, at the cost of requiring larger brains, longer gestation, a longer period of initial immaturity, etc. In primates, and humans especially, this trend is maximized. Human infant brains have very little going on initially except powerful general meta-algorithms which will eventually generate specific algorithms in response to the observed environment.
I think we don’t agree on what this “complexity” is because it’s not a natural category
The concept of “natural category” is probably less well defined than “complexity” itself, so it probably won’t shed too much light on our discussion.
That being said, from that post he describes it as:
I’ve chosen the phrase “unnatural category” to describe a category whose boundary you draw in a way that sensitively depends on the exact values built into your utility function.
In that sense complexity is absolutely a natural category.
Look at Kolmogorov_complexity. It is a fundamental property of information (uncomputable in general, though approximable), and information is the fundamental quantity of modern physics. So that definition of complexity is about as natural as you can get, right up there with entropy. Unfortunately that definition itself is not perfect and is too close to entropy, but computable variants of it exist: one used in a computational biology paper I was browsing recently (measuring the tendency towards increased complexity in biological systems) defined complexity as compressed information minus entropy, which may be the best fit to the intuitive concept.
Intuitively I could explain it as follows.
The information complexity of an intelligent system is a measure of the fundamental statistical pattern structure it extracts from its environment. If the information it observes is already at maximum entropy (such as pure noise), then it is already maximally compressed, no further extraction is possible, and no learning is possible. At the other extreme, if the information observed is extremely uniform (low entropy), then it can be fully described/compressed by extremely simple low complexity programs. A learning system extracts entropy from its environment and grows in complexity in proportion.
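A crude way to see those regimes, using zlib’s compressed size as a stand-in for a true optimal compressor (it is only an upper bound on the real information content):

```python
import os
import zlib

N = 4096
samples = {
    "pure noise (max entropy)": os.urandom(N),
    "uniform (min entropy)":    b"\x00" * N,
    "half structured":          bytes(b if i % 2 else 0
                                      for i, b in enumerate(os.urandom(N))),
}
for name, data in samples.items():
    ratio = len(zlib.compress(data, 9)) / len(data)
    print(f"{name}: compressed to {ratio:.0%} of original size")

# Noise stays near (or slightly above) 100%: nothing extractable, no learning
# possible. The uniform string collapses to a tiny program. The mixed case
# lands in between: the regime where a learner has structure left to extract.
```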
Depends on what you mean. It’s rather trivial to construct simple universal extropy maximizers/optimizers—just survey the basic building blocks of unsupervised learning algorithms. The cortical circuit performs similar computations.
For example the 2D edge patterns that cortical tissue (and any good unsupervised learning algorithm) learns to represent when exposed to real world video are absolutely not arbitrary in the slightest. This should be obvious.
It’s objective that our responses exist, and they occur in response to particular things. It’s not obvious that they occur in response to natural categories, rather than constructed categories like “sexy.”
We do have a fully general pattern recognition system. I’m not sure what you mean by “general absolute”.
“General absolute” was probably a poor choice of words, but I meant to express a system capable of recognizing all types of patterns in all contexts. There is an absolute, non-arbitrary pattern here; do you recognize it?
Kolmogorov complexity is a fundamental characteristic, but it’s not at all clear that we should want a Kolmogorov complexity optimizer acting on our universe, or that Kolmogorov complexity actually has much to do with the “complexity” you’re talking about. A message or system can be high in Kolmogorov complexity without being interesting to us, and it still seems to me that you’re conflating complexity with interestingness when they really don’t bear that sort of relationship.
“General absolute” was probably a poor choice of words, but I meant to express a system capable of recognizing all types of patterns in all contexts. There is an absolute, non-arbitrary pattern here; do you recognize it?
I see your meaning, and no practical system is capable of recognizing all types of patterns in all contexts. A universal/general learning algorithm is simply one that can learn to recognize any pattern, given enough time/space/training. That doesn’t mean it will recognize any random pattern it hasn’t already learned.
I see hints of structure in your example but it doesn’t ring any bells.
Kolmogorov complexity is a fundamental characteristic, but it’s not at all clear that we should want a Kolmogorov complexity optimizer acting on our universe
No, and that’s not my primary interest. Complexity seems to be the closest fit for something-important-which-has-been-changing over time on earth. If we had a good way to measure it, we could then make a quantitative model of that change and use that to predict the rate of change in the future, perhaps even ultimately reducing it to physical theory.
For example, one interesting recent physics paper (on entropic gravity) proposes that gravity is actually not a fundamental force or even spacetime curvature, but an entropic statistical pseudo-force. The paper is interesting because as a side effect it appears to correctly derive the mysterious cosmological constant for acceleration. As an unrelated side note, I have an issue with it because it uses the holographic principle/Bekenstein bound for information density, which still appears to lead to lost-information paradoxes in my mind.
But anyway, if you look at a random patch of space-time, it is always slowly evolving to a higher-entropy state (2nd law), and this may be the main driver of most macroscopic tendencies (even gravity). It’s also quite apparent that a closely related measure, complexity, increases non-linearly, in a fashion perhaps loosely like gravitational collapse. The non-linear dynamics are somewhat related: complexity tends to increase in proportion to the existing local complexity as a fraction of available entropy. In some regions, like earth, this appears to go super-critical, while in most places the growth is minuscule or non-existent.
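One toy reading of that dynamic (entirely my own sketch, with made-up parameters): complexity C grows in proportion to itself, saturating against a locally available entropy budget E, with the growth rate varying by region:

```python
# Illustrative only: logistic-style growth, dC/dt = r * C * (1 - C / E).
def grow(c0: float, rate: float, budget: float, steps: int = 200) -> float:
    c = c0
    for _ in range(steps):
        c += rate * c * (1 - c / budget)
    return c

print(grow(c0=1e-3, rate=0.001, budget=100.0))  # "most places": barely moves
print(grow(c0=1e-3, rate=0.1,   budget=100.0))  # "super-critical": near budget
```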
It’s not apparent that complexity is increasing over time. In some respects, things seem to be getting more interesting over time, although I think a lot of this is due to selective observation; and we don’t have any good reason to believe we’re dealing with a natural category here. If we were dealing with something like Kolmogorov complexity, at least we could know whether we were dealing with a real phenomenon, but instead we’re dealing with some ill-defined category for which we cannot establish a clear connection to any real physical quality.
For all that you claim that it’s obvious that some fundamental measure of complexity is increasing nonlinearly over time, not a lot of other people are making the same claim, having observed the same data, so it’s clearly not as obvious as all that.