As I see it, the core theory of natural abstractions is now 80% nailed down
Question 1: What’s the minimal set of articles one should read to understand this 80%?
Question/Remark 2: AFAICT, your theory has a major missing piece, which is proving that “abstraction” (formalized according to your way of formalizing it) is actually a crucial ingredient of learning/cognition. The way I see it, such a proof should be by demonstrating that hypothesis classes defined in terms of probabilistic graphical models / abstraction hierarchies can be learned with good sample complexity (and better yet if you can tell something about the computational complexity), in a manner that cannot be achieved if you discard any of the important-according-to-you pieces. You might have some different approach to this, but I’m not sure what it is.
If we want to show that abstraction is a crucial ingredient of learning/cognition, then “Can we efficiently learn hypothesis classes defined in terms of abstraction hierarchies, as captured by John’s formalism?” is entirely the wrong question. Just because something can be learned efficiently doesn’t mean it’s convergent for a wide variety of cognitive systems. And even if such hypothesis classes couldn’t be learned efficiently in full generality, it would still be possible for a subset of that hypothesis class to be convergent for a wide variety of cognitive systems, in which case general properties of the hypothesis class would still apply to those systems’ cognition.
The question we actually want here is “Is abstraction, as captured by John’s formalism, instrumentally convergent for a wide variety of cognitive systems?”. And that question is indeed not yet definitively answered. The pragmascope itself would largely allow us to answer that question empirically, and I expect the ability to answer it empirically will quickly lead to proofs as well.
Telephone Theorem, Redundancy/Resampling, and Maxent for the math, Chaos for the concepts.
Thank you!
Just because something can be learned efficiently doesn’t mean it’s convergent for a wide variety of cognitive systems.
I believe that the relevant cognitive systems all look like learning algorithms for a prior of a certain fairly specific type. I don’t know what this prior looks like, but it’s something very rich on the one hand and efficiently learnable on the other hand. So, if you showed that your formalism naturally produces priors that seem closer to that “holy grail prior”, in terms of richness/efficiency, compared to priors that we already know (e.g. MDPs with a small number of states, which are not rich enough, or the Solomonoff prior, which is both statistically and computationally intractable), that would at least be evidence that you’re going in the right direction.
And even if such hypothesis classes couldn’t be learned efficiently in full generality, it would still be possible for a subset of that hypothesis class to be convergent for a wide variety of cognitive systems, in which case general properties of the hypothesis class would still apply to those systems’ cognition.
Hmm, I’m not sure what it would mean for a subset of a hypothesis class to be “convergent”.
The question we actually want here is “Is abstraction, as captured by John’s formalism, instrumentally convergent for a wide variety of cognitive systems?”.
That’s interesting, but I’m still not sure what it means exactly. Let’s say we take a reinforcement learner with a specific hypothesis class, such as all MDPs of a certain size, or some family of MDPs with low eluder dimension, or the actual AIXI. How would you determine whether your formalism is “instrumentally convergent” for each of those? Is there a rigorous way to state the question?
Question/Remark 2: AFAICT, your theory has a major missing piece, which is proving that “abstraction” (formalized according to your way of formalizing it) is actually a crucial ingredient of learning/cognition. The way I see it, such a proof should be by demonstrating that hypothesis classes defined in terms of probabilistic graphical models / abstraction hierarchies can be learned with good sample complexity (and better yet if you can tell something about the computational complexity), in a manner that cannot be achieved if you discard any of the important-according-to-you pieces. You might have some different approach to this, but I’m not sure what it is.
Doesn’t the necessity of abstraction follow from size concerns? The alternative to abstraction would be to measure and simulate everything in full detail, which can only be done if you are “exponentially bigger than the universe” (and have exponentially many universes to learn from).
One could argue that some kind of abstraction is necessary due to size concerns, but that alone does not necessarily nail down my whole formalism.
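For what it's worth, the size argument can be made concrete with a rough counting sketch. This is only an illustration of why some kind of compression is forced, not a statement about the specific formalism under discussion. It assumes binary variables and uses a Bayesian network with bounded in-degree as a stand-in for a structured hypothesis class; the function names are hypothetical.

```python
def full_joint_params(n: int) -> int:
    """Free parameters in an unrestricted joint distribution over n binary variables."""
    # One probability per joint configuration, minus one for normalization.
    return 2 ** n - 1


def factored_params(n: int, max_parents: int) -> int:
    """Upper bound on free parameters of a Bayesian network over n binary
    variables in which every node has at most max_parents parents."""
    # Each node needs one probability per configuration of its parents,
    # and there are at most 2 ** max_parents such configurations.
    return n * 2 ** max_parents


if __name__ == "__main__":
    for n in (10, 50, 300):
        print(n, full_joint_params(n), factored_params(n, max_parents=3))
```

The unrestricted count grows exponentially in the number of variables, while the bounded-in-degree count grows linearly. That supports the weak claim that a learner much smaller than its environment cannot represent everything in full detail, and it says nothing about which factorization a learner would converge on.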