I think that “epistemic rationality” matches very well with what I am thinking of as level 3, which is my notion of intelligence. It is indeed applicable to non-agentic systems. I am still thinking about whether to include meta-learning (referring to updating level 3 algorithms based on experience) and meta-processes above that in my concept of intelligence.
Would this layer of meta-learning be part of epistemic rationality, do you think? It becomes particularly relevant if the system is resource-constrained and has to prioritize what to learn about, and/or cares about the efficiency of learning. These constraints feel a bit less natural to introduce for a non-agentic system, except when that system is set up by an agentic system for some purpose.
In any case, instrumental rationality does not seem to cover all that I mean by Competence; perhaps it matches something narrower, like “cognitive competence”. I find it a bit difficult to systematically distinguish between cognitive and non-cognitive competence, because the cognitive part of the system is also implemented by its embodiment, and there are various correlations between events in “morphospace” and “cognitive space”.

One way of resolving that might be to distinguish between on-surface properties of a Markov blanket (corresponding to the system interfacing with its environment) and within-surface properties of that Markov blanket (corresponding to integrated regulation systems and the cognition). There will still be feedback loops between those properties, so our mileage may vary when it comes to drawing clean distinctions between different competencies.

If you are interested in some more thoughts on that, you can check out my post on extended embodiment.
A separate idea is that it seems possible to divide “competence” or “instrumental rationality” into two independent axes: generality, and intelligence proper (or perhaps: competence proper).
I have already commented a bit on meta-learning above; by default my level 3 would refer just to online learning, but I am thinking of including different levels of meta-learning because of the algorithmic similarities.

Perhaps interestingly for you, I consider one of the primary purposes of meta-learning to be refining a generally intelligent system into a more narrowly intelligent one, by improving its learning capabilities for a particular set of domains; in some sense, biasing the cognition towards the kind of environment it seems to operate within (e.g. in terms of hypothesis generation, or which kinds of functions to use when approximating the behavior of an observed sequence).
Of course, unless the system loses its meta-learning capability, it will be able to respond to changes in its environment by re-aiming/updating its learning tendencies over time, so it is technically general if you give it some time, but it ends up converging towards beneficial specialisation.
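To make that meta-learning point a bit more concrete, here is a minimal toy sketch of what I have in mind (the function families, numbers, and names are just illustrative placeholders, not part of my framework): the base level scores candidate function families on individual sequences, while the meta level carries a bias over those families across sequences, so repeated exposure to one kind of environment specialises hypothesis generation towards it.

```python
# Toy sketch: meta-learning as biasing hypothesis generation towards the kind of
# environment the system keeps finding itself in. Base-level learning fits one
# sequence; the meta level carries a prior over function families across sequences.

def repeat_last(xs): return xs[-1]                         # "static environment" family
def continue_step(xs): return xs[-1] + (xs[-1] - xs[-2])   # "steady drift" family

FAMILIES = {"repeat_last": repeat_last, "continue_step": continue_step}
family_bias = {name: 1.0 for name in FAMILIES}   # meta-level prior, persists across tasks

def learn_one_sequence(seq):
    """Base level: score each family online on this one sequence."""
    hits = {name: 0 for name in FAMILIES}
    for t in range(2, len(seq)):
        for name, f in FAMILIES.items():
            hits[name] += int(f(seq[:t]) == seq[t])
    return hits

def meta_update(hits):
    """Meta level: shift the hypothesis-generation bias towards families that worked."""
    for name, h in hits.items():
        family_bias[name] *= (1.0 + 0.2 * h)

# The system keeps encountering "steady drift" environments, so it specialises:
for seq in ([0, 2, 4, 6, 8], [10, 13, 16, 19], [5, 4, 3, 2, 1]):
    meta_update(learn_one_sequence(seq))

print(max(family_bias, key=family_bias.get), family_bias)  # biased towards continue_step
```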
I think of generality of intelligence as relatively conceptually trivial. At the end of the day, a system is given a sequence of data via observation, and is now tasked with finding a function or set of functions that both corresponds to plausible transition rules of the given sequence, and has a reasonably high chance of correctly predicting the next element of the sequence (which is easy to train for by hiding later elements of the sequence from the modeling process and sequentially introducing them to test and potentially update the fit of the model).

Computationally speaking, the set of total atomic functions that you would have to consider in order to compositionally construct arbitrary transition rules for sequences of discrete data packages is very small. The only mathematical requirement is Turing universality; basically, the entire difficulty arises from resource constraints.
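As a toy illustration of both points (sequential reveal-and-test, and a small compositional basis), here is a minimal sketch; the primitive functions and the depth limit are arbitrary choices of mine, and a genuinely Turing-universal basis would of course need state and control flow, so this is only meant to show where the cost lives: in the search, not in the size of the basis.

```python
from itertools import product

# A few primitive functions on the last element, composed up to a fixed depth,
# scored by how well each composition predicts elements as they are revealed.
PRIMITIVES = {
    "id":     lambda x: x,
    "succ":   lambda x: x + 1,
    "double": lambda x: 2 * x,
}

def _apply(names, x):
    for n in names:
        x = PRIMITIVES[n](x)
    return x

def compositions(depth=2):
    """All pipelines of primitives up to the given depth."""
    for names in product(PRIMITIVES, repeat=depth):
        yield names, (lambda x, ns=names: _apply(ns, x))

def score(rule, seq):
    """Reveal elements one at a time and count correct next-element predictions."""
    return sum(rule(seq[t - 1]) == seq[t] for t in range(1, len(seq)))

seq = [1, 3, 7, 15, 31]   # hidden transition rule: x -> 2x + 1
best_names, best_rule = max(compositions(), key=lambda nc: score(nc[1], seq))
print(best_names, score(best_rule, seq))   # ('double', 'succ') recovers x -> 2x + 1
```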
This seems to match your thoughts about the appearance of greater generality arising simply from more processing power. A cognitive system that is provided with more processing power could use it either to search more deeply within those regions of causal-model space that it naturally considers, or to branch out and consider new regions of model-space. Many brains in the animal kingdom seem to implement a sort of limited generative simulation of their environment, so that could be considered a fairly general problem domain.
I could try to write more on this, but I am curious what you think about this so far and if I come across as reasonably clear.
So regarding things that involve actively prioritizing compute resources: I think those would fairly clearly no longer fall under epistemic rationality, because “spending compute resources on this rather than that” is an action, and actions are only part of instrumental rationality. So in that sense it wouldn’t be part of intelligence. Which makes some sense, given that intuitively smart people often concentrate their mental efforts on things that are not necessarily very useful to them.
This also relates to what you write about levels 1 and 2 compared to level 3. In the first two cases you mention actions, but not in the third. Which makes sense if level 3 is about epistemic rationality. Assuming levels 1 and 2 are about instrumental rationality, then, this would be an interesting difference from my previous conceptualization: on my picture, epistemic rationality was a necessary but not sufficient condition for instrumental rationality, whereas on your picture, levels 1 and 2 (~instrumental rationality) are a necessary but not sufficient condition for level 3 (~epistemic rationality). I’m not sure what we can conclude from these inverted pictures.
I think of generality of intelligence as relatively conceptually trivial. At the end of the day, a system is given a sequence of data via observation, and is now tasked with finding a function or set of functions that both corresponds to plausible transition rules of the given sequence, and has a reasonably high chance of correctly predicting the next element of the sequence
Okay, but terminology-wise I wouldn’t describe this as generality, because the narrow/general axis seems to have more to do with instrumental rationality / competence than with epistemic rationality / intelligence. The latter can be described as a form of prediction, or as building causal models / a world model. But generality seems to be more about what a system can do overall in terms of actions. GPT-4 may have a quite advanced world model, but at its heart it only imitates Internet text, and doesn’t do so in real time, so it can hardly be used for robotics. So I would describe it as a less general system than most animals, though more general than a Go AI.
Regarding an overall model of cognition: the part that corresponds to epistemic rationality seems to be captured well by a theory called predictive coding or predictive processing. Scott Alexander has an interesting article about it. It’s originally a theory from neuroscience, but Yann LeCun also sees it as a core part of his model of cognition. The model is described here on pages 6 to 9. Predictive coding is responsible for the part of cognition that he calls the world model.
Basically, predictive coding is the theory that an agent constantly does self-supervised learning (SSL) on sensory data (real-time / online) by continuously predicting its experiences and continuously updating the world-model depending on whether those predictions were correct. This creates a world model, which is the basis for the other abilities of the agent, like creating and executing action plans. LeCun calls the background knowledge created by this type of predictive coding the “dark matter” of intelligence, because it includes fundamental common sense knowledge, like intuitive physics.
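Here is a deliberately crude sketch of that loop, just to pin down what I mean; the linear “world model”, the toy environment, and the learning rate are placeholders of mine and not meant to mirror how brains or LeCun’s architecture actually implement this.

```python
import numpy as np

# Minimal caricature of the predictive-coding loop: at each step the agent predicts
# its next sensory input, compares the prediction with what actually arrives, and
# nudges its (here: linear) world model by the prediction error.

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 4))   # toy "world model": next_obs ≈ W @ obs
lr = 0.1

def true_dynamics(obs):
    # Stand-in for the environment the agent is embedded in: a noisy cyclic shift.
    return 0.9 * np.roll(obs, 1) + 0.1 * rng.normal(size=obs.shape)

obs = rng.normal(size=4)
for step in range(3000):
    predicted = W @ obs                  # continuously predict the next experience
    actual = true_dynamics(obs)          # the experience that actually arrives
    error = actual - predicted           # prediction error drives the update
    W += lr * np.outer(error, obs)       # online self-supervised update of the world model
    obs = actual

print(np.round(W, 2))   # W ends up approximating 0.9 * (a cyclic shift matrix)
```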
The problem is that self-supervised learning currently only really works for text (in LLMs), but not yet properly for things like video. Basically, the difference is that with text we have a relatively small number of discrete tokens with quite low redundancy, while for sensory inputs we have essentially continuous data with a very large amount of redundancy. It makes no computational sense to predict probabilities for individual frames of video data the way it makes sense for an LLM to “predict” probabilities for the next text token. LeCun currently tries to make SSL work for these types of sensory data with his “Joint Embedding Predictive Architecture” (JEPA), described in the paper above.
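To make the contrast in prediction targets concrete, here is a toy sketch; the shapes and names are mine, and nothing in it resembles the actual JEPA training procedure (which, among other things, needs mechanisms to prevent the embeddings from collapsing).

```python
import numpy as np

# Illustrative contrast between the two prediction targets discussed above.
rng = np.random.default_rng(0)

# (a) Text-style SSL: predict a distribution over a small discrete vocabulary.
vocab_size, d = 50, 16
h_context = rng.normal(size=d)                # hidden state after reading the prefix
W_out = rng.normal(size=(vocab_size, d))
logits = W_out @ h_context
next_token_probs = np.exp(logits) / np.exp(logits).sum()   # softmax over ~50 options

# (b) Video-style SSL in embedding space: don't predict raw pixels, predict the
#     *representation* of the next frame from the representation of the current one.
frame_now = rng.normal(size=(8, 8))           # stand-in for a high-dimensional frame
frame_next = rng.normal(size=(8, 8))
W_enc = rng.normal(size=(d, 64))              # shared toy encoder
W_pred = rng.normal(size=(d, d))              # toy predictor in latent space

z_now = W_enc @ frame_now.ravel()
z_next = W_enc @ frame_next.ravel()
latent_loss = np.mean((W_pred @ z_now - z_next) ** 2)   # error measured in latent space,
                                                        # not as per-pixel probabilities

print(next_token_probs.shape, round(float(latent_loss), 3))
```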
To the extent that creating a world model is handled by predictive coding, and if we call the ability to create accurate world models “epistemic rationality” or “intelligence”, we seem to have a pretty good grasp of what we are talking about. (Even though we don’t yet have a working implementation of predictive coding, like JEPA.)
But if we talk about a general theory of cognition/competence/instrumental rationality, the picture is much less clear. All we have is things like LeCun’s very coarse model of cognition (pages 6ff in the paper above), or completely abstract models like AIXI. So there is a big gap in understanding what the cognition of a competent agent even looks like.