There are three imaginable classes of intelligent agent, not two. (To be clear, I am not suggesting that OP is unaware of this; I’m taking issue with the framing.)
1. (“Simple”.) Applies some single process to every task that comes along, without any sort of internal adaptation being needed.
2. (“Universally adaptable”.) Needs special-purpose processes for particular classes of task, but can generate those processes on the fly.
3. (“Ensemble specialized”.) Has special-purpose processes for particular classes of task, but limited ability to do anything beyond the existing capabilities of those processes. (A toy sketch of all three follows this list.)
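To make the three classes concrete, here is a purely illustrative toy sketch (mine, invented for this comment; the class names and the “domain:task” convention are not from OP):

```python
# Toy illustration only: three agent shapes, distinguished by where their
# task-specific competence comes from. Nothing here models real cognition.

def generic_process(task):
    # #1-style: one fixed routine applied to everything, no internal adaptation.
    return f"generic answer to {task!r}"

class SimpleAgent:                       # #1 "simple"
    def solve(self, task):
        return generic_process(task)

class UniversallyAdaptableAgent:         # #2 "universally adaptable"
    def solve(self, task):
        specialist = self._generate_specialist(task)   # built on the fly
        return specialist(task)

    def _generate_specialist(self, task):
        # Stand-in for whatever on-the-fly process generation really involves.
        return lambda t: f"answer from a freshly generated specialist to {t!r}"

class EnsembleSpecializedAgent:          # #3 "ensemble specialized"
    def __init__(self, specialists):
        self.specialists = specialists   # fixed set, decided in advance

    def solve(self, task):
        domain = task.split(":", 1)[0]   # toy convention: "domain:description"
        if domain not in self.specialists:
            raise ValueError("no existing module covers this domain")
        return self.specialists[domain](task)

# A #3 ensemble that only knows chess cannot handle Go at all, whereas the
# other two at least produce *something* for any task they are handed.
chess_only = EnsembleSpecializedAgent({"chess": lambda t: f"chess move for {t!r}"})
print(chess_only.solve("chess:endgame"))
# chess_only.solve("go:opening") would raise ValueError.
```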
It seems clear that human intelligence is more #2/#3 than #1. But for many purposes, isn’t the more important distinction between #1/#2 and #3?
For instance, OP says that for a “simple” intelligence we expect improvement in prediction to transfer across domains much better than for an “ensemble” intelligence, but I would expect at least some kinds of improvement to generalize well for a “universally adaptable” intelligence too: anything that improves it by making it better at generating those special-purpose processes (or, e.g., by improving some substrate on which they all run).
“But that only applies to some kinds of improvement!” Yes, but the same goes even for “simple” intelligences. Even if there’s some general process used for everything, many specific applications of that process will depend on specific knowledge, which will often not generalize. E.g., if you get good at playing chess, then whether you’re doing it by growing custom hardware or by implementing special search procedures, part of what you’re doing will be learning about specific configurations of pieces on a chessboard, and that won’t generalize even to similar-ish domains like playing Go.
I don’t think I quite buy the argument that simplicity of the best optimizers ~= exploitability of the domain being optimized over. The fuzzy mental image accompanying this not-buying-it is a comparison between two optimization-landscape families: (1) a consistent broad parabolic maximum with a large amount of varying noise on top of it, and (2) a near-featureless plain with just a bit of varying noise on it, plus a hundred very tall sharp spikes. (1) is not very exploitable because of all the noise, but the best you can do is something nice and simple that models the parabolic peak. (2) is extremely exploitable, but to exploit it well you need to figure out where the individual peaks are and deal with them separately. (This fuzzy mental image should not be taken too literally.)
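For concreteness, one toy parametrisation of those two families (my own invented functional forms; not to be taken any more literally than the mental image itself):

\[
f_1(x) = -a\,\lVert x\rVert^2 + \sigma\,\eta(x),
\qquad
f_2(x) = \epsilon\,\eta(x) + \sum_{i=1}^{100} h\,\exp\!\left(-\frac{\lVert x - c_i\rVert^2}{2w^2}\right),
\]

with \eta a rapidly varying noise term, \sigma large relative to a (so the noise swamps the local structure of f_1), \epsilon small, the spike height h large, the spike width w tiny, and the centres c_i scattered arbitrarily. For f_1 the best available optimizer is simple: model the parabola and ignore the noise. For f_2 there is far more value to be extracted, but extracting it means locating each c_i and handling it separately.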
Our world is simple but complicated; there are simple principles underlying it, but historical accident and (something like) spontaneous symmetry breaking mean that different bits of it can reflect those simple principles in different ways, and it may be that the best way to deal with that variety is to have neither a single optimization process, nor a fixed ensemble of them, but a general way of learning domain-specific optimizers.
For my Ensemble General Intelligence model, I was mostly imagining #2 instead of #3.
I said of my ensemble general intelligence model:
It could also dynamically generate narrow optimisers on the fly for the problem sets.
General intelligence might be described as an algorithm for picking (a) narrow optimiser(s) to apply to a given problem set (given x examples from said set).
I did not intend to imply that the set of narrow optimisers the general optimiser is selecting from is represented within the agent. I was thinking of a rough mathematical model for how you can describe it.
That there exists a (potentially infinite) set of all possible narrow optimisers a general intelligence might generate/select from, and that there exists a function mapping problem sets (given x examples of said set) to narrow optimisers, does not imply that any such representation is stored internally in the agent, nor that the agent implements a lookup table.
I equivocated between selection and generation. In practice I imagine generation, but the mathematics of selection are easier to reason about.
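For concreteness, the selection picture can be written down roughly as follows (illustrative notation of my own; nothing in it is meant to be stored inside the agent):

\[
g \colon \bigcup_{P \in \mathcal{P}} P^{x} \to \mathcal{O},
\]

where \(\mathcal{P}\) is the collection of problem sets, \(P^{x}\) is the set of x-example samples drawn from a particular problem set P, \(\mathcal{O}\) is the (potentially infinite) set of narrow optimisers, and g is the general intelligence’s selection rule. The claim is only that such a g exists as a description of the agent’s behaviour; nothing about how g is computed, and in particular no stored enumeration of \(\mathcal{O}\), follows from that.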
I imagine that an ensemble specialised agent would be impractical to implement in the real world, because there are too many possible problem sets. I did not consider it a potential model of general intelligence at all.
I might add this clarification when next I’m on my laptop.
It seems to me that the qualm is not about #2 vs #3 as models for humans, but about how easily transfer learning happens for the relevant models of general intelligence, and about what progress looks like for the kind of general intelligence that manifests in our world.
Currently, I think that it’s possible to improve the meta-level optimisation processes that generate object-level optimisation processes, but this doesn’t imply that an improvement to a particular object-level optimisation process will transfer across domains.
This is important because improving object-level processes and improving meta-level processes are different things. Improving meta-level processes mostly looks like learning a new domain more quickly, as opposed to improved accuracy in all extant domains. Predictive accuracy still doesn’t transfer across domains the way it would for a simple optimiser.
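One toy way to make that distinction precise (a caricature I find useful, not a claim about how real learning curves behave): suppose accuracy in domain d after n examples looks roughly like

\[
a_d(n) \approx A_d - \bigl(A_d - a_d(0)\bigr)\, e^{-n/\tau},
\]

where \(A_d\) is the ceiling the current object-level optimiser for d can reach and \(\tau\) sets how quickly new domains are picked up. Improving an object-level process raises \(A_d\) for that one domain; improving the meta-level process lowers \(\tau\), so every new domain is learned faster, but accuracy in domains that have already converged barely moves. That is the sense in which predictive accuracy itself still doesn’t transfer.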
I can probably make this distinction clearer and elaborate on it more in the OP.
I’ll think on this issue more in the morning.
The section I’m least confident/knowledgeable about is the speculation around the applicability of no-free-lunch (NFL) theorems and the exploitation of structure/regularity, so I’ll avoid discussing it.
I simply do not think it’s a discussion I can contribute meaningfully to.
Future me with better models of optimisation processes would be able to reason better around it.
If general intelligence were like #3 (ensemble specialized), how would the ability to learn new tasks arise? Who would learn?
I suppose new skills could be hard-won after many subjective years of effort, and then transferred via language. Come to think of it, this does resemble how human civilization works. It took hundreds of years for humans to learn how to do math or engineering, but these skills can now be learned in less than four years (i.e. at college).
What distinguishes #2 from #3 is that in #3 you can’t learn (well) to do new tasks that are too far outside the domains covered by your existing modules.
It’s a spectrum rather than a binary. Humans are clearly at least somewhat #2-not-#3, and I think also clearly at least somewhat #3-not-#2. The more #2-not-#3 we are, the more we really qualify as general intelligences.
And yes, human learning can be pretty slow. (Slower than you give it credit for, maybe. Learning to do mathematical research, or engineering well enough to make bridges etc. that are reasonably priced, look OK, and reliably don’t fall down, takes a bunch of what you learn in elementary and high school, plus those four years in college, plus further postgraduate work.)