Ambiguities around ‘intelligence’ often complicate discussions about superintelligence, so it seems good to think about them a little.
Some common concerns: is ‘intelligence’ really a thing? Can intelligence be measured meaningfully as a single dimension? Is intelligence the kind of thing that can characterize a wide variety of systems, or is it only well-defined for things that are much like humans? (Kruel’s interviewees bring up these points several times)
What do we have to assume about intelligence to accept Bostrom’s arguments? For instance, does the claim that we might reach superintelligence by this variety of means require that intelligence be a single ‘thing’?
Is intelligence really a single dimension?
Related: Do we see a strong clustering of strategies that work across all the domains we have encountered so far? I see the answer to the original question being yes if there is just one large cluster, and no if it turns out there are many fairly orthogonal clusters.
Is robustness against corner cases (idiosyncratic domains) a very important parameter? We certainly treat it as such in our construction of least convenient worlds to break decision theories.
There is a small set of operations (dimension reduction, 2-class categorization, n-class categorization, prediction) and algorithms for them (PCA, SVM, k-means, regression) that work well on a wide variety of domains. Does that help?
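For concreteness, here is a minimal sketch of those four operations on one small tabular dataset, assuming scikit-learn and its bundled iris data (my choice of illustration):

```python
# A minimal sketch, assuming scikit-learn and its bundled iris dataset.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

X, y = load_iris(return_X_y=True)

# Dimension reduction: project the 4 flower measurements down to 2 components.
X2 = PCA(n_components=2).fit_transform(X)

# 2-class categorization: separate one species from the rest with an SVM.
svm = SVC().fit(X, (y == 0).astype(int))

# n-class categorization (here unsupervised): cluster into 3 groups with k-means.
labels = KMeans(n_clusters=3, n_init=10).fit_predict(X)

# Prediction: regress petal width on the other three measurements.
reg = LinearRegression().fit(X[:, :3], X[:, 3])

print(X2.shape, labels[:5], np.round(reg.coef_, 2))
```

The point is only that the same handful of generic methods runs unchanged on this kind of cleaned-up, matrix-shaped data.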
Not that wide a variety of domains, compared to all human tasks.
Specifically, they can only handle data that comes in matrix form, and often only after it has been cleaned up and processed by a human being. Consider just the iris dataset: if instead of the measurements of the flowers you were working with photographs of the flowers, you might have made your problem substantially harder, since now you have a vision task not amenable to the algorithms you list.
Can you give an example of data that doesn’t come in matrix form? If you have a set of neurons and a set of connections between them, that’s a matrix. If you have asynchronous signals travelling between those neurons, that’s a time series of matrices. If it ain’t in a matrix, it ain’t data.
[ADDED: This was a silly thing for me to say, but most big data problems use matrices.]
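As a toy illustration of the two representations named above, with invented numbers:

```python
# Toy illustration with invented numbers: connections as a matrix,
# asynchronous signals as a time series of matrices.
import numpy as np

n = 5                                   # five neurons
connections = np.zeros((n, n))          # adjacency matrix: row i -> column j
connections[0, 2] = connections[2, 4] = connections[4, 1] = 1.0

timesteps = 100
signals = np.random.rand(timesteps, n, n) * connections   # signal strength per edge, per step

print(connections.shape, signals.shape)  # (5, 5) and (100, 5, 5)
```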
The answer you just wrote could be characterized as a matrix of vocabulary words and index-of-occurrence. But that’s a pretty poor way to characterize it for almost all natural language processing techniques.
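A rough sketch of that representation, assuming scikit-learn’s CountVectorizer as one common implementation of a bag-of-words matrix, and showing how much it throws away:

```python
# Bag-of-words sketch: scikit-learn's CountVectorizer assumed.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the dog bit the man", "the man bit the dog"]
X = CountVectorizer().fit_transform(docs)

# Identical rows: word order, and most of what NLP techniques care about, is gone.
print(X.toarray())
```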
First of all, something like PCA and the other methods you listed won’t work on a ton of things that could be shoehorned into matrix format.
Taking an image or piece of audio and representing it using raw pixel or waveform data is horrible for most machine learning algorithms. Instead, you want to heavily transform it before you consider putting it into something like PCA.
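A sketch of that transform step on synthetic audio, assuming only numpy and scipy; the log spectrogram here is just one common choice of transformation, not the only one:

```python
# Synthetic audio example, numpy and scipy assumed: raw samples vs. a log spectrogram.
import numpy as np
from scipy.signal import spectrogram

sr = 8000
t = np.arange(sr * 2) / sr
wave = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(t.size)   # noisy 440 Hz tone

f, times, S = spectrogram(wave, fs=sr, nperseg=256)
features = np.log(S + 1e-10).T           # one row per time frame, columns = frequency bins

print(wave.shape, features.shape)        # raw waveform vs. transformed feature matrix
```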
A different problem arises with the matrix of neuronal connections in the brain: it’s too large-scale, too sparse, and too heterogeneous to be usefully analyzed by anything but specialized methods, with a lot of preprocessing and domain knowledge going into them. You might be able to cluster different functional units of the brain, but as you tried to get to more granular units, heterogeneity in the number of connections per neuron would cause dense clusters to “absorb” sparser but legitimate clusters in almost all clustering methods. Working with a time series of activations is an even bigger problem: you want to isolate specific cascades of activations that correspond to a stimulus, look at the architecture of the activated part of the brain, characterize it, and then understand things like which neurons are functionally equivalent but correspond to different parallel units in the brain (left eye vs. right eye).
If I give you a time series of neuronal activations and connections with no indication of the domain, you’d probably be able to come up with a somewhat predictive model using non-domain-specific methods, but you’d be handicapping yourself horribly.
Inferring causality is another problem—none of these predictive machine learning methods do a good job of establishing whether two factors have a causal relation, merely whether they have a predictive one (within the particular given dataset).
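A toy illustration of that last point, with entirely synthetic data and a hidden confounder: X predicts Y almost perfectly, yet intervening on X does nothing to Y.

```python
# Synthetic example: a hidden confounder makes X predictive of Y with no causal effect.
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=10_000)              # hidden common cause
x = z + 0.1 * rng.normal(size=10_000)    # X is driven by Z
y = z + 0.1 * rng.normal(size=10_000)    # Y is also driven by Z, not by X

print(round(np.polyfit(x, y, 1)[0], 2))  # ~1.0: X "predicts" Y very well

x_do = rng.normal(size=10_000)           # intervene: set X independently of Z
print(round(np.polyfit(x_do, y, 1)[0], 2))  # ~0.0: no causal effect on Y
```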
First, yes, I overgeneralized. Matrices don’t represent natural language and logic well.
But, the kinds of problems you’re talking about—music analysis, picture analysis, and anything you eventually want to put into PCA—are perfect for matrix methods. It’s popular to start music and picture analysis with a discrete Fourier transform, which is a matrix operation. Or you use MPEG, which is all matrices. Or you construct feature detectors, say edge detectors or contrast detectors, using simple neural networks such as those found in primary visual cortex, and you implement them with matrices. Then you pass those into higher-order feature detectors, which also use matrices. You may break information out of the matrices and process it logically further downstream, but that will be downstream of PCA. As a general rule, PCA is used only on data that has so far existed only in matrices. Things that need to be broken out are not homogeneous enough, or too structured, to use PCA on.
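A short numpy sketch of two of those examples: the discrete Fourier transform written out as an explicit matrix multiplication, and a Sobel-style edge detector applied as a small matrix of weights.

```python
# The DFT as an explicit matrix, and an edge detector as a small weight matrix (numpy only).
import numpy as np

# DFT matrix: F[k, n] = exp(-2*pi*i*k*n/N); applying it is a matrix-vector product.
N = 64
n = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(n, n) / N)
signal = np.sin(2 * np.pi * 5 * n / N)
assert np.allclose(F @ signal, np.fft.fft(signal))   # same result as the FFT

# A Sobel-style vertical-edge detector is a 3x3 matrix of weights slid over the image.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
image = np.zeros((8, 8))
image[:, 4:] = 1.0                       # left half dark, right half bright
patch = image[2:5, 3:6]                  # a patch straddling the edge
print(np.sum(sobel_x * patch))           # strong response at the vertical edge
```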
There’s an excellent book called Neural Engineering by Chris Eliasmith in which he develops a matrix-based programming language that is supposed to perform calculations the way that the brain does. It has many examples of how to tackle “intelligent” problems with only matrices.
lukeprog linked above the Hsu paper, which documents good correlations between different narrow measurements of human intelligence. The author concludes that a general g factor is sufficient.
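For concreteness, a minimal sketch of how a g factor is typically extracted, using invented synthetic scores and only numpy: take the first principal component of the correlation matrix across subtests, and read its explained-variance share as a measure of how well one dimension summarizes the battery.

```python
# Synthetic subtest scores, numpy only; all numbers invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
g = rng.normal(size=1000)                            # latent general factor
loadings = np.array([0.8, 0.7, 0.6, 0.75])           # how strongly each subtest loads on g
subtests = g[:, None] * loadings + 0.5 * rng.normal(size=(1000, 4))

corr = np.corrcoef(subtests, rowvar=False)           # 4x4 correlation matrix
eigvals, eigvecs = np.linalg.eigh(corr)              # eigenvalues in ascending order

# Share of variance captured by the first principal component ("g"):
print(round(eigvals[-1] / eigvals.sum(), 2))
```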
All humans have more or less the same cognitive hardware. The human brain is prestructured so that specific areas are normally assigned specific functions. In the case of a lesion, other parts of the brain can take over. If a brain is especially capable, this capability tends to cover all of its regions. A single-dimension measure for humans might therefore suffice.
If a CPU has a higher clock frequency rating than another CPU of the same series, the clock ratio is the speedup factor for any CPU-bound algorithm.
An AI with a neural-network pattern-matching architecture will be similarly slow and unreliable at mental arithmetic, just like us humans. Extend its architecture with a floating-point coprocessor and its arithmetic capability will rise by orders of magnitude.
If you give an AI that is superintelligent in engineering a challenge on which it performs poorly, it will design a coprocessor for that task. Such coprocessors exist already: FPGAs. Programming them is highly complex, but speedups of orders of magnitude reward the effort. Once a coprocessor hardware configuration is in the world, it can be shared and further improved by other engineering AIs.
To monitor the intelligence development of AIs with extremely heterogeneous and dynamic architectures, we need high-dimensional intelligence metrics.
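Purely as an illustration of what such a higher-dimensional metric might look like (the domains and scores below are invented), one could track a capability profile per domain instead of a single scalar:

```python
# Invented domains and scores, purely to illustrate a vector-valued metric.
from dataclasses import dataclass

@dataclass
class CapabilityProfile:
    scores: dict[str, float]   # domain name -> normalized score

    def dominates(self, other: "CapabilityProfile") -> bool:
        """True if this agent is at least as capable in every shared domain."""
        shared = self.scores.keys() & other.scores.keys()
        return all(self.scores[d] >= other.scores[d] for d in shared)

base = CapabilityProfile({"engineering": 0.9, "arithmetic": 0.2, "language": 0.6})
with_fpu = CapabilityProfile({"engineering": 0.9, "arithmetic": 0.99, "language": 0.6})

# Adding a coprocessor changes one dimension of the profile, not "the" intelligence score.
print(with_fpu.dominates(base), base.dominates(with_fpu))   # True False
```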
We have to assume only that we will not significantly improve our understanding of what intelligence is without attempting to create it (through reverse engineering, coding, or EMs). If our understanding remains incipient, the safe policy is to assume that intelligence is indeed a capacity, or set of capacities, that can be used to bootstrap itself. Given the 10^52 lives at stake, even if we were fairly confident intelligence cannot bootstrap, we should still MaxiPok and act as if it could.
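A back-of-envelope version of that argument; the 10^52 figure is the one cited above, and the probability is an assumed illustration of “fairly confident it cannot bootstrap”:

```python
# The 10^52 figure is the one cited above; the probability is an assumed illustration.
lives_at_stake = 1e52      # estimate of future lives, as cited in the comment
p_bootstrap = 1e-6         # assumed: "fairly confident intelligence cannot bootstrap"

print(f"{p_bootstrap * lives_at_stake:.0e} expected lives riding on the question")   # 1e+46
```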
I disagree. By analogy, understanding of agriculture has increased greatly without the creation of an artificial photosynthetic cell. And yes, I know that photovoltaic panels exist, but they came only a long time later.
Do you mind spelling out the analogy? (including where it breaks) I didn’t get it.
Reading my comment I feel compelled to clarify what I meant:
Katja asked: in which worlds should we worry about what ‘intelligence’ designates not being what we think it does?
I responded: in all the worlds where increasing our understanding of ‘intelligence’ has the side effect of increasing attempts to create it—due to feasibility, curiosity, or an urge for power. In these worlds, expanding our knowledge increases the expected risk, because of the side effects.
Whether intelligence is or is not what we thought will only be found out after the expected risk has increased; then we learn the fact, and the risk either skyrockets or plummets. In hindsight, if it plummets, having learned more will look great. In hindsight, if it skyrockets, we are likely dead.
Single-metric versions of intelligence are going the way of the dinosaur. In practical contexts, it’s much better to test for a bunch of specific skills and aptitudes and to create a predictive model of success at the desired task.
In addition, our understanding of intelligence frequently gives a high score to someone capable of making terrible decisions or someone reasoning brilliantly from a set of desperately flawed first principles.
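A minimal sketch of that alternative, with synthetic data and scikit-learn assumed: predict success at one specific task from several aptitude scores, rather than collapsing them into a single number first.

```python
# Synthetic aptitude data, scikit-learn assumed; names and weights are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
aptitudes = rng.normal(size=(500, 4))          # e.g. verbal, spatial, memory, speed
true_weights = np.array([0.2, 1.5, 0.1, 0.8])  # this task mostly rewards two of them
success = (aptitudes @ true_weights + rng.normal(size=500) > 0).astype(int)

model = LogisticRegression().fit(aptitudes, success)
print(np.round(model.coef_, 2))   # a per-aptitude profile, not a single score
```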
Ok, does this matter for Bostrom’s arguments?
Yeah, having high math or reading-comprehension ability does not always make people more effective or productive. They can still, for instance, become suicidal or sociopathic, or rebel against well-meaning authorities. They still often do not go to their doctor when sick, they develop addictions, they may become too introverted or arrogant when it is counterproductive, or fail to escape bad relationships.
We should not strictly be looking to enhance intelligence. If we’re going down the enhancement route at all, we should wish to create good decision-makers without, for example, tendencies to misread people, sociopathy, and self-harm.
What’s wrong with that?
...and, presumably, without tendencies to rebel against well-meaning authorities?
I don’t think I like the idea of genetic slavery.
For instance, rebelling against well-meaning authorities has been known to cause someone not to adhere to a correct medication regime or to start smoking.
Problems regularly rear their head when it comes to listening to the doctor.
I guess I’ll add that the well-meaning authority is also knowledgeable.
Let me point out the obvious: the knowledgeable well-meaning authority is not necessarily acting in your best interests.
Not to mention that authority that’s both knowledgeable and well-meaning is pretty rare.
Really, what I am getting at is that, just like anyone else, smart people may rebel or conform as a knee-jerk reaction. Neither response involves using reason to come to an appropriate conclusion, but I have seen them do both all the time.
One might think an agent who was sufficiently smart would at some point apply reason to the question of whether they should follow their knee-jerk responses with respect to e.g. these decisions.
I thought that this had become a fairly dominant view, over 20 years ago. See this PDF: http://www.learner.org/courses/learningclassroom/support/04_mult_intel.pdf
I first read the book in the early nineties, though Howard Gardner had published the first edition in 1982. I was at first somewhat skeptical that it would rest too much on some form of “political correctness”, but I found the concepts very compelling.
Most of the discussion I heard in subsequent years, occasionally by psychology professor and grad student friends, continued to be positive.
I might say that I had no ulterior motive in trying to find reasons to agree with the book, since I always score in the genius range myself on standardized, traditional-style IQ tests.
So, it does seem to me that intelligence is a vector, not a scalar, if we have to call it by one noun.
As to Katja’s follow-up question, does it matter for Bostrom’s arguments? Not really, as long as one is clear (which it is from the contexts of his remarks) which kind(s) of intelligence he is referring to.
I think there is a more serious vacuum in our understanding than whether intelligence is a single property or comes in several irreducibly different (possibly context-dependent) forms, and it is this: with respect to the sorts of intelligence we usually default to conversing about (like the sort that helps a reader understand Bostrom’s book, an explanation of special relativity, or RNA interference in molecular biology), do we even know what we think we know about what that is?
I would have to explain the idea of this purported “vacuum” in understanding at significant length; it is a set of new ideas that struck me, together, as related insights. I am working on a paper explaining the new perspective I think I have found, and why it might open up some new important questions and strategies for AGI.
When it is finished and clear enough to be useful, I will make it available by PDF or on a blog. (Too lengthy to put in one post here, so I will put the link up. If these ideas pan out, they may suggest some reconceptualizations with nontrivial consequences, and be informative in a scalable sense—which is what one in this area of research would hope for.)