Eliezer,
Did you include your own answer to the question of why AI hasn’t arrived yet in the list? :-)
This is a nice post. Another way of stating the moral might be: “If you want to understand something, you have to stare your confusion right in the face; don’t look away for a second.”
So, what is confusing about intelligence? That question is problematic: a better one might be “what isn’t confusing about intelligence?”
Here’s one thing I’ve pondered at some length. The VC theory states that in order to generalize well a learning machine must implement some form of capacity control or regularization, which roughly means that the model class it uses must have limited complexity (VC dimension). This is just Occam’s razor.
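(For concreteness, one standard form of the VC generalization bound, added here for reference; N is the sample size, h is the VC dimension, and the bound holds with probability at least 1 − η:)

$$R(f) \;\le\; R_{\mathrm{emp}}(f) \;+\; \sqrt{\frac{h\left(\ln\frac{2N}{h} + 1\right) + \ln\frac{4}{\eta}}{N}}$$

The square-root term only shrinks when N is large relative to h, which is what makes capacity control matter in the first place.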
But the brain has on the order of 10^12 synapses, and so it must be enormously complex. How can the brain generalize, if it has so many parameters? Are the vast majority of synaptic weights actually not learned, but rather preset somehow? Or, is regularization implemented in some other way, perhaps by applying random changes to the value of the weights (this would seem biochemically plausible)?
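(The “random changes to the weights” idea can be sketched in a few lines. This is only an illustration of weight-noise regularization in an overparameterized toy model, not a claim about what the brain does; all names, sizes, and constants below are made up.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 20 samples, 100 features -- far more parameters than data,
# which is the regime the question is about.
X = rng.normal(size=(20, 100))
true_w = np.zeros(100)
true_w[:5] = 1.0                      # only a few features actually matter
y = X @ true_w + 0.1 * rng.normal(size=20)

w = np.zeros(100)
lr, noise_scale = 0.01, 0.05

for step in range(2000):
    # Regularization by random perturbation: evaluate the gradient at a
    # noisy copy of the weights instead of the weights themselves.
    w_noisy = w + noise_scale * rng.normal(size=w.shape)
    grad = X.T @ (X @ w_noisy - y) / len(y)
    w -= lr * grad

print("train error:", np.mean((X @ w - y) ** 2))
```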
Also, the brain has a very high metabolic cost, so all those neurons must be doing something valuable.
“Are the vast majority of synaptic weights actually not learned, but rather preset somehow?”

This is what some philosophers have proposed; others have thought we start as a blank slate. Research into the subject has shown that babies do start with some sort of working model of things. That is, we begin life with a set of preset preferences, the ability to distinguish among those preferences, and a basic understanding of geometric shapes.
It would be shocking if we didn’t have preset functions. Calves, for example, can walk almost straight away and can swim not much later. We aren’t going to have entirely eliminated the mammalian ability to start with a set of preset features; there just isn’t enough pressure against keeping a few of them.
If you put a newborn whose mother had an unmedicated labor on the mother’s stomach, the baby will move up to a breast and start to feed.
Conversely, studies with newborn mammals have shown that if you deprive them of something as simple as horizontal lines, they will grow up unable to distinguish lines that approach ‘horizontalness’. So even separating the most basic evolved behavior from the most basic learned behavior is not intuitive.
The deprivation you’re talking about takes place over the course of days and weeks—it reflects the effects of (lack of) reinforcement learning, so it’s not really germane to a discussion of preset functions that manifest in the first few minutes after birth.
It’s relevant insofar as we shouldn’t make assumptions about what is and is not preset simply based on observations that take place in a “typical” environment.
Ah, a negative example. Fair point. Guess I wasn’t paying enough attention and missed the signal you meant to send by using “conversely” as the first word of your comment.
That was lazy of me, in retrospect. I find that often I’m poorer at communicating my intent than I assume I am.
Illusion of transparency strikes again!
Good point. Drinking (feeding), breathing, screaming, and a couple of cute reactions to keep caretakers interested: that’s all you need to bootstrap a human growth process. There seems to be something built in about eye-contact management too, because a deficit there is an early indicator that something is wrong.
Not terribly relevant to your point, but it’s likely that the human sense of cuteness is based on what babies do rather than the other way around.
I’d replace “human” with “mammalian”—most young mammals share a similar set of traits, even those that aren’t constrained as we are by big brains and a pelvic girdle adapted to walking upright. That seems to suggest a more basal cuteness response; I believe the biology term is “baby schema”.
Other than that, yeah.
Artificial neural networks have been trained with millions of parameters, and there are a lot of different regularization methods, like DropConnect or sparsity constraints. But the brain does online learning: overfitting isn’t as big a concern because it doesn’t see the same data more than once.
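(A minimal sketch of what DropConnect-style masking looks like in practice; the function name, layer sizes, and drop probability are made up for illustration.)

```python
import numpy as np

rng = np.random.default_rng(0)

def dropconnect_forward(x, W, b, drop_prob=0.5, training=True):
    """Forward pass through one layer with DropConnect-style weight masking."""
    if training:
        # Randomly zero out individual weights (not whole units, as dropout does).
        mask = rng.random(W.shape) >= drop_prob
        W_eff = W * mask / (1.0 - drop_prob)   # rescale to keep the expectation
    else:
        W_eff = W
    return np.maximum(0.0, x @ W_eff + b)      # ReLU activation

x = rng.normal(size=(4, 16))                   # batch of 4 inputs
W = rng.normal(size=(16, 8)) * 0.1
b = np.zeros(8)
print(dropconnect_forward(x, W, b).shape)      # (4, 8)
```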
On the other hand, architecture matters. The most successful neural network for a given task has connections designed for the structure of that task, so that it will learn much more quickly than a fully-connected or arbitrarily connected network.
The human brain appears to have a great deal of information and structure in its architecture right off the bat.
I’m not saying that you’re wrong, but the state of the art in computer vision is weight sharing, which biological NNs probably can’t do. Hyperparameters like the number of layers and how local the connections should be are important, but they don’t give that much prior information about the task.
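(A rough back-of-the-envelope sketch of what weight sharing buys; the image and filter sizes below are chosen arbitrarily and don’t come from anyone’s comment.)

```python
# Parameter counts for one layer over a 32x32 grayscale image,
# producing 32 feature maps (sizes chosen purely for illustration).
in_pixels = 32 * 32
maps, k = 32, 5                       # 32 output maps, 5x5 receptive fields

fully_connected   = in_pixels * (in_pixels * maps)    # every pixel to every unit
locally_connected = (in_pixels * maps) * (k * k)      # local fields, no sharing
convolutional     = maps * (k * k)                    # local fields + weight sharing

print(fully_connected, locally_connected, convolutional)
# ~33.5 million vs ~819 thousand vs 800 weights (ignoring biases)
```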
I may be completely wrong, but I do suspect that biological NNs are far more general purpose and less “pre-programmed” than is usually thought. The learning rules for a neural network are far simpler than the functions they learn. Training neural networks with genetic algorithms is extremely slow.
The architecture of the V1 and V2 areas of the brain, which Convolutional Neural Networks and other ANNs for vision borrow heavily from, is highly geared towards vision, and includes basic filters that detect the stripes, dots, corners, etc. that appear in all sorts of computer vision work. Granted, neither backpropagation nor weight sharing is directly responsible for this, but the presence of local filters is still what I would call very specific architecture. (I’ve studied computer vision, and specifically the inspiration it draws from early vision, so I can say more about this.)
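(For readers who haven’t seen them: the stripe detectors referred to above are usually modeled as Gabor-like filters. A minimal sketch follows, with all parameter values chosen arbitrarily.)

```python
import numpy as np

def gabor_filter(size=9, wavelength=4.0, theta=0.0, sigma=2.0):
    """Oriented stripe detector of the kind used to model V1 simple cells."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate the coordinate along which the stripes vary by angle theta.
    x_rot = xs * np.cos(theta) + ys * np.sin(theta)
    envelope = np.exp(-(xs**2 + ys**2) / (2 * sigma**2))   # Gaussian window
    carrier = np.cos(2 * np.pi * x_rot / wavelength)       # stripe pattern
    return envelope * carrier

horizontal_stripe_detector = gabor_filter(theta=np.pi / 2)
print(horizontal_stripe_detector.shape)   # (9, 9)
```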
The way genetic algorithms tune weights in an ANN (and yes, this is an awful way to train an ANN) is very different from the way they work in actually evolving a brain, where they operate on the genetic code that develops the brain. I’d say the two are so wildly different that no conclusions from the first can be applied to the second.
During a single individual’s life, Hebbian and other learning mechanisms in the brain are distinct from gradient learning, but can achieve somewhat similar things.
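(To make the contrast concrete, here are the two update rules in their simplest textbook forms, using Oja’s stabilized variant of the Hebbian rule; the learning rates and vector sizes are arbitrary.)

```python
import numpy as np

def hebbian_update(w, x, lr=0.01):
    """Oja's rule: strengthen weights for co-active input and output, with a
    decay term that keeps the weight vector from growing without bound."""
    y = w @ x                       # unit's output
    return w + lr * y * (x - y * w)

def gradient_update(w, x, target, lr=0.01):
    """One SGD step on squared error -- needs an explicit error signal."""
    y = w @ x
    return w - lr * (y - target) * x

rng = np.random.default_rng(0)
w = rng.normal(size=5) * 0.1
x = rng.normal(size=5)
print(hebbian_update(w, x))
print(gradient_update(w, x, target=1.0))
```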
The human brain appears to engage in hierarchical learning, which is what allows it to leverage huge amounts of “general case” abstract knowledge in attacking novel specific problems put before it.