The enterprise of achieving it artificially — the field of ‘artificial general intelligence’ or AGI — has made no progress whatever during the entire six decades of its existence.
I disagree with this. The development of probabilistic graphical models (including Bayesian networks and some types of neural networks) was a very important advance, I think.
It remained a guess until the 1980s, when I proved it using the quantum theory of computation.
A little bit of arrogance here from Deutsch, but we can let it slide.
This astounding claim split the intellectual world into two camps, one insisting that AGI was none the less impossible, and the other that it was imminent. Both were mistaken. The first, initially predominant, camp cited a plethora of reasons ranging from the supernatural to the incoherent. All shared the basic mistake that they did not understand what computational universality implies about the physical world, and about human brains in particular.
Absolutely true, and the first camp still persists to this day, and is still extremely confused/ignorant about universality. It’s a view that is espoused even in ‘popular science’ books.
Suppose you were somehow to give them a list, as with the temperature-conversion program, of explanations of Dark Matter that would be acceptable outputs of the program. If the program did output one of those explanations later, that would not constitute meeting your requirement to generate new explanations. For none of those explanations would be new: you would already have created them yourself in order to write the specification. So, in this case, and actually in all other cases of programming genuine AGI, only an algorithm with the right functionality would suffice. But writing that algorithm (without first making new discoveries in physics and hiding them in the program) is exactly what you wanted the programmers to do!
I don’t follow. You can write a program to generate random hypotheses, and you can write a program to figure out the implications of those hypotheses and whether they fit in with current experimental data, and if they do, to come up with tests of those ideas for future experiments. Now, just generating hypotheses completely randomly may not be a very efficient way, but it would work. That’s very different from saying “It’s impossible”. It’s just a question of figuring out how to make it efficient. So what’s the problem here?
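For concreteness, here is a minimal sketch of the kind of generate-and-test loop I have in mind. Everything in it is a stand-in (a toy hypothesis space of quadratics and toy data), so it only illustrates the shape of the procedure, and its inefficiency, not a real physics pipeline:

```python
import random

def random_hypothesis(rng):
    # Stand-in generator of candidate hypotheses: here just a random
    # quadratic y = a*x^2 + b*x + c. A real system would generate far
    # richer structures, which is where the real difficulty lives.
    return [rng.uniform(-3, 3) for _ in range(3)]

def predict(hypothesis, x):
    a, b, c = hypothesis
    return a * x**2 + b * x + c

def consistent_with(hypothesis, data, tolerance=1.0):
    # A hypothesis 'fits current experimental data' if its predictions are
    # within the tolerance of every observation.
    return all(abs(predict(hypothesis, x) - y) <= tolerance for x, y in data)

def generate_and_test(data, attempts=20_000, seed=0):
    rng = random.Random(seed)
    survivors = []
    for _ in range(attempts):
        hypothesis = random_hypothesis(rng)
        if consistent_with(hypothesis, data):
            survivors.append(hypothesis)
    return survivors

# Toy 'experimental data' drawn from y = x^2, so hypotheses near (1, 0, 0) fit.
observations = [(x, x * x) for x in range((-2), 3)]
print(len(generate_and_test(observations)), "of 20000 random hypotheses fit the data")
```

Most random candidates are discarded, which is the inefficiency I concede above; the point is only that nothing in the setup is impossible in principle.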
Nor can it be met by the technique of ‘evolutionary algorithms’: the Turing test cannot itself be automated without first knowing how to write an AGI program, since the ‘judges’ of a program need to have the target ability themselves.
But the Turing test is very different from coming up with an explanation of dark matter. The Turing test is a very specific test of use of language and common sense, which is only defined in relation to human beings (and thus needs human beings to test) whereas an explanation of dark matter does not need human beings to test. Thus making this particular argument moot.
The prevailing misconception is that by assuming that ‘the future will be like the past’, it can ‘derive’ (or ‘extrapolate’ or ‘generalise’) theories from repeated experiences by an alleged process called ‘induction’. But that is impossible.
What else could it possibly be? Information is either encoded into a brain, or predicted based on past experiences. There is no other way to gain information. Deutsch gives the example of dates starting with 19- or 20-. Surely, such information is not encoded into our brains from birth. It must be learned from past experiences. But knowledge of dates isn’t the only knowledge we have! We have teachers and parents telling us about these things so that we can learn how they work. This all falls under the umbrella of ‘past experiences’. And, indeed, a machine whose only inputs were dates would have a tough time making meaningful inferences about them, no matter how intelligent or creative it was.
But in reality, only a tiny component of thinking is about prediction at all, let alone prediction of our sensory experiences. We think about the world: not just the physical world but also worlds of abstractions such as right and wrong, beauty and ugliness, the infinite and the infinitesimal, causation, fiction, fears, and aspirations — and about thinking itself.
I cannot make head or tail of this.
Anyway, I stopped reading after this point because it was disappointing. I expected an interesting and insightful argument, one to make me actually question my fundamental assumptions, but that’s not the case here.
I don’t follow. You can write a program to generate random hypotheses, and you can write a program to figure out the implications of those hypotheses and whether they fit in with current experimental data, and if they do, to come up with tests of those ideas for future experiments. Now, just generating hypotheses completely randomly may not be a very efficient way, but it would work. That’s very different from saying “It’s impossible”. It’s just a question of figuring out how to make it efficient. So what’s the problem here?
I think his claim is basically “we don’t know yet how to teach a machine how to identify reasonable hypotheses in a short amount of time,” where the “short amount of time” is implicit. The proposal “let’s just test every possible program, and see which ones explain Dark Matter” is not a workable approach, even if it seems to describe the class that contains actual workable approaches. (Imagine actually going to a conference and proposing a go-bot that considers every possible sequence of moves from the current board position, and then picks the tree most favorable to it.)
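For a sense of scale, here is a toy count of what ‘consider every possible sequence of moves’ would mean. The branching factor is a rough stand-in for go, not a real engine:

```python
def count_move_sequences(branching_factor, depth):
    # A brute-force bot that looks `depth` moves ahead must examine roughly
    # branching_factor ** depth distinct move sequences.
    return branching_factor ** depth

# Go positions offer on the order of a couple of hundred legal moves.
for depth in (4, 10, 20):
    print(f"depth {depth}: about {count_move_sequences(200, depth):.2e} sequences")
```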
But the Turing test is very different from coming up with an explanation of dark matter. The Turing test is a very specific test of use of language and common sense, which is only defined in relation to human beings (and thus needs human beings to test) whereas an explanation of dark matter does not need human beings to test. Thus making this particular argument moot.
I think the Turing test is being used as an illustrative example here. It seems unlikely that you could have a genetic algorithm operate on a population of code and end up with a program that passes the Turing test, because at each step the genetic algorithm (as an optimization procedure) needs to have some sense of what is more or less likely to pass the test. It similarly seems unlikely that you could have a genetic algorithm operate on a population of physics explanations and end up with an explanation that successfully explains Dark Matter, because at each step the genetic algorithm needs to have some sense of what is more or less likely to explain Dark Matter.
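A bare-bones genetic-algorithm skeleton makes that dependence explicit; everything of interest is delegated to the fitness function, and the names below are placeholders rather than any particular library:

```python
import random

def evolve(population, fitness, mutate, crossover, generations=100, rng=None):
    # Generic GA loop: selection, crossover, mutation. If `fitness` carries
    # no information about the target ability, selection is just drift.
    rng = rng or random.Random(0)
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[: max(2, len(scored) // 5)]   # keep roughly the top fifth
        children = []
        while len(children) < len(population):
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b), rng))
        population = children
    return max(population, key=fitness)

# Toy demo where a useful fitness function IS easy to write: evolve a
# bitstring of all ones.
rng = random.Random(1)
pop = [[rng.randint(0, 1) for _ in range(20)] for _ in range(30)]
best = evolve(
    pop,
    fitness=sum,
    mutate=lambda bits, r: [b ^ (r.random() < 0.05) for b in bits],
    crossover=lambda a, b: a[:10] + b[10:],
    generations=50,
)
print(best, sum(best))
```

The loop itself is easy; the objection is that for the Turing test, or for ‘explains Dark Matter’, we have no comparably cheap `fitness` to hand it.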
I think his claim is that a correct inference procedure will point right at the correct answer, but as I disagree with that point I am reluctant to ascribe it to him. I think it likely that a correct inference procedure involves checking out vast numbers of explanations, and discarding most of them very early on. But optimization over explanations instead of over plans is in its infancy, and I think he’s right that AGI will be distant so long as that remains the case.
What else could it possibly be?
My interpretation of that section is that Deutsch is claiming that “induction” is not a complete explanation. If you say “well, the sun rose every day for as long as I can remember, and I suspect it will do so today,” then you get surprised by things like “well, the year starts with 19 every day for as long as I can remember, and I suspect it will do so today.” If you say “the sun rises because the Earth rotates around its axis, the sun emits light because of nuclear fusion, and I think the sun has enough fuel to continue shining, angular momentum is conserved, and the laws of physics do not vary with time,” then your expectation that the sun will rise is very likely to be concordant with reality, and you are very unlikely to make that sort of mistake with the date. But how do you get beliefs of that sort to begin with? You use science, which is a bit more complicated than induction.
Similarly, the claim that prediction is unimportant seems to be that the target of an epistemology should be at least one level higher than the output predictions: you don’t want “the probability the sun will rise tomorrow” but “conservation of angular momentum”, because the second makes you more knowledgeable and more powerful.
I think his claim is basically “we don’t know yet how to teach a machine how to identify reasonable hypotheses in a short amount of time,” where the “short amount of time” is implicit.
My impression was that he was saying that creativity is some mysterious thing that we don’t know how to implement. But we do. Creativity is just search. Search that is possibly guided by experience solving similar problems. By learning from past experiences, search becomes more efficient. This idea is quite consistent with studies on how the human brain works. Beginner chess players rely more on ‘thinking’ (i.e. considering a large variety of moves, most of which are terrible), but grandmasters seem to rely more on their memory.
It similarly seems unlikely that you could have a genetic algorithm operate on a population of physics explanations and end up with an explanation that successfully explains Dark Matter, because at each step the genetic algorithm needs to have some sense of what is more or less likely to explain Dark Matter.
As I said, though, it’s quite different, because a hypothetical explanation for dark matter needs to only be consistent with existing experimental data. It’s true that it’s unfeasible to do this for the Turing test, because you need to test millions of candidate programs against humans, and this cannot be done inside the computer unless you already have AGI. But checking proposals for dark matter against existing data can be done entirely inside the computer.
I think his claim is that a correct inference procedure will point right at the correct answer, but as I disagree with that point I am reluctant to ascribe it to him.
I agree with you.
My interpretation of that section is that Deutsch is claiming that “induction” is not a complete explanation. If you say “well, the sun rose every day for as long as I can remember, and I suspect it will do so today,” then you get surprised by things like “well, the year starts with 19 every day for as long as I can remember, and I suspect it will do so today.”
If the machine’s only inputs were ‘1990, 1991, 1992, …, 1999’, and it had no knowledge of math, arithmetic, language, or what years represent, then how on Earth could it possibly make any inference other than that the next date will also start with 19? There is no other inference it could make.
On the other hand, if it had access to the sequence ‘1900, 1901, 1902, …, 1999’ then it becomes a different story. It can infer that, in the last digit, 1 always follows 0, 2 always follows 1, etc., and 0 always follows 9. It could also infer that when 0 follows 9, the digit to its left is incremented. Thus it can conclude that after 1999 the date 2000 is plausible, and add it to its list of highly plausible hypotheses. Another hypothesis could be that the hundreds digit is never affected, and that the next date after 1999 is 1900.
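A toy version of that enumeration: represent each candidate rule as a function from a year to the predicted next year, and keep only the rules consistent with every transition in the data. The two rules are hand-picked to mirror the hypotheses above:

```python
def rule_increment(year):
    # Hypothesis A: dates behave like ordinary numbers.
    return year + 1

def rule_wrap_hundreds(year):
    # Hypothesis B: only the last two digits cycle, so 1999 -> 1900.
    return (year // 100) * 100 + (year % 100 + 1) % 100

def consistent(rule, observed):
    # A rule survives if it reproduces every transition seen so far.
    return all(rule(a) == b for a, b in zip(observed, observed[1:]))

observed = list(range(1900, 2000))   # the sequence 1900, 1901, ..., 1999
for name, rule in [("increment", rule_increment), ("wrap hundreds", rule_wrap_hundreds)]:
    if consistent(rule, observed):
        print(name, "predicts", rule(observed[-1]), "after 1999")
```

Both rules fit the observed sequence equally well; choosing between them takes something beyond the raw data, whether a prior that favors simpler rules or the background knowledge described next.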
Equivalently, if it had already been told about math, it would know how number sequences work, and could say with high confidence that the next year will be 2000. Yes, going to school counts as ‘past experiences’.
It is a common mistake people make when talking about induction. They think induction is simply ‘X has always happened, therefore it will always happen’. But induction is far more complicated than that! That’s why it took so long to come up with a mathematical theory of induction (Solomonoff induction). Solomonoff induction considers all possible hypotheses, some of them extremely complex, and weighs them according to how simple they are and whether they fit the observed data. That is the very definition of science. Solomonoff induction could accurately predict the progression of dates, and could do ‘science’. People have implemented time-limited approximations of Solomonoff induction on a computer and they work as expected. We do need to come up with faster and more efficient ways of doing this, though. I agree with that.
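Here is a crude, resource-limited caricature of that weighting, just to show its shape. The ‘programs’ are repeating patterns over a two-letter alphabet rather than real programs, so this illustrates the simplicity-weighting idea, not an actual implementation of Solomonoff induction:

```python
from itertools import product

def all_programs(alphabet, max_length):
    # Enumerate candidate 'programs' (here just strings) in order of length;
    # length plays the role of description complexity.
    for length in range(1, max_length + 1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

def run(program, steps):
    # Stand-in interpreter: the program is treated as a pattern that repeats.
    # A real approximation would run actual programs under a time bound.
    return (program * (steps // len(program) + 1))[:steps]

def posterior(data, alphabet="ab", max_length=6):
    weights = {}
    for prog in all_programs(alphabet, max_length):
        if run(prog, len(data)) == data:          # fits the observed data
            weights[prog] = 2.0 ** (-len(prog))   # simpler programs weigh more
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}

print(posterior("ababab"))
```

The shortest pattern that fits ("ab") ends up with most of the weight, which is the simplicity bias doing the work.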
I agree that there’s a lot more work to be done in AI. We need to find better learning and search algorithms. What I disagree with is that the work must be this kind of philosophical work that Deutsch is proposing. I think the work that needs to be done is very much engineering work.
Correct, but not helpful; when you say “just search,” that’s like saying “but Dark Matter is just physics.” The physicists don’t have a good explanation of Dark Matter yet, and the search people don’t have a good implementation of creativity (on the level of concepts) yet.
I agree that there’s a lot more work to be done in AI. We need to find better learning and search algorithms. What I disagree with is that the work must be this kind of philosophical work that Deutsch is proposing. I think the work that needs to be done is very much engineering work.
It is not obvious to me that Deutsch is familiar with ideas like Solomonoff induction, Pearl’s work on causality, and so on, and thinks that they’re inadequate to the task. He might be saying “we need a formalized version of induction” while unaware that Solomonoff already proposed one.
Search that is possibly guided by experience solving similar problems. By learning from past experiences, search becomes more efficient.
I agree that there’s a lot more work to be done in AI. We need to find better learning and search algorithms.
Why did I mention this at all? Because there’s no other way to do this. Creativity (coming up with new, unprecedented solutions to problems) must utilize some form of search, and, due to the no-free-lunch theorem, there is no general-purpose shortcut to finding the solution to a problem. The only thing that can get around no-free-lunch is to consider an ensemble of problems. That is, to learn from past experiences.
And about your point:
It is not obvious to me that Deutsch is familiar with ideas like Solomonoff induction, Pearl’s work on causality, and so on, and thinks that they’re inadequate to the task.
I agree with this. The fact that he didn’t mention Solomonoff at all, even in passing, despite devoting half the article to induction, is strongly indicative of this.
That doesn’t look helpful to me. Yes, you can define creativity this way but the price you pay is that your search space becomes impossibly huge and high-dimensional.
Defining sculpture as a search for a pleasing arrangement of atoms isn’t very useful.
“It seems unlikely that you could have a genetic algorithm operate on a population of code and end up with a program that passes the Turing test”
Well, we have one case of it working, and that wasn’t even with the process being designed with the “pass the Turing test” specifically as a goal.
“because at each step the genetic algorithm (as an optimization procedure) needs to have some sense of what is more or less likely to pass the test.”
Having an automated process for determining with certainty that something passes the Turing test is quite a bit stronger than merely having nonzero information. Suppose I’m trying to use a genetic algorithm to create a Halting Tester, and I have a Halting Tester that says that a program doesn’t halt. If I know that the program does, in fact, not halt after n steps (by simply running the program for n steps), that provides nonzero information about the efficacy of my Halting Tester. This suggests that I could create a genetic algorithm for creating Halting Testers (obviously, I couldn’t evolve a perfect Halting Tester, but perhaps I could evolve one that is “good enough”, given some standard). And who knows, maybe if I had such a genetic algorithm, not only would my Halting Testers evolve better Halting Testing, but since they are competing against each other, they would evolve better Tricking Other Halting Testers, and maybe that would eventually spawn AGI. I don’t find that inconceivable.
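A sketch of that training signal, under stand-in assumptions: the ‘programs’ are toy counters and the candidate ‘Halting Testers’ are one-parameter threshold rules, so this only illustrates that running programs for n steps yields a usable, if imperfect, fitness signal:

```python
import random

def simulate(program, max_steps):
    # Stand-in 'program': a (counter, step) pair that halts once the counter
    # reaches 100, and never halts if step <= 0. Running it for max_steps
    # gives partial but genuine information about whether it halts.
    counter, step = program
    for _ in range(max_steps):
        counter += step
        if counter >= 100:
            return True       # observed to halt within max_steps
    return False              # has not halted yet (might still halt later)

def fitness(threshold, programs, max_steps=120):
    # A candidate 'Halting Tester' here is a threshold rule: predict 'halts'
    # when the step size exceeds the threshold. Its score is how often it
    # agrees with what we actually observe after max_steps of execution.
    return sum((program[1] > threshold) == simulate(program, max_steps)
               for program in programs)

rng = random.Random(0)
programs = [(rng.randint(0, 50), rng.randint(-3, 3)) for _ in range(200)]

# Toy 'evolution': propose a mutated threshold and keep it whenever it scores
# at least as well against the n-step signal.
best = -5.0
for _ in range(200):
    candidate = best + rng.gauss(0, 0.5)
    if fitness(candidate, programs) >= fitness(best, programs):
        best = candidate
print("evolved threshold:", round(best, 2), "score:", fitness(best, programs), "/ 200")
```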
Well, we have one case of it working, and that wasn’t even with the process being designed with the “pass the Turing test” specifically as a goal.
Are you referring to the biological evolution of humans, or stuff like this?
Having an automated process for determining with certainty that something passes the Turing test is quite a bit stronger than merely having nonzero information.
Right; how did you interpret “some sense of what is more or less likely to pass the test”?
I was referring to the biological evolution of humans; in your link, the process appears to have been designed with the Turing test in mind.
There’s probably going to be a lot of guesswork as far as what metrics for “more likely to pass” are best, but the process doesn’t have to be perfect, just good enough to generate intelligence. Obvious places to start would be complex games such as Go and poker, and replicating aspects of human evolution, such as simulating hunting and social maneuvering.
I was referring to the biological evolution of humans
Ok. When I said “you,” I meant modern humans operating on modern programming languages. I also don’t think it’s quite correct to equate actual historical evolution and genetic algorithms, for somewhat subtle technical reasons.
My estimate is 80% prediction, with the rest evaluation and tree pruning.