If building an engineered intelligent agent is likely to be easier, then it is also likely to come first—since easy things can be done before the skills required to do hard things are mastered.
Way to miss my point. Building it might be easier, but that only matters if understanding what to build is easy enough. An abacus is easier to build than Stonehenge. Doesn’t mean it came first.
I am sorry to hear you didn’t like my reasons. I can’t articulate all my reasoning in a short space of time—but those are some of the basics: the relative uselessness of bioinspiration in practice, the ineffectual nature of WBE under construction, and the idea that the complexity of the brain comes mostly from the environment and from within individual cells (i.e. their genome).
I read your essay already. You still don’t say anything about the difficulty of WBE vs understanding intelligence in detail.
Well, that’s just a way of restating the topic.
One of my approaches is not to try directly quantifying the difficulty of the two tasks, but rather to compare with other engineering feats. To predict that engineered flight solutions would beat scanning a bird one does not have to quantify the two speculative timelines. One simply observes that engineering almost always beats wholesale scanning.
Flight has some abstract principles that don’t depend on all the messy biological details of cells, bones and feathers. It will—pretty obviously IMO—be much the same for machine intelligence. We have a pretty good idea of what some of those abstract principles are. One is compression. If we had good stream compressors we would be able to predict the future consequences of actions—a key ability in shaping the future. You don’t need to scan a brain to build a compressor. That is a silly approach to the problem that pushes the solution many decades into the future. Compression is “just” another computer science problem—much like searching or sorting.
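To make the compression-prediction link concrete, here is a minimal sketch (an editorial illustration, not a design anyone in this thread proposed): the symbol statistics at the heart of a PPM-style arithmetic coder can be read off directly as predictions. The class name and toy corpus are invented for the example.

    from collections import Counter, defaultdict

    class ContextModel:
        """Order-k context model: the statistical core of PPM-style
        compressors. The statistics that assign short codes to likely
        symbols double directly as predictions of what comes next."""
        def __init__(self, k: int = 3):
            self.k = k
            self.counts = defaultdict(Counter)

        def train(self, seq: str) -> None:
            # Count which symbol follows each length-k context.
            for i in range(self.k, len(seq)):
                self.counts[seq[i - self.k:i]][seq[i]] += 1

        def predict(self, context: str) -> str:
            # Most frequent successor of the last k symbols seen.
            ctx = context[-self.k:]
            if self.counts[ctx]:
                return self.counts[ctx].most_common(1)[0][0]
            return "?"  # unseen context: no basis for prediction

    model = ContextModel(k=3)
    model.train("the cat sat on the mat. the cat sat on the ")
    print(model.predict("the cat sat on the "))  # -> 'c' (context "he " is followed by 'c' twice, 'm' once)
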
IMO, the appeal of WBE does not have to do with technical difficulty. Technically, the idea is dead in the water. It is to do with things like the topic of this thread—a desire to see a human future. Wishful thinking, in other words.
Wishful thinking is not necessarily bad. It can sometimes help create the desired future. However, probably not in this case—reality is just too heavily stacked against the idea.
Re: “It will—pretty obviously IMO—be much the same for machine intelligence.”
I disagree that it is so obvious. Much of what we call “intelligence” in humans and other animals is actually tacit knowledge about a specific environment. This knowledge gradually accumulated over billions of years, and it works due to immodular systems that improved stepwise and had to retain relevant functionality at each step.
This is why you barely think about bipedal walking, and discovered it on your own, yet even now very few people can explain how it works. It’s also why learning, for humans, largely consists of reducing a problem to something for which we have native hardware.
So intelligence, if it means successful, purposeful manipulation of the environment, does rely heavily on the particulars of our bodies, in a way that powered flight does not.
Re: “Compression is ‘just’ another computer science problem—much like searching or sorting.”
Yes, it’s another CS problem, but not like searching or sorting. Those are computable, while (general) compression isn’t: finding the truly shortest program that generates a given string amounts to computing its Kolmogorov complexity, which is uncomputable. Not surprisingly, the optimal intelligence Hutter presents is uncomputable, as is every other method presented in every research paper that purports to be a general intelligence.
Now, you can make approximations to the ideal, perfect compressor, but that inevitably requires making decisions about what parts of the search space can be ignored at low enough cost—which itself requires insight into the structure of the search space, the very thing you were supposed to be automating!
Attempts to reduce intelligence to compression butt up against the same limits that compression does: you can be good at compressing some kinds of data only if you sacrifice the ability to compress other kinds of data.
With that said, if you can make a computable, general compressor that identifies regularities in the environment many orders of magnitude faster than evolution, then you will have made some progress.
Re: “So intelligence, if it means successful, purposeful manipulation of the environment, does rely heavily on the particulars of our bodies, in a way that powered flight does not.”
Natural selection shaped wings for roughly as long as it has shaped brains. They too are an accumulated product of millions of years of ancestral success stories. Information about both is transmitted via the genome. If there is a point of dis-analogy here between wings and brains, it is not obvious.
Okay, let me explain it this way: when people refer to intelligence, a large part of what they have in mind is the knowledge that we (tacitly) have about a specific environment. Therefore, our bodies are highly informative about a large part (though certainly not the entirety!) of what is meant by intelligence.
In contrast, the only commonality with birds that is desired in the goal “powered human flight” is … the flight thing. Birds have a solution, but they do not define the solution.
In both cases, I agree, the solution afforded by the biological system (bird or human) is not strictly necessary for the goal (flight or intelligence). And I agree that once certain insights are achieved (the workings of aerodynamic lift or the tacit knowledge humans have [such as the assumptions used in interpreting retinal images]), they can be implemented differently from how the biological system does it.
However, for a robot to match the utility of a human butler, say, it must know things specific to humans (like what the meanings of words are, given a particular social context), not just intelligence-related things in general, like how to infer causal maps from raw data.
FWIW, I’m thinking of intelligence this way:
“Intelligence measures an agent’s ability to achieve goals in a wide range of environments.”
http://www.vetta.org/definitions-of-intelligence/
Nothing to do with humans, really.
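For reference, the formal measure behind that definition (Legg and Hutter’s “universal intelligence”, stated here for context) weights an agent’s value in each computable environment by the environment’s simplicity:

    \Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}

where E is the set of computable reward-bearing environments, K(\mu) is the Kolmogorov complexity of environment \mu, and V^{\pi}_{\mu} is the expected value agent \pi achieves in \mu.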
Then why should I care about intelligence by that definition? I want something that performs well in environments humans will want it to perform well in. That’s a tiny, tiny fraction of the set of all computable environments.
A universal intelligent agent should also perform very well in many real world environments. That is part of the beauty of the idea of universal intelligence. A powerful universal intelligence can reasonably be expected to invent nanotechnology and fusion, cure cancer, and generally solve many of the world’s problems.
Oracles for uncomputable problems tend to be like that...
Also, my point is that, yes, something impossibly good could do that. And that would be good. But performing well across all computable universes (with a sorta-short description, etc.) has costs, and one cost is optimality in this universe.
Since we have to choose, I want it optimal for this universe, for purposes we deem good.
A general agent is often sub-optimal on particular problems. However, it should be able to pick them up pretty quickly. Plus, it is a general agent, with all kinds of uses.
A lot of people are interested in building generally intelligent agents. We ourselves are highly general agents—i.e. you can pay us to solve an enormous range of different problems.
Generality of intelligence does not imply lack-of-adaptedness to some particular environment. What it means is more that it can potentially handle a broad range of problems. Specialized agents—on the other hand—fail completely on problems outside their domain.
Re: “Attempts to reduce intelligence to compression butt up against the same limits that compression does: you can be good at compressing some kinds of data only if you sacrifice the ability to compress other kinds of data.”
That is not a meaningful limitation. There are general purpose universal compressors. It is part of the structure of reality that sequences generated by short programs are more commonly observed. That’s part of the point of using a compressor—it is an automated way of applying Occam’s razor.
There are frequently useful general purpose compressors that work by anticipating the most common regularities in the set of files typically generated by humans. But they do not, and cannot, iterate through all the short programs that could have generated the data—it’s too time-consuming.
The point was that general purpose compression is possible. Yes, you sacrifice the ability to compress other kinds of data—but those other kinds of data are highly incompressible and close to random—not the kind of data in which most intelligent agents are interested in finding patterns in the first place.
No, they look random and incompressible because effective compression algorithms optimized for this universe can’t compress them. But algorithms optimized for other computable universes may regard them as normal and have a good way to compress them.
Which kinds of data (from computable processes) are likely to be observed in this universe? Ay, there’s the rub.
Re: “they look random and incompressible because effective compression algorithms optimized for this universe can’t compress them”
Compressing sequences from this universe is good enough for me.
Re: “Which kinds of data (from computable processes) are likely to be observed in this universe? Ay, there’s the rub.”
Not really—there are well-known results about that—see:
http://en.wikipedia.org/wiki/Occam’s_razor
http://www.wisegeek.com/what-is-solomonoff-induction.htm
Except that the problem you were attacking at the beginning of this thread was general intelligence, which you claimed to be solvable just by good enough compression. But that requires knowing which parts of the search space in this universe are unlikely, and you haven’t shown how to algorithmize that.
Yes, but as I keep trying to say, those results are far from enough to get something workable, and it’s not the methodology behind general compression programs.
Arithmetic compression, Huffman compression, Lempel-Ziv compression, etc. are all excellent at compressing sequences produced by small programs. Things like:
1010101010101010 110110110110110110 1011011101111011111
...etc.
Those compressors (crudely) implement a computable approximation of Solomonoff induction without iterating through programs that generate the output. How they work is not very relevant here—the point is that they act as general-purpose compressors—and compress a great range of real world data types.
The complaint that we don’t know what types of data are in the universe is just not applicable—we do, in fact, know a considerable amount about that—and that is why we can build general purpose compressors.
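The claim about sequences from small programs is easy to check with an off-the-shelf Lempel-Ziv implementation; a minimal sketch using Python’s zlib (the sample choices are mine):

    import os, zlib

    samples = {
        "alternating": b"10" * 5000,            # output of a tiny program
        "counting":    bytes(range(256)) * 40,   # simple ramp, also tiny
        "random":      os.urandom(10_000),       # no short program (w.h.p.)
    }
    for name, data in samples.items():
        ratio = len(zlib.compress(data, 9)) / len(data)
        print(f"{name:12s} -> {ratio:.1%} of original size")

The structured sequences shrink to a small fraction of their size, while the random bytes do not compress at all, which is the Occam-flavored asymmetry being described above.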
What’s with complaining that compressors are uncomputable?!? Just let your search through the space of possible programs skip on to the next one whenever you spend more than an hour executing. Then you have a computable compressor. That ignores a few especially tedious and boring areas of the search space—but so what?!? Those areas can be binned with no great loss.
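As a toy rendering of that shortcut (an editorial sketch: the “NxS” program format and the step budget are inventions so the example runs; a real version would enumerate programs for a universal machine under a wall-clock limit):

    def run_with_budget(program: str, max_steps: int):
        """Interpret 'NxS' as 'emit S, N times'; give up past max_steps."""
        head, sep, body = program.partition("x")
        if sep == "" or not head.isdigit():
            return None                      # ill-formed program
        n = int(head)
        if n * len(body) > max_steps:
            return None                      # "skip on to the next one"
        return body * n

    def bounded_search(target: str, programs, max_steps=10**6):
        """Enumerate candidates shortest-first, time-limiting each run.
        Computable by construction, but blind to slow-running programs."""
        for prog in sorted(programs, key=len):
            if run_with_budget(prog, max_steps) == target:
                return prog
        return None

    print(bounded_search("ab" * 12, ["24xa", "12xab", "3xababab"]))
    # -> '12xab': the shortest in-budget program generating the target
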
Did you do the math on this one? Even with only 10% of programs caught in a loop, it would take almost 400 years to get through all programs up to 24 bits long.
We need something faster.
(Do you see now why Hutter hasn’t simply run AIXI with your shortcut?)
Of course, in practice many loops can be caught, but the combinatorial explosion really does blow any technique out of the water.
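The arithmetic behind the “almost 400 years” figure, reconstructed (assuming programs are bit strings and each stuck program wastes the full hour):

    # Every bit string up to 24 bits counts as a program:
    # 2^1 + 2^2 + ... + 2^24 candidates in total.
    programs = sum(2**n for n in range(1, 25))   # 33,554,430 programs
    wasted_hours = 0.10 * programs                # one lost hour each
    print(wasted_hours / (24 * 365.25))           # ~383 years, i.e. "almost 400"
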
Uh, I was giving a computable algorithm, not a rapid one.
The objection that compression is an uncomputable strategy is a useless one—you just use a computable approximation instead—with no great loss.
But you were implying that the uncomputability is somehow “not a problem” because of a quick fix you gave, when the quick fix actually means waiting at least 400 years—under unrealistically optimistic assumptions.
Yes, I do use a computable approximation, and my computable approximation has already done the work of identifying the important part of the search space (and the structure thereof).
And that’s the point—compression algorithms haven’t done so, except to the extent that a programmer has fed them the “insights” (known regularities of the search space) in advance. That doesn’t tell you the algorithmic way to find those regularities in the first place.
Re: “But you were implying that the uncomputability is somehow ‘not a problem’”
That’s right—uncomputability is not a problem—you just use a computable compression algorithm instead.
Re: “And that’s the point—compression algorithms haven’t done so, except to the extent that a programmer has fed them the “insights” (known regularities of the search space) in advance.”
The universe itself exhibits regularities. In particular, sequences generated by small automata are found relatively frequently. This principle is known as Occam’s razor. That fact is exploited by general purpose compressors to compress a wide range of different data types—including many never seen before by the programmers.
You said that it was not a problem with respect to creating superintelligent beings, and I showed that it is.
Yes, it does. But, again, scientists don’t find them by iterating through the set of computable generating functions, starting with the smallest. As I’ve repeatedly emphasized, that takes too long. Which is why you’re wrong to generalize compression as a practical, all-encompassing answer to the problem of intelligence.
This is growing pretty tedious, for me, and probably others :-(
You did not show uncomputability is a problem in that context.
I never claimed iterating through programs was an effective practical means of compression. So it seems as though you are attacking a straw man.
Nor do I claim that compression is “a practical, all-encompassing answer to the problem of intelligence”.
Stream compression is largely what you need if you want to predict the future, or build parsimonious models based on observations. Those are important things that many intelligent agents want to do—but they are not themselves a complete solution to the problem.
Just to show the circles I’m going in here:
Right, I showed it is a problem in the context in which you originally brought up compression—as a means to solve the problem of intelligence.
Re: “I never claimed iterating through programs was an effective practical means of compression.”
Yes, you did. Right here:
“What’s with complaining that compressors are uncomputable?!? Just let your search through the space of possible programs skip on to the next one whenever you spend more than an hour executing. Then you have a computable compressor.”
You also say:
“Nor do I claim that compression is ‘a practical, all-encompassing answer to the problem of intelligence’.”
Again, yes you did. Right here. Though you said compression was only one of the abilities needed, you did claim “If we had good stream compressors we would be able to predict the future consequences of actions...” and predicting the future is largely what people would classify as having solved the problem of intelligence.
I disagree with all three of your points. However, because the discussion has been going on for so long—and because it is so tedious and low-grade for me—I am not going to publicly argue the toss with you any more. Best wishes...
Okay, onlookers: please decide which of us (or both, or neither) was engaging the arguments of the other, and comment or vote accordingly.
ETA: Other than timtyler, I mean.
So you think the reason why we can’t build a slow-running human-level AI today with today’s hardware is not that we don’t know how we should go about it, but that we don’t have sufficiently good compression algorithms (and a couple of other things of a similar nature)? And you don’t think a compressor that can deduce and compress causal relations in the real world well enough to predict the future consequences of any actions a human-level AI might take would either have to be an AGI itself, or be a lot more impressive than a “mere” AGI?
Everything else in your post is a retread of points you have already made.
Compression is a key component, yes. See:
http://prize.hutter1.net/
http://marknelson.us/2006/08/24/the-hutter-prize/
We don’t know how we should go about making good compression algorithms—though progress is gradually being made in that area.
Your last question seems to suggest one way of thinking about how closely related the concepts of stream compression and intelligence are.
I’d rather say intelligence is a requirement for really good compression, and thus compression can make for a reasonable measurement of a lower bound on intelligence (but not even a particularly good proxy). And you can imagine an intelligent system made up of an intelligent compressor and a bunch of dumb modules. That’s no particularly good reason to think that the best way to develop an intelligent compressor (or even a viable way) is to scale up a dumb compressor.
Since a random list of words has higher entropy than a list of grammatically correct sentences, which in turn has higher entropy than intelligible text, the best possible compression of Wikipedia would require the compressor to understand English, and enough of the content to exploit semantic redundancy; the theoretically ideal algorithm would have to be intelligent. But that doesn’t mean that an algorithm that does better in the Hutter test than another is automatically more intelligent in that way. As far as I understand, current algorithms haven’t even gone all that far in exploiting the redundancy of grammar: a text generated to be statistically indistinguishable from grammatical text, as far as those compression algorithms are concerned, wouldn’t have to be grammatical at all. There seems to be no reason to believe the current methods will scale up to the theoretical ideal, nor any reason to think that compression is a good niche/framework in which to build a machine that understands English. Machine translation seems a better one to me, and even there, methods that try to exploit grammar seem to be out-competed by methods that don’t bother and rely on statistics alone; so even the impressive progress of e.g. Google translation doesn’t seem like strong evidence that we are making good progress towards a machine that actually understands language (evidence, certainly, but not strong evidence).
TL;DR: Just because tool A would be much improved by a step “and here magic happens” doesn’t mean that working on improving tool A is a good way to learn magic.
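A rough way to probe how much of the grammatical redundancy discussed above a standard compressor actually captures is to destroy the grammar and see how much the ratio suffers; in practice the gap tends to be modest, consistent with the point being made. An editorial sketch (“sample.txt” is a placeholder for any decent-sized English text):

    import random, zlib

    text = open("sample.txt", "rb").read()

    words = text.split()
    random.Random(0).shuffle(words)
    scrambled = b" ".join(words)      # same word multiset, grammar destroyed

    for label, data in (("intelligible", text), ("shuffled", scrambled)):
        bits = 8 * len(zlib.compress(data, 9)) / len(data)
        print(f"{label}: {bits:.2f} bits per byte")
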
Compression details are probably not too important here.
Compression is to brains what lift is to wings. In both cases, people could see that there’s an abstract principle at work—without necessarily knowing how best to implement it. In both cases people considered a range of solutions—with varying degrees of bioinspiration.
There are some areas where we scan. Pictures, movies, audio, etc. However, we didn’t scan bird wings, we didn’t scan to make solar power, or submarines, or cars, or memories. Look into this issue a bit, and I think most reasonable people will put machine intelligence into the “not scanned” category. We already have a mountain of machine intelligence in the world. None of it was made by scanning.
And yet you build the entire case for assigning a greater than 90% confidence on the unproven assertion that compression is the core principle of intelligence—the only argument you make that even addresses the main reason for considering WBE at all.
Compression is a white bear approach to AGI.
Tolstoy recounts that as a boy, his eldest brother promised him and his siblings that their wishes would come true provided that, among other things, they could stand in a corner and not think about a white bear. The compression approach to AGI seems to be the same sort of enterprise: if we just work on mere, mundane, compression algorithms and not think about AGI, then we’ll achieve it.
Do you mean “avoiding being overwhelmed by the magnitude of the problem as a whole and making steady progress in small steps” or “substituting wishful thinking for thinking about the problem”, or something else?
Using wishful thinking to avoid the magnitude of the problem.
This is Solomonoff induction:
“Solomonoff’s model of induction rapidly learns to make optimal predictions for any computable sequence, including probabilistic ones. It neatly brings together the philosophical principles of Occam’s razor, Epicurus’ principle of multiple explanations, Bayes theorem and Turing’s model of universal computation into a theoretical sequence predictor with astonishingly powerful properties.”
http://www.vetta.org/documents/IDSIA-12-06-1.pdf
It is hard to describe the idea that Solomonoff induction bears on machine intelligence as “wishful thinking”. Prediction is useful and important—and this is basically how you do it.
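For concreteness, the predictor in question is Solomonoff’s prior (in one standard formulation):

    M(x) = \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}

where U is a universal prefix machine, the sum runs over programs p whose output begins with x, and \ell(p) is the length of p in bits; the next symbol b is then predicted with probability M(xb)/M(x).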
But:
“Indeed the problem of sequence prediction could well be considered solved, if it were not for the fact that Solomonoff’s theoretical model is incomputable.”
and:
“Could there exist elegant computable prediction algorithms that are in some sense universal? Unfortunately this is impossible, as pointed out by Dawid.”
and:
“We then prove that some sequences, however, can only be predicted by very complex predictors. This implies that very general prediction algorithms, in particular those that can learn to predict all sequences up to a given Kolmogorov complex[ity], must themselves be complex. This puts an end to our hope of there being an extremely general and yet relatively simple prediction algorithm. We then use this fact to prove that although very powerful prediction algorithms exist, they cannot be mathematically discovered due to Gödel incompleteness. Given how fundamental prediction is to intelligence, this result implies that beyond a moderate level of complexity the development of powerful artificial intelligence algorithms can only be an experimental science.”
While Solomonoff induction is mathematically interesting, the paper itself seems to reject your assessment of it.
Not at all! I have no quarrel whatsoever with any of that (except some minor quibbles about the distinction between “math” and “science”).
I suspect you are not properly weighing the term “elegant” in the second quotation.
The paper is actually arguing that sufficiently comprehensive universal prediction algorithms are necessarily large and complex. Just so.