Complexity and Intelligence
Followup to: Building Something Smarter, Say Not “Complexity”, That Alien Message
One of the Godel-inspired challenges to the idea of self-improving minds is based on the notion of “complexity”.
Now “complexity”, as I’ve previously mentioned, is a dangerous sort of word. “Complexity” conjures up the image of a huge machine with incomprehensibly many gears inside—an impressive sort of image. Thanks to this impressiveness, “complexity” sounds like it could be explaining all sorts of things—that all sorts of phenomena could be happening because of “complexity”.
It so happens that “complexity” also names another meaning, strict and mathematical: the Kolmogorov complexity of a pattern is the size of the program code of the shortest Turing machine that produces the pattern as an output, given unlimited tape as working memory.
I immediately note that this mathematical meaning, is not the same as that intuitive image that comes to mind when you say “complexity”. The vast impressive-looking collection of wheels and gears? That’s not what the math term means.
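A toy contrast may make the distinction concrete. This is a minimal Python sketch, not a Kolmogorov-complexity calculator (K is uncomputable); zlib compression is used only as a crude, computable stand-in for "length of a short generating description":

```python
import random
import zlib  # compression: a rough, computable stand-in for "length of a generating program"

# A highly regular string: the tiny expression "ab" * 50_000 generates it,
# so its Kolmogorov complexity stays small no matter how long the string gets.
regular = "ab" * 50_000

# A (pseudo)random string of the same length: with overwhelming probability,
# no program much shorter than a truly random string can reproduce it.
random.seed(0)
noisy = "".join(random.choice("ab") for _ in range(100_000))

# zlib is not Kolmogorov complexity, but it shows the same asymmetry:
# structure compresses, randomness (about one bit of entropy per character here) barely does.
print(len(zlib.compress(regular.encode())))  # small: a few hundred bytes at most
print(len(zlib.compress(noisy.encode())))    # large: on the order of ten thousand bytes
```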
Suppose you ran a Turing machine with unlimited tape, so that, starting from our laws of physics, it simulated our whole universe—not just the region of space we see around us, but all regions of space and all quantum branches. (There are strong indications our universe may be effectively discrete, but if not, just calculate it out to 3^^^3 digits of precision.)
Then the “Kolmogorov complexity” of that entire universe—throughout all of space and all of time, from the Big Bang to whatever end, and all the life forms that ever evolved on Earth and all the decoherent branches of Earth and all the life-bearing planets anywhere, and all the intelligences that ever devised galactic civilizations, and all the art and all the technology and every machine ever built by those civilizations...
...would be 500 bits, or whatever the size of the true laws of physics when written out as equations on a sheet of paper.
The Kolmogorov complexity of just a single planet, like Earth, would of course be much higher than the “complexity” of the entire universe that contains it.
“Eh?” you say. “What’s this?” you say. “How can a single planet contain more wheels and gears, more complexity, than the whole vast turning universe that embeds it? Wouldn’t one planet contain fewer books, fewer brains, fewer species?”
But the new meaning that certain mathematicians have formulated and attached to the English word “complexity”, is not like the visually impressive complexity of a confusing system of wheels and gears.
It is, rather, the size of the smallest Turing machine that unfolds into a certain pattern.
If you want to print out the entire universe from the beginning of time to the end, you only need to specify the laws of physics.
If you want to print out just Earth by itself, then it’s not enough to specify the laws of physics. You also have to point to just Earth within the universe. You have to narrow it down somehow. And, in a big universe, that’s going to take a lot of information. It’s like the difference between giving directions to a city, and giving directions for finding a single grain of sand on one beach of one city. Indeed, with all those quantum branches existing side-by-side, it’s going to take around as much information to find Earth in the universe, as to just directly specify the positions of all the atoms.
Kolmogorov complexity is the sense in which we zoom into the endless swirls of the Mandelbrot fractal and think, not “How complicated”, but rather, “How simple”, because the Mandelbrot set is defined using a very simple equation. But if you wanted to find a subset of the Mandelbrot set, one particular swirl, you would have to specify where to look, and that would take more bits.
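For concreteness, here is roughly how little machinery the Mandelbrot set needs; a minimal Python sketch of the standard iteration z ← z² + c, with the escape-radius test and crude ASCII rendering chosen only for illustration:

```python
def in_mandelbrot(c: complex, max_iter: int = 100) -> bool:
    """Return True if c appears to lie in the Mandelbrot set:
    iterate z <- z^2 + c from z = 0 and check that |z| stays bounded."""
    z = 0 + 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:        # once |z| exceeds 2, the orbit is guaranteed to diverge
            return False
    return True

# Crude ASCII rendering of the set; the entire endless structure is implicit
# in the few lines above, which is the point about "simplicity".
for y in range(-12, 13):
    row = ""
    for x in range(-30, 15):
        row += "#" if in_mandelbrot(complex(x / 15, y / 12)) else " "
    print(row)
```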
That’s why we use the Kolmogorov complexity in Occam’s Razor to determine how “complicated” or “simple” something is. So that, when we think of the entire universe, all the stars we can see and all the implied stars we can’t see, and hypothesize laws of physics standing behind it, we will think “what a simple universe” and not “what a complicated universe”—just like looking into the Mandelbrot fractal and thinking “how simple”. We could never accept a theory of physics as probably true, or even remotely possible, if it got an Occam penalty the size of the universe.
As a logical consequence of the way that Kolmogorov complexity has been defined, no closed system can ever increase in Kolmogorov complexity. (At least, no closed system without a ‘true randomness’ source.) A program can pattern ever more interacting wheels and gears into its RAM, but nothing it does from within itself can increase “the size of the smallest computer program that unfolds into it”, by definition.
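A hedged restatement of the inequality being invoked, with constants depending on the choice of universal machine: for any fixed computable update rule f and starting state x,

```latex
K\bigl(f(x)\bigr) \;\le\; K(x) + K(f) + O(1)
```

so the entire unfolded history of a deterministic closed system has complexity bounded by that of its initial state plus its update rule, and picking out the state at any particular step t costs at most a further O(log t) bits to name t.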
Suppose, for example, that you had a computer program that defined a synthetic biology and a synthetic gene system. And this computer program ran through an artificial evolution that simulated 10^44 organisms (which is one estimate for the number of creatures who have ever lived on Earth), and subjected them to some kind of competition. And finally, after some predefined number of generations, it selected the highest-scoring organism, and printed it out. In an intuitive sense, you would (expect to) say that the best organisms on each round were getting more complicated, their biology more intricate, as the artificial biosystem evolved. But from the standpoint of Kolmogorov complexity, the final organism, even after a trillion years, is no more “complex” than the program code it takes to specify the tournament and the criterion of competition. The organism that wins is implicit in the specification of the tournament, so it can’t be any more “complex” than that.
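A minimal sketch of that point in Python, with a made-up fitness criterion and a fixed random seed standing in for “the specification of the tournament”; since everything downstream is determined by this short program, the winner it prints is already implicit in the code, however intricate it looks:

```python
import random

random.seed(2008)          # fixed seed: the entire run is determined by this short program

def fitness(genome):
    # made-up competition criterion: reward alternation between 0s and 1s
    return sum(a != b for a, b in zip(genome, genome[1:]))

def mutate(genome):
    i = random.randrange(len(genome))
    return genome[:i] + [1 - genome[i]] + genome[i + 1:]

# simple initial "organisms", then a predefined number of tournament rounds
population = [[0] * 32 for _ in range(50)]
for generation in range(200):
    population.sort(key=fitness, reverse=True)
    survivors = population[:25]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(25)]

winner = max(population, key=fitness)
# intricate-looking output, yet K(winner) <= K(this program) + O(1)
print("".join(str(bit) for bit in winner))
```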
But if, on the other hand, you reached into the biology and made a few million random changes here and there, the Kolmogorov complexity of the whole system would shoot way up: anyone wanting to specify it exactly, would have to describe the random changes that you made.
I specify “random” changes, because if you made the changes with beneficence aforethought—according to some criterion of goodness—then I could just talk about the compact criterion you used to make the changes. Only random information is incompressible on average, so you have to make purposeless changes if you want to increase the Kolmogorov complexity as fast as possible.
So! As you’ve probably guessed, the argument against self-improvement is that since closed systems cannot increase their “complexity”, the AI must look out upon the world, demanding a high-bandwidth sensory feed, in order to grow up.
If, that is, you believe that “increasing Kolmogorov complexity” is prerequisite to increasing real-world effectiveness.
(We will dispense with the idea that if system A builds system B, then system A must be “by definition” as smart as system B. This is the “Einstein’s mother must have been one heck of a physicist” sophistry. Even if a future planetary ecology is in some sense “implicit” in a single self-replicating RNA strand in some tidal pool, the ecology is a lot more impressive in a real-world sense: in a given period of time it can exert larger optimization pressures and do more neat stuff.)
Now, how does one argue that “increasing Kolmogorov complexity” has something to do with increasing intelligence? Especially when small machines can unfold into whole universes, and the maximum Kolmogorov complexity is realized by random noise?
One of the other things that a closed computable system provably can’t do, is solve the general halting problem—the problem of telling whether any Turing machine halts.
A similar proof shows that, if you take some given solver, and consider the maximum size bound such that the solver can solve the halting problem for all machines of that size or less, then this omniscience is bounded by at most the solver’s own complexity plus a small constant.
So… isn’t increasing your Kolmogorov complexity through outside sensory inputs, the key to learning to solve the halting problem for ever-larger systems?
And doesn’t this show that no closed system can “self-improve”?
In a word, no.
I mean, if you were to try to write it out as logic, you’d find that one of the steps involves saying, “If you can solve all systems of complexity N, you must be of complexity greater than N (maybe minus a small constant, depending on the details of the proof). Therefore, by increasing your complexity, you increase the range of systems you can solve.” This is formally a non-sequitur.
It’s also a non-sequitur in practice.
I mean, sure, if we’re not dealing with a closed system, you can’t prove that it won’t solve the halting problem. You could be looking at an external bright light in the sky that flashes on or off to reveal the halting solution.
But unless you already have that kind of mathematical ability yourself, you won’t know just from looking at the light that it’s giving you true solutions to the halting problem. You must have just been constructed with faith in the light, and the light must just happen to work.
(And in any computable universe, any light in the sky that you see won’t happen to solve the halting problem.)
It’s not easy for “sensory information” to give you justified new mathematical knowledge that you could not in principle obtain with your eyes closed.
It’s not a matter, for example, of seeing written in the sky a brilliant proof, that you would never have thought of on your own. A closed system with infinite RAM can close its eyes, and write out every possible sensory experience it could have, along with its own reactions to them, that could occur within some bounded number of steps. Doing this does not increase its Kolmogorov complexity.
So the notion can’t be that the environment tells you something that you recognize as a proof, but didn’t think of on your own. Somehow, having that sensory experience in particular, has to increase your mathematical ability even after you perfectly imagined that experience and your own reaction to it in advance.
Could it be the healing powers of having a larger universe to live in, or other people to talk to? But you can simulate incredibly large universes—vastly larger than anything we see in our telescopes, up-arrow large—within a closed system without increasing its Kolmogorov complexity. Within that simulation you could peek at people watching the stars, and peek at people interacting with each other, and plagiarize the books they wrote about the deep wisdom that comes from being embedded in a larger world.
What justified novel mathematical knowledge—about the halting problem in particular—could you gain from a sensory experience, that you could not gain from perfectly imagining that sensory experience and your own reaction to it, nor gain from peeking in on a simulated universe that included someone having that sensory experience?
Well, you can always suppose that you were born trusting the light in the sky, and the light in the sky always happens to tell the truth.
But what’s actually going on is that the non-sequitur is coming back to bite: Increasing your Kolmogorov complexity doesn’t necessarily increase your ability to solve math problems. Swallowing a bucket of random bits will increase your Kolmogorov complexity too.
You aren’t likely to learn any mathematics by gazing up at the external stars, that you couldn’t learn from “navel-gazing” into an equally large simulated universe. Looking at the actual stars around you is good for figuring out where in the universe you are (the extra information that specifies your location) but not much good for learning new math that you couldn’t learn by navel-gazing into a simulation as large as our universe.
In fact, it’s just bloody hard to fundamentally increase your ability to solve math problems in a way that “no closed system can do” just by opening the system. So far as I can tell, it basically requires that the environment be magic and that you be born with faith in this fact.
Saying that a 500-state Turing machine might be able to solve all problems up to at most 500 states plus a small constant, is misleading. That’s an upper bound, not a lower bound, and it comes from having a constructive way to build a specific unsolvable Turing machine out of the solver. In reality, you’d expect a 500-state Turing machine to get nowhere near solving the halting problem up to 500. I would drop dead of shock if there were a 500-state Turing machine that solved the halting problem for all the Turing machines up to 50 states. The vast majority of 500-state Turing machines that implement something that looks like a “halting problem solver” will handle machines nowhere near 500 states in size (but see this comment).
Suppose you write a relatively short Turing machine that, by virtue of its unlimited working memory, creates an entire universe containing googols or up-arrows of atoms...
...and within this universe, life emerges on some planet-equivalent, and evolves, and develops intelligence, and devises science to study its ecosphere and its universe, and builds computers, and goes out into space and investigates the various physical systems that have formed, and perhaps encounters other life-forms...
...and over the course of trillions or up-arrows of years, a transgalactic intrauniversal economy develops, with conduits conducting information from one end of the universe to another (because you didn’t pick a physics with inconvenient lightspeed limits), a superWeb of hyperintelligences all exchanging information...
...and finally—after a long, long time—your computer program blinks a giant message across the universe, containing a problem to be solved and a specification of how to answer back, and threatening to destroy their universe if the answer is wrong...
...then this intrauniversal civilization—and everything it’s ever learned by theory or experiment over the last up-arrow years—is said to contain 400 bits of complexity, or however long the original program was.
But where did it get its math powers, from inside the closed system?
If we trace back the origins of the hypergalactic civilization, then every belief it ever adopted about math, came from updating on some observed event. That might be a star exploding, or it might be the output of a calculator, or it might be an observed event within some mind’s brain… but in all cases, the update will occur because of a logic that says, “If I see this, it means that”. Before you can learn, you must have the prior that structures learning. If you see something that makes you decide to change the way you learn, then you must believe that seeing X implies you should learn a different way Y. That’s how it would be for superintelligences, I expect.
If you keep tracing back through that simulated universe, you arrive at something before the dawn of superintelligence—the first intelligent beings, produced by evolution. These first minds are the ones who’ll look at Peano Arithmetic and think, “This has never produced a contradiction, so it probably never will—I’ll go ahead and program that into my AI.” These first minds are the ones who’ll think, “Induction seems like it works pretty well, but how do I formalize the notion of induction?” And these first minds are the ones who’ll think, “If I build a self-improving AI, how should it update itself—including changing its own updating process—from the results of observation?”
And how did the first minds get the ability to think those thoughts? From natural selection, that generated the adaptations that executed to think all those thoughts, using the simple evolutionary rule: “keep what works”.
And in turn, natural selection in this universe popped out of the laws of physics.
So everything that this hypergalactic civilization ever believes about math, is really just induction in one form or another. All the evolved adaptations that do induction, produced by inductive natural selection; and all the generalizations made from experience, including generalizations about how to form better generalizations. It would all just unfold out of the inductive principle...
...running in a box sealed as tightly shut as our own universe appears to be.
And I don’t see how we, in our own closed universe, are going to do any better. Even if we have the ability to look up at the stars, it’s not going to give us the ability to go outside that inductive chain to obtain justified mathematical beliefs.
If you wrote that 400-bit simulated universe over the course of a couple of months using human-level intelligence and some mysterious source of unlimited computing power, then you are much more complex than that hypergalactic civilization. You take much more than 400 bits to find within the space of possibilities, because you are only one particular brain.
But y’know, I think that your mind, and the up-arrow mind of that inconceivable civilization, would still be worth distinguishing as Powers. Even if you can figure out how to ask them questions. And even if you’re asking them questions by running an internal simulation, which makes it all part of your own “complexity” as defined in the math.
To locate an up-arrow-sized mind within an up-arrow-sized civilization, would require up-arrow bits—even if the entire civilization unfolded out of a 400-bit machine as compact as our own laws of physics. But which would be more powerful, that one “complex” mind, or the “simple” civilization it was part of?
None of this violates Godelian limitations. You can transmit to the hypergalactic civilization a similar Turing machine to the one that built it, and ask it how that Turing machine behaves. If you can fold a hypergalactic civilization into a 400-bit Turing machine, then even a hypergalactic civilization can confront questions about the behavior of 400-bit Turing machines that are real stumpers.
And 400 bits is going to be an overestimate. I bet there’s at least one up-arrow-sized hypergalactic civilization folded into a halting Turing machine with 15 states, or something like that. If that seems unreasonable, you are not acquainted with the Busy-Beaver problem.
You can get a hell of a lot of mathematical ability out of small Turing machines that unfold into pangalactic hypercivilizations. But unfortunately, there are other small Turing machines that are hellishly difficult problems—perhaps unfolding into hypercivilizations themselves, or things even less comprehensible. So even the tremendous mathematical minds that can unfold out of small Turing machines, won’t be able to solve all Turing machines up to a larger size bound. Hence no Godelian contradiction.
(I wonder: If humanity unfolded into a future civilization of infinite space and infinite time, creating descendants and hyperdescendants of unlimitedly growing size, what would be the largest Busy Beaver number ever agreed upon? 15? Maybe even as large as 20? Surely not 100; you could encode a civilization of similar origins and equivalent power into a smaller Turing machine than that.)
Olie Lamb said: “I don’t see anything good about complexity. There’s nothing artful about complexity. There’s nothing mystical about complexity. It’s just complex.” This is true even when you’re talking about wheels and gears, never mind Kolmogorov complexity. It’s simplicity, not complexity, that is the admirable virtue.
The real force behind this whole debate is that the word “complexity” sounds impressive and can act as an explanation for anything you don’t understand. Then the word gets captured by a mathematical property that’s spelled using the same letters, which happens to be provably constant for closed systems.
That, I think, is where the argument really comes from, as it rationalizes the even more primitive intuition of some blind helpless thing in a box.
This argument is promulgated even by some people who can write proofs about complexity—but frankly, I do not think they have picked up the habit of visualizing what their math symbols mean in real life. That thing Richard Feynman did, where he visualized two balls turning colors or growing hairs? I don’t think they do that. On the last step of interpretation, they just make a quick appeal to the sound of the words.
But I will agree that, if the laws we know are true, then a self-improving mind which lacks sensory inputs, shouldn’t be able to improve its mathematical abilities beyond the level of an up-arrow-sized civilization—for example, it shouldn’t be able to solve Busy-Beaver(100).
It might perhaps be more limited than this in mere practice, if it’s just running on a laptop computer or something. But if theoretical mathematical arguments about closed systems show anything, that is what they show.
Can you give an example of the kind of argument that you are trying to refute here?
The argument:
“a (real-world) agent such as a human or AI cannot self improve because the K-complexity of a closed system is constant”
seems obviously silly. A human or AI is not a closed system, it receives sensory inputs, and no-one would argue that self-improvement has to happen without any sensory inputs.
Aside: I would caution, echoing Eliezer, that theoretical results about what is computable and what isn’t, and about Turing machines with infinite memory and how many 1s they can output before they halt, are hard to apply to the real world without making subtle errors that falsely pump your intuition. The real-world problems that we face rely upon finite machines operating in finite time.
“Then the “Kolmogorov complexity” of that entire universe—throughout all of space and all of time, from the Big Bang to whatever end, and all the life forms that ever evolved on Earth and all the decoherent branches of Earth and all the life-bearing planets anywhere, and all the intelligences that ever devised galactic civilizations, and all the art and all the technology and every machine ever built by those civilizations...would be 500 bits.”
I must be missing something… what exactly do you mean by the phrase “the ‘complexity’ of that entire universe… would be 500 bits.”
I would think the “complexity of the universe” would include the location of every bit of matter at all times, among other things. So, maybe I just don’t understand what you mean by the “complexity of” something.
Also, I’m not sure whether you believe that all human behavior is purely deterministic in the most extreme sense. Or, do you think people can actually make their own decisions, i.e., “free will”? Obviously, if someone can turn either left or right at any given time, then that decision is still not explained (in any predictive sense) by the laws of physics.
Wouldn’t the 500 bits have to include understanding all psychological/economic behavior?
Roko: here.
Henry: Kolmogorov complexity.
He means 500 bits plus log2 the size of the initial conditions, plus log2 the number of instants elapsed since then—which is the number of bits you would need to specify a current multiverse state under the specified assumptions.
Tim, I already specified I was talking about the whole universe throughout time and space (and inflation and decoherence). The laws probably don’t have any initial conditions under that specification, even if different things happen in different places.
Henry, follow the link to the Mandelbrot set, then look up how the Mandelbrot set is computed (a single equation).
You need to state and justify your assumptions, since it is possible to get just about any answer from 0 to infinity to the question “how much information does the universe contain”, depending on choice of theory.
You’re right about what you said about the instants. However, we don’t know how big the initial conditions were—and I’m not clear on what it would mean for them not to exist. Maybe you are talking about eternal return? Even then there’s a specification of which path you are on.
@Nick Tarleton:
Thanks, Nick. I can see I’m going to have to spend an exciting evening or 10 alone with the SL4 archives...
Perhaps one could consider Eliezer’s OB posts as an “edited and sanitized summary of the morass of SL4”?
Show us simple laws which predict how many galaxies worth of atoms there would be in a universe (why not one?), and I will take 500 bits for generating the universe somewhat seriously...
Suppose, for example, that you had a computer program that defined a synthetic biology and a synthetic gene system. And this computer program ran through an artificial evolution that simulated 10^44 organisms (which is one estimate for the number of creatures who have ever lived on Earth), and subjected them to some kind of competition. And finally, after some predefined number of generations, it selected the highest-scoring organism, and printed it out. In an intuitive sense, you would (expect to) say that the best organisms on each round were getting more complicated, their biology more intricate, as the artificial biosystem evolved. But from the standpoint of Kolmogorov complexity, the final organism, even after a trillion years, is no more “complex” than the program code it takes to specify the tournament and the criterion of competition. The organism that wins is implicit in the specification of the tournament, so it can’t be any more “complex” than that.
and later: …then this intrauniversal civilization—and everything it’s ever learned by theory or experiment over the last up-arrow years—is said to contain 400 bits of complexity, or however long the original program was.
“You say it as if it were a good thing.”
But this seems, on the contrary, to indicate that Kolmogorov complexity might not be a good candidate for the formalized notion of intuitive complexity, including the one used by Occam’s Razor. The fact that it often can be anti-intuitive doesn’t necessarily mean that it’s our intuition of what is simple and what is complicated that must change; the mere fact that the notion of Kolmogorov complexity is itself very simple and easy to play with mathematically doesn’t by itself entitle it to anything.
There exists a proof of Fermat’s Last Theorem with very small Kolmogorov complexity, much smaller than, say, the size of this post of yours in characters, or the Kolmogorov complexity of some well-known formal proof of a basic theorem in calculus. Does that mean that this proof of FLT is a much simpler object than these others? It does if you insist on interpreting “simple” to mean “small Kolmogorov complexity”, but why must you so insist? If you show this proof to a mathematician, he or she won’t say “oh, this is very simple, simpler than basic calculus proofs.” If you then tell the mathematician that in fact this very complex proof was obtained by running this very short Turing machine, they’ll just shrug in response—so what? As a piece of mathematics, the proof is still evidently extremely complicated. Maybe your notion of complexity just isn’t so good at capturing that.
The “500 bits” only works if you take a hidden variable or Bohmian position on quantum mechanics. If (as the current consensus would say) non-linear dynamics can amplify quantum noise then enormous amounts of new information are being “produced” locally everywhere all the time. The current state of the universe incorporates much or all of that information. (Someone who understands the debates about black holes and the holographic principle should chime in with more precise analysis.)
I couldn’t follow the whole argument so I’m not sure how this affects it, but given that Eliezer keeps referring to this claim I guess it is important.
Tim: The initial wavefunction may be completely determined by the laws, and under MWI that’s all you need.
Roko: I actually don’t notice much duplication between OB and SL4. It’s definitely worth reading.
Will: Why don’t you take it seriously? You know the Mandelbrot set is generated from very few bits, and I don’t imagine you expect there to be a simple derivation of its structures. Taken literally, your request is highly unreasonable because it involves so many levels of poorly understood processes—inflation, baryogenesis, galaxy formation… and the answer is probably infinite anyway. Still, I wouldn’t be surprised if a better physicist could produce a reasonable first-principles estimate of the number of baryons in the observable universe.
Jed: Or many-worlds.
Will Pearson: Show us simple laws which predict how many galaxies worth of atoms there would be in a universe (why not one?), and I will take 500 bits for generating the universe somewhat seriously...
Been done. It didn’t work out, and Eddington’s theory is now only history, but the example shows that it is not out of the question that there be such laws.
It might be worthwhile to note that cogent critiques of the proposition that a machine intelligence might very suddenly “become a singleton Power” do not deny that the inefficiencies of the human cognitive architecture offer improvement via recursive introspection and recoding, nor do they deny the improvements easily available via hardware substitution and expansion of more capable hardware and I/O.
They do, however, highlight the distinction between a vastly powerful machine madly exploring vast reaches of a much vaster “up-arrow” space of mathematical complexity, and a machine of the same power bounded in growth of intelligence—by definition necessarily relevant—due to starvation for relevant novelty in its environment of interaction.
If, Feynman-like, we imagine the present state of knowledge about our world in terms of a distribution of vertical domains, like silos, some broader with relevance to many diverse facets of real-world interaction, some thin and towering into the haze of leading-edge mathematical reality, then we can imagine the powerful machine quickly identifying and making a multitude of latent connections and meta-connections, filling in the space between the silos and even somewhat above—but to what extent, given the inevitable diminishing returns among the latent, and the resulting starvation for the novel?
Given such boundedness, speculation is redirected to growth in ecological terms, and the Red Queen’s Race continues ever faster.
Here’s the algorithm: given as input a Turing machine M with less than 400 states,
1. Compute a number k greater than BB(400).
2. Simulate M for k steps.
3. If it halts by then, then output “M halts”. If it doesn’t, then output “M doesn’t halt”.
Correctness proof: If it halts in less than k steps, then obviously it halts. If it doesn’t, then by the definition of BB(400), it never will.
Analysis of number of states: This is possible with 400+C states because you use about 400 states for step 1, and a constant number of overhead states for simulating Turing machines. Because there exist universal Turing machines with less than 25 states, you can arrange for C to be less than 100.
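(A sketch of the shape of this argument in Python, assuming a hypothetical Turing-machine object M with start_state, halt_states, blank, and transition fields; everything is routine except obtaining k, which is exactly the non-constructive part of step 1.)

```python
def halts_under_400_states(M, k):
    """Decide halting for a Turing machine M with fewer than 400 states,
    GIVEN a number k already known to exceed BB(400).  M is a hypothetical
    object with start_state, halt_states, blank, and transition fields;
    obtaining k (step 1) is the part no program can do from scratch."""
    tape, state, head = {}, M.start_state, 0
    for _ in range(k):
        if state in M.halt_states:
            return True                          # "M halts"
        symbol = tape.get(head, M.blank)
        write, move, state = M.transition[(state, symbol)]
        tape[head] = write
        head += move                             # move is +1 or -1
    # M ran k > BB(400) steps without halting, so by definition of BB it never halts.
    return state in M.halt_states                # "M doesn't halt" unless it halted on the last step
```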
There’s a small amount of subtlety in actually doing step 1. Let me know if you want more detail. However, I believe the overall result and method should be clear; moreover, I hope the result is unsurprising once you see the method. As such, please don’t drop dead of shock.
I thought you had some reason for giving us that extra detail: why the number 500? Yeah, it could be as short as 500 bits, but it could equally be very long.
I will prefer a shorter explanation for the universe over a longer one if I am presented with both, but there is no reason to expect one of a certain length. Since we don’t have any that actually manage to explain it, why attach any number to it?
I don’t think there is a Theory of Everything for the universe. Just as there isn’t one for Mathematics.
What if the universe is not continuous, the simulation does lazy evaluation, and the laws of nature can only be described by an infinite number of bits? How would this change your reasoning?
drops dead of shock
Oh… well, at least there’s no reasonable way for a superintelligence that can perform any finite computation, to obtain that particular 500-bit Turing machine, through any amount of internal thinking or induction from sensory experience, without using a reasoning method guaranteed at some point to make a mistake.
I don’t understand. Am I too dumb or is this gibberish?
@pk I don’t understand. Am I too dumb or is this gibberish?
It’s not so complicated; it’s just that we’re so formal...
The Kolmogorov complexity of the universe is more than just a description of its physical laws.
First of all, besides the physical laws, K should include the initial state of the universe. It may be large or even infinite. And it should be properly expressed in qubits.
Second, you’d need to pick the “present state”, whatever that means, out of all possible future and past states. In a classical universe it may be only 100 bits or so; in a quantum multiverse, it’s at least 1 bit for every branch.
Third, it is not at all obvious that the laws of the universe are computable. Maybe it uses an oracle (the multiverse branching process can provide such an oracle) or maybe it uses real values in a non-trivial way. If this is true, then we should either say that K is infinite, or use K relative to an oracle.
Fourth, K is defined only up to an additive constant, so it’s not really correct to talk about The Kolmogorov Complexity. You can only talk about Kolmogorov complexity relative to a fixed Turing machine. For different Turing machines the difference between the Ks will be roughly equal to the number of bits in a translator between their machine languages. In the extreme case where the reference Turing machine IS the universe, K(universe) is just 0.
In fact, it’s just bloody hard to fundamentally increase your ability to solve math problems in a way that “no closed system can do” just by opening the system. So far as I can tell, it basically requires that the environment be magic and that you be born with faith in this fact.
Eliezer, you’re making an important error here. I don’t think it affects the main argument you’re making in this article (that considerations of “complexity” doesn’t rule out self-improving minds), but this error may have grave consequences elsewhere. The error is that while the environment does have to be magic, you don’t need to have faith in this, just not have faith that it’s impossible.
Suppose you get a hold of a black box that seems to act as a halting-problem oracle. You’ve thrown thousands of problems at it, and have never seen an incorrect or inconsistent answer. What are the possibilities here? Either (A) the environment really is magic (i.e. there is uncomputable physics that enables implementation of actual halting-problem oracles), or (B) the box is just giving random answers that happen to be correct by chance, or (C) you’re part of a simulation where the box is giving all possible combinations of answers and you happen to be in the part of the simulation where the box is giving correct answers. As long as your prior probability for (A) is not zero, as you do more and more tests and keep getting correct answers, its posterior probability will eventually dominate (B) and (C).
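As a toy version of that update, here is a Python sketch with made-up priors and likelihoods; the only structural assumption is that hypothesis (A) predicts correct answers with certainty while (B) and (C) treat each verifiable test as a coin flip:

```python
# Hypotheses: A = genuine halting oracle, B = random answers, C = lucky simulation branch.
# The priors and likelihoods below are illustrative numbers, not claims about the real world.
priors = {"A": 1e-9, "B": 0.5, "C": 0.5 - 1e-9}
likelihood = {"A": 1.0, "B": 0.5, "C": 0.5}   # P(box passes one verifiable test | hypothesis)

posterior = dict(priors)
for _ in range(100):                           # 100 verified correct answers in a row
    posterior = {h: posterior[h] * likelihood[h] for h in posterior}
    total = sum(posterior.values())
    posterior = {h: p / total for h, p in posterior.items()}

print(posterior)   # A's posterior ends up overwhelmingly close to 1 despite its tiny prior
```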
Why is this so important? Well in standard Solomonoff Induction, the prior for (A) is zero, and if we program that into an AI, it won’t do the right thing in this situation. This may have a large effect on expected utility (of us, people living today), because while the likelihood of us living in an uncomputable universe with halting-problem oracles is low, the utility we gain from correctly recognizing and exploiting such a universe could be huge.
Wei, essentially the black box has to be such that it seems more likely for it to be a halting oracle than something only slightly smarter than you are—since when it says “doesn’t halt”, and you don’t know if it halts or not, it’s not like you can easily check.
Suppose, though, that the black box has the apparent qualities of a closed timelike curve in a universe that permits CTCs to be constructed with perfection (even if our own universe permitted CTCs they would break down well before up-arrow time). Then it would seem pretty plausible, after constructing the CTC and verifying its apparent performance on the cases you knew how to test, that the CTC wasn’t a trickster SI in disguise.
But -
in this case, you can just build a CTC inside yourself, as a closed system, and examine the results with your eyes shut. Your “complexity” won’t change, it’ll just have to be defined outside the Turing formalism.
So once again I say: it is really hard to improve your math abilities with eyes open in a way that you couldn’t theoretically do with eyes closed.
Good point, but when the box says “doesn’t halt”, how do I know it’s correct?
If humanity unfolded into a future civilization of infinite space and infinite time, creating descendants and hyperdescendants of unlimitedly growing size, what would be the largest Busy Beaver number ever agreed upon?
Suppose they run a BB evaluator for all of time. They would, indeed, have no way at any point of being certain that the current champion 100-bit program is the actual champion that produces BB(100). However, if they decide to anthropically reason that “for any time t, I am probably alive after time t, even though I have no direct evidence one way or the other once t becomes too large”, then they will believe (with arbitrarily high probability) that the current champion program is the actual champion program, and an arbitrarily high percentage of them will be correct in their belief.
Boris: There’s a small amount of subtlety in actually doing step 1.
Isn’t it simply impossible? That doesn’t interfere with your claim that such a Turing machine exists, but step 1 claims that it’s computable.
Nick wrote: Good point, but when the box says “doesn’t halt”, how do I know it’s correct?
A halting-problem oracle can be used for all kinds of things besides just checking whether an individual Turing machine will halt or not. For example you can use it to answer various mathematical questions and produce proofs of the answers, and then verify the proofs yourself. You should be able to obtain enough proofs to convince yourself that the black box is not just giving random answers or just being slightly smarter than you are.
If P!=NP, you should be able to convince yourself that the black box has at least exponentially more computational power than you do. So if you are an AI with say the computational resources of a solar system, you should be able to verify that the black box either contains exotic physics or has access to more resources than the rest of the universe put together.
Eliezer wrote: So once again I say: it is really hard to improve your math abilities with eyes open in a way that you couldn’t theoretically do with eyes closed.
It seems to me that an AI should/can never completely rule out the possibility that the universe contains physics that is mathematically more powerful than what it has already incorporated into itself, so it should always keep its eyes open. Even if it has absorbed the entire universe into itself, it might still be living inside a simulation, right?
+1 to Anatoly Vorobey. Using K-complexity to capture the human notion of complexity seems to be even worse than using game-theoretic rationality to capture human rationality—something that’s been attacked to death already.
Boris: There’s a small amount of subtlety in actually doing step 1.
Douglas: Isn’t it simply impossible? That doesn’t interfere with your claim that such a Turing machine exists, but step 1 claims that it’s computable.
It’s impossible to bound BB(n) computably in n, but that leaves open the following question, on which Boris’s argument depends.
Let BBB(n), the Busy Beaver Bound function, be the size of the smallest Turing machine that, starting from a blank tape, prints out an upper bound for BB(n). Boris’s step (1) claims that BBB(n) is bounded by n+c for some constant c. It is not obvious to me either that this is true or that this is false.
Richard: Boris’s step (1) claims that BBB(n) is bounded by n+c for some constant c. It is not obvious to me either that this is true or that this is false.
c is the size of a universal Turing machine which, in addition to simulating its argument, keeps track of how many steps it simulates. BB(n) is the maximum number of steps that any halting TM of size n performs, therefore there is some particular TM of size n that performs BB(n) steps. Give that TM as input to the aforementioned UTM, and it will compute BB(n). The uncomputability of BB prevents you from finding the right TM, and from proving it correct if it’s handed to you by a magic light in the sky. But the TM still exists even if you’ll never know which it is.
Eliezer, I’m not quite sure why you are obsessed with the ability to find mathematical truths. You can get different sets of mathematical truths depending on whether you take the parallel postulate as true or not. Which ones are applicable in the real world depends on where you are.
Unless you know where you are in the universe, you cannot know what is friendly for you to do (or even useful).
In fact, it’s just bloody hard to fundamentally increase your ability to solve math problems in a way that “no closed system can do” just by opening the system. So far as I can tell, it basically requires that the environment be magic and that you be born with faith in this fact.
As Wei mentioned, P≠NP is basically the conjecture that this isn’t true: i.e., that you can exponentially increase your ability to solve math problems by your environment being magic and your not being born with faith in that fact. So for example, if your environment immediately inverted any one-way function, that would be evidence (no faith required) that your environment is not merely ‘slightly’ smarter than you are, but astoundingly smarter. In qualitative terms, I think it would be almost as astounding as if the environment solved the halting problem.
I would drop dead of shock
Eliezer, just as it was interesting to ask what probability estimate ‘Nuts!’ amounted to, I think it would be very useful for the forum of Overcoming Bias to ask what your implicit probability estimate was for a 500-state TM being able to solve the halting problem for all TMs of up to 50 states.
I imagine that ‘I would drop dead of shock’ was intended to convey a probability of less than 1 in 10,000, or maybe 1 in 1,000,000?
@Toby: Why, yes, I was feeling rather grateful at that point that I hadn’t quantified the probability. It’s fair to assume that it would have been low enough that I couldn’t plausibly recover from the calibration hit, like 1% or something.
@Scott: This entire discussion assumes unlimited finite internal computing power, so P and NP cut no dice here. Otherwise, of course a larger environment can outsmart you mathematically.
@Will: Mathematical truths are about which axioms imply which theorems.
@Wei: A halting oracle is usually said to output 1s or 0s, not proofs or halting times, right?
Also @Wei: I don’t recall if I’ve mentioned this before, but Solomonoff induction in the mixture form makes no mention of the truth of its models. It just says that any computable probability distribution is in the mixture somewhere, so you can do as well as any computable form of cognitive uncertainty up to a constant.
In other words, if there’s any computable reaction that you have to discovering what looks like a black box halting solver—any computable reasoning that decides that “this looks like an uncomputable halting solver” and produces new distributions over computably related events as a result—then that’s in the Solomonoff mixture.
Solomonoff is not really as bad as it sounds.
But when it comes to making use of the results to incorporate the halting oracle via self-modification—then Solomonoff blows a fuse, of course, because it was never designed for self-modification in the first place; it’s a Cartesian formalism that puts the universe irrevocably on the outside.
Let me expand my point. Assume you run your hyper civilisation for 1 googol years, and you ask it, “Do the angles in a shape with three straight sides add up to 180?”
This can be answered yes or no, depending upon what axiom scheme you adopt: Euclidean, hyperbolic, or something more obscure. Having a system that has thought of all mathematical truths doesn’t get you a Powerful system, unless it knows which apply to solving the problems you ask.
Beautiful post, Eli. Seriously, one of the better—and more useful, at least pedagogically—ones. Should be required reading for, well, just about everyone.
Oops, I mis-stated the result. If we fix a universal Turing machine (equivalently, a programming language), then there exists a constant C so that deciding the halting problem for programs of length less than N can be done by a program of length less than N+C. Pengvado’s solution works here.
Unfortunately, as Richard observed, you can’t do this with Turing machines, essentially because Turing machines cannot inspect their own finite control, unlike the stored program on a von Neumann machine. In fact, it appears that my claim—in Richard’s notation, that BBB(N) is N+O(1)—is open. It is known that BBB(N) is O(N), and moreover, a 2002 paper “Improved Bounds for Functions Related to Busy Beavers” by Ben-Amram and Petersen showed that BBB(N) is N+o(N). I haven’t found better results, but on the off-chance that I do, I’ll post another comment here.
Now, because my claim is open, we still have to check if Eliezer’s original intuition (that 500-state Turing machines can’t solve the halting problem for 50-state ones) holds up. Luckily for me, it doesn’t. It’s known that BBB(N) is less than 3N+6 (that’s the O(N) result above), so at worst, there exists a 500-state Turing machine that solves the halting problem for 100-state ones. This is less impressive than what I originally claimed, because it’s off by a multiplicative constant rather than an additive one, but it still does the trick.
What’s the moral here? I think it’s that you really should define Kolmogorov complexity in terms of the number of bits in the shortest program doing whatever you’re analyzing, as is standard, rather than the number of states in a Turing machine. Indeed, this is how Eliezer defines it in this post; furthermore, Eliezer alludes to it when he responds to my post by mentioning a “500-bit Turing machine” [emphasis mine]. Only in the aside beginning “I would drop dead of shock” was the number of states actually mentioned.
Otherwise, of course a larger environment can outsmart you mathematically.
No, not of course. For example, suppose P were equal to PSPACE. Then even though a larger environment could fundamentally outsmart you mathematically (say by solving the halting problem), it couldn’t prove to you that it was doing so. In other words, the situation with polynomial-time computation would be more-or-less the same as it is with unlimited computation: superintelligent machines could only prove their superintelligence to other superintelligent machines.
That the situation with efficient computation appears to be different—i.e., that it appears superintelligent machines can indeed prove their superintelligence to fundamentally dumber machines—is (if true) a profound fact about the world that seems worth calling attention to. Sure, of course you can nullify it by assuming away all complexity considerations, but why? :-)
Scott, while I don’t mean to deflate in any sense the shattering import of your wisdom...
...there’s a pretty large segment of the planetary population upon whom the distinction between “polynomially more computing power” and “exponentially more computing power” would be lost.
These unenlightened fools, if they had computing power N, would be just as impressed by a superintelligence that had N^googol computing power as a superintelligence that had 2^N computing power. They’d just lump both of those together as “the environment is smarter than me, and can prove things I can’t and then exhibit the proof”. Poor bastards.
Um, except that we also don’t know whether there are computations that can be checked in N time but only performed in N^googol time. The situation is qualitatively the same as for N versus 2^N.
Bundling the initial state into the laws of physics seems like a confused way of looking at things to me. The state and the laws are different things—if you muddle them together conceptually, your accounting is likely to get equally muddled. Also, most of the rationale for the size of the “laws of physics” being small evaporates. The initial state might be enormous, for all anyone knows. The laws of physics could be infinite too—but there the hope that they are small and finite seems slightly more reasonable.
Eliezer and Nick: I don’t think “Kolmogorov complexity” and “Mandelbrot set” actually answers what I was trying to understand.
Both of those concepts seem completely apt for describing perfectly deterministic systems. But, in describing the “complexity” of the universe even in something as simple as the “pattern of stars that exists” one would still have to take into account potential non-deterministic factors such as human behavior. Unless you are a strict determinist, you have to allow for the potential for a sufficiently advanced human (or non-human) intelligence to eradicate a particular star, or even a galaxy, or to produce an artificial black hole that alters the pattern of stars.
How are these factors accounted for in a “500 bit” code? Or, are you saying that you are a strict determinist?
Henry: “Both of those concepts seem completely apt for describing perfectly deterministic systems. But, in describing the “complexity” of the universe even in something as simple as the ‘pattern of stars that exists’ one would still have to take into account potential non-deterministic factors such as human behavior. [...] [A]re you saying that you are a strict determinist?”
I’ll take this one. Yes, we’re presuming determinism here, although the determinism of the Many Worlds Interpretation is a little different from the single-world determinism we’re used to thinking about. Also, I notice that in an earlier comment, you spoke of free will and determinism as if the concepts were opposed, but depending on exactly what you mean by free will, this is not necessarily the case. For Eliezer’s take, see, e.g., “Timeless Control,” or for the philosophical mainstream, google compatibilism.
POSTSCRIPT—Re compatibilism, the short version is that no, you don’t have free will, but you shouldn’t feel bad about it.
A halting oracle is usually said to output 1s or 0s, not proofs or halting times, right?
It’s easy to use such an oracle to produce proofs and halting times. The following assumes that the oracle outputs 1 if the input TM halts, and 0 otherwise.
For proofs: Write a program p which on inputs x and i, enumerates all proofs. If it finds a proof for x, and the i-th bit of that proof is 1, then it halts, otherwise it loops forever. Now query the oracle with (p,x,0), (p,x,1), …, and you get a proof for x if it has a proof.
Halting times: Write a program p which on inputs x and i, runs x for i steps. If x halts before i steps, then it halts, otherwise it loops forever. Now query the oracle with (p,x,2), (p,x,4), (p,x,8), …, until you get an output of “1” and then use binary search to get the exact halting time.
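A sketch of that binary-search recipe in Python, written against a hypothetical oracle(p_x, i) interface (nothing here is a real API); p_x stands for the program p above with x fixed:

```python
def halting_time(oracle, p_x):
    """Recover the exact halting time of a machine x that is known to halt.
    p_x(i) runs x for i steps and halts iff x halted by then; oracle(p_x, i)
    is the hypothetical halting-oracle query.  Cost: O(log T) queries."""
    # Exponential search for some i such that x halts within i steps.
    hi = 1
    while not oracle(p_x, hi):
        hi *= 2
    lo = 0 if hi == 1 else hi // 2 + 1   # x did not halt within hi // 2 steps
    # Binary search for the least such i: that is x's exact halting time.
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle(p_x, mid):
            hi = mid
        else:
            lo = mid + 1
    return hi

# Demo with a stand-in "oracle" for a machine we pretend halts at step 1337:
fake_oracle = lambda p_x, i: i >= 1337
print(halting_time(fake_oracle, p_x=None))   # -> 1337
```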
I don’t recall if I’ve mentioned this before, but Solomonoff induction in the mixture form makes no mention of the truth of its models. It just says that any computable probability distribution is in the mixture somewhere, so you can do as well as any computable form of cognitive uncertainty up to a constant.
Eliezer, if what you say is true, then it shouldn’t be possible for anyone, using just a Turing machine, to beat Solomonoff Induction at a pure prediction game (by more than a constant), even if the input sequence is uncomputable. But here is a counterexample. Suppose the input sequence consists of the unary encodings of Busy Beaver numbers BB(1), BB(2), BB(3), …, that is, BB(1) number of 1s followed by a zero, then BB(2) number of 1s followed by a 0, and so on. Let’s ask the predictor, after seeing n input symbols, what is the probability that it will eventually see a 0 again, and call this p(n). With Solomonoff Induction, p(n) will approach arbitrarily close to 0 as you increase n. A human mathematician on the other hand will recognize that the input sequence may not be computable and won’t let p(n) fall below some non-zero bound.
I don’t quite see this. With Solomonoff induction, as with a computable human mathematician, the probability of the next symbol being 0 will approach 0. I don’t see why a Solomonoff inductor using mixtures (that is, evaluating computable probability distributions rather than computable sequences) will ever assign a probability arbitrarily close to 0 of seeing another 0, ever.
Ask the human mathematician, over and over, what their probability of the next symbol being 0 is. They’re computable, so this distribution is in the mixture. What other distribution is it necessarily dominated by, in the Solomonoff mixture?
Or are we allowing the human mathematician to have an inconsistent probability distribution where he says “You’ll see another 0 eventually, I’m sure of it, but I’m also pretty sure that no matter how large a number of 1s I pick, it won’t be high enough.” If so, to be fair, we should factor out the symbol for “see another 0 eventually” and just ask the Solomonoff inductor about that separately via some input encoding, the same way we ask the human about it separately.
Good question, Eliezer. If the human mathematician is computable, why isn’t it already incorporated into Solomonoff Induction? It seems to me that the human mathematician does not behave like a Bayesian. Let H be the hypothesis that the input sequence is the unary encodings of Busy Beaver numbers. The mathematician will try to estimate, as best as he can, P(next symbol is 0|H). But when the next symbol turns out to be 1, he doesn’t do a Bayesian update and decrease P(H), but instead says “Ok, so I was wrong. The next Busy Beaver number is bigger than I expected.”
I’m not sure I understand what you wrote after “to be fair”. If you think a Solomonoff inductor can duplicate the above behavior with an alternative setup, can you elaborate how?
@ Wei. Thanks for the response. I will look at the refs but haven’t yet done so. I’m skeptical about whether they’ll change my mind on the subject, but I’ll take a look.
It seems to me that the belief in pure determinism is axiomatic (I’m not even sure what it means to be a Bayesian in a purely deterministic system!), so most of this discussion appears to be pure conjecture. I’m not sure that it has any meaningful application.
@Wei: p(n) will approach arbitrarily close to 0 as you increase n.
This doesn’t seem right. A sequence that requires knowledge of BB(k) has O(2^-k) probability according to our Solomonoff Inductor. If the inductor compares a BB(k)-based model with a BB(k+1)-based model, then BB(k+1) will on average be about half as probable as BB(k).
In other words, P(a particular model of K-complexity k is correct) goes to 0 as k goes to infinity, but the conditional probability, P(a particular model of K-complexity k is correct | a sub-model of that particular model with K-complexity k-1 is correct), does not go to 0 as k goes to infinity.
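One way to restate this (a sketch, writing $M_k$ for the BB(k)-based model and assuming, as above, that extending $M_k$ to $M_{k+1}$ costs only $O(1)$ extra bits):

$$\Pr[M_k] \approx 2^{-K(M_k)} \to 0, \qquad \frac{\Pr[M_{k+1}]}{\Pr[M_k]} \approx 2^{-\left(K(M_{k+1})-K(M_k)\right)} = \Theta(1) \;\text{ if }\; K(M_{k+1})-K(M_k)=O(1).$$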
After further thought, I need to retract my last comment. Consider P(next symbol is 0|H) again, and suppose you’ve seen 100 0’s so far, so essentially you’re trying to predict BB(101). The human mathematician knows that any non-zero number he writes down for this probability would be way too big, unless he resorts to non-constructive notation like 1/BB(101). If you force him to answer “over and over, what their probability of the next symbol being 0 is” and don’t allow him to use notation like 1/BB(101) then he’d be forced to write down an inconsistent probability distribution. But in fact the distribution he has in mind is not computable, and that explains how he can beat Solomonoff Induction.
Rolf, I was implicitly assuming that even knowing BB(k), it still takes O(k) bits to learn BB(k+1). But if this assumption is incorrect, then I need to change the setup of my prediction game so that the input sequence consists of the unary encodings of BB(1), BB(2), BB(4), BB(8), …, instead. This shouldn’t affect my overall point, I think.
Since our whole universe may be defined by something as frugal as hundreds of bits of complexity, the fact that there’s an upper bound on speed is interesting: it makes locality a very tangible thing, and not just a question of adding some bits of information to define where in the universe that “locality” is.
“If you want to print out just Earth by itself, then it’s not enough to specify the laws of physics. You also have to point to just Earth within the universe. You have to narrow it down somehow. And, in a big universe, that’s going to take a lot of information.”
Yes. To put it another way: it takes much more than 500 bits of information to predict a lifetime’s experiences. You have to know where you are in the universe/multiverse. The counterintuitive fact is that a set can contain much less information (down to 0 bits) [*] than one of its subsets... and what we see is always a subset.
[*]”A skeptic worries about all the information necessary to specify all those unseen worlds. But an entire ensemble is often much simpler than one of its members. This principle can be stated more formally using the notion of algorithmic information content. The algorithmic information content in a number is, roughly speaking, the length of the shortest computer program that will produce that number as output. For example, consider the set of all integers. Which is simpler, the whole set or just one number? Naively, you might think that a single number is simpler, but the entire set can be generated by quite a trivial computer program, whereas a single number can be hugely long. Therefore, the whole set is actually simpler. Similarly, the set of all solutions to Einstein’s field equations is simpler than a specific solution. The former is described by a few equations, whereas the latter requires the specification of vast amounts of initial data on some hypersurface. The lesson is that complexity increases when we restrict our attention to one particular element in an ensemble, thereby losing the symmetry and simplicity that were inherent in the totality of all the elements taken together. In this sense, the higher-level multiverses are simpler. Going from our universe to the Level I multiverse eliminates the need to specify initial conditions, upgrading to Level II eliminates the need to specify physical constants, and the Level IV multiverse eliminates the need to specify anything at all.”—Tegmark
“produced by inductive natural selection;”
Natural selection is not an inductive process.
In practice, natural selection produces intelligent agents, who can do induction, and then they make selective choices that affect who lives, who dies and who reproduces. What that means is that any idea that evolution (or natural selection) in the real world does not involve induction is not correct.
Nice try Tyler. What individuals “do” does not define what natural selection does.
One could also say: “In practice, natural selection produces intelligent agents, who can predict, and then they make selective choices that affect who lives, who dies and who reproduces.”
That does NOT mean that natural selection operates via a predictive process. Ask any good biologist and they will tell you that natural selection is not predictive. That’s why species go extinct all the time.
Natural selection is a non-inductive process that can produce individual organisms that can “do induction”. The process of natural selection is trial and error, which is quite literally analogous to Popper’s theory of falsification.
BTW, it is not at all clear that the intelligent agent operates primarily using “induction” either. Induction is primarily useful in learning, but that’s only part of intelligence. Furthermore, even in the sub-function of learning it is not at all clear that induction is a primary algorithm. Clearly your brain needs to first model the world in order to classify observations in order to attribute them to anything. The induction itself is not primary. The model-building capabilities themselves are the product of a non-inductive process.
Actually, seeing as how humans are so incredibly bad at Bayesian induction, it’s a wonder anyone believes that we use induction at all. One would think that if our primary systems worked on Bayesian induction we’d be able to somehow tap into that.
Try explaining to yourself how you do induction and you will see that even you don’t do it in areas that you think you do. Do you really believe the sun comes up tomorrow because of induction? … or do you have a mental model of how the sun operates that you contingently believe in? When you learn some new aspect about the sun, do you try to devise a new mental model, or do you just change the odds it operates one way or another? My brain certainly doesn’t operate on odds.
It most certainly does—if we are talking about natural selection in evolution and nature.
Examples of failure to predict do not prove evolution does not predict—just that it does not predict perfectly.
Heh. Not even science works via Popper’s theory of falsification.
A straw man AFAICT—nobody said “primarily”—and yes, organisms’ belief that the sun will come up tomorrow shows that they are performing induction.
“It most certainly does—if we are talking about natural selection in evolution and nature.”
Unfortunately, that has things backwards. Dawkins is exactly right when he says.
He’s certainly an expert on the subject. As are Gould and the vast majority of evolutionary biologists. In fact, this is the first I’ve heard of this idea and you are linking to yourself as an authority. I chalk this up to your misunderstandings.
“organisms’ belief that the sun will come up tomorrow ”
Do organisms “believe” the sun will come up tomorrow? I’m not even sure I believe it in the inductionist sense. Nor did I even consider it as a belief as a child. It wasn’t even something I considered. I certainly operated as if the sun would come up tomorrow, but that doesn’t mean I arrived at my behavior via inductivist belief.
Let’s consider broadleaf trees dropping their leaves in the fall. Do they “believe” that winter will come? Did they arrive at that belief by “induction”? Not if you understand evolution. There is absolutely no requirement for any kind of induction in order to get an organism, a plant, that drops its leaves at the onset of winter. It’s not a prediction but more analogous to the copying of a successful accident.
Everything has a nature and will tend to behave according to that nature. That doesn’t mean that the behavior being followed is necessarily held on the basis of induction. Every morning when I get out of bed I behave as if the floor will be there and has not magically transformed into a horde of werewolves. That does not mean I decided by induction that the floor wasn’t going to turn into werewolves. In fact, the possibility need not even cross my mind or any mind.
When trees drop their leaves in autumn they are behaving postdictively, not predictively. That postdictive behavior is not about classical induction, generating universals from observations.
Natural selection operated for a very long time before there were the kinds of brains that make predictions. Natural selection operates just fine without brains, without prediction and without induction.
Even in the case of artificial selection humans, in the past, have been highly restrained by what mutations nature doles out. I’ve bred animals and you can’t just get to where you want to go. It would be great to breed a guppy that survives outdoors in the winter. My brain predicts that would be a best seller. However, it isn’t happening via standard methods of breeding.
Now perhaps the brains will some day use their predictive capability to create something called genetic engineering, and perhaps that will generate cold-hardy guppies. However, to label that “natural selection” is to misunderstand the definitions involved.
“Not even science works via Popper’s theory of falsification.”
Actually, it does. This is probably another area you don’t fully comprehend. Kuhn was falsified long ago.
“A straw man AFAICT—nobody said “primarily”—and yes, organisms’ belief that the sun will come up tomorrow shows that they are performing induction.”
When you say that something “operates by induction” the word primarily is implicit. I was only making it explicit. Exposing the bias. Don’t we want to overcome that?
Now I must stop commenting lest I transgress the limits. Your confusion requires much more verbiage than this but such is not allowed here. This policy tending to maintain the bias here in the direction of the blog owners.
You cite the same rhetoric by Dawkins that I do. Dawkins is one of those whom I am arguing against. The issue is simple enough not to require an appeal to authority to resolve. Natural selection does not involve prediction if it acts on a system which does not make predictions. It does involve prediction when it acts on a system which does make predictions. If the selecting agent is intelligent, and thinks about the future, you cannot fully model the resulting process without modelling that predictive process. Tree leaves do not make predictions—and I never claimed that they did, so that example is irrelevant. It is not plants but organisms with complex brains that make predictions.
As for my rejection of Popper meaning that I support Kuhn: philosophy of science has moved on a bit since the 1960s.
Tim,
I think you are confusing an emergent property created by a system with how the system operates on its own. That architects draft blueprints that result in public housing projects that lead to forced living relationships that make it hard to evict drug dealers doesn’t mean that the discipline of architecture runs on drug dealing. Even if that drug dealing impacts the architects.
You seem to have written some simulations of natural selection. In writing those algorithms did you have to code in the ability to predict? Of course not. Can natural selection operate without prediction? Yes! Can natural selection generate organisms that have the ability to predict without prediction being built into natural selection? Of course!
Perhaps if I use a technique of substitution for this you will quickly grasp the confusion here between the process of natural evolution and the emergent property of “the ability to make predictions”. You are making a category error.
Since the dolphin has evolved flippers, would you therefore say that “natural selection operates on flippers”? That would be very misleading. One could also go to your web page and substitute pictures of flippers wherever you’ve drawn in little brains and it would still make as much sense to those who understand evolution.
Of course, the existence of brains and flippers influences the direction evolution takes, and so do asteroid collisions. That doesn’t mean that natural selection operates on the basis of asteroid hits. Asteroids aren’t the primary cause for the emergent order of life. The primary cause is also not predictive.
If from political events it was predictable that nuclear war was inevitable and imminent, then do you expect humans to experience a sudden increase in mutation of alleles increasing fitness towards nuclear survivability? Would you expect those alleles to also become more frequent in the population? All this merely because a brain somewhere predicted correctly that a nuclear event was about to happen?
Think what you are saying here. How natural selection works is well known. It works on differential survival of random mutations, not on prediction and proactive behavior. Natural selection itself has no mechanism for making predictions or acting proactively.
Dawkins isn’t saying that evolution isn’t powerful. It is extremely powerful. What he’s saying is that that power is not due to a conscious or unconscious ability to predict, or to act on prediction.
I quote from your article:
No kidding, and that’s a fatal flaw if you are claiming that scientists are choosing their theories primarily based on probability.
How that refutes Popper he doesn’t say. Popper addressed such issues. Is this guy really so ill-informed as to think that Popper wasn’t aware that scientists collect data? Guess what: the religious collect data too, for example, lists of miracles.
I say “How so, and what a load of baloney.” Popper clarified a very important issue in the demarcation of science from non-science. Collecting data isn’t one of the things that demarcate science from non-science. The only reason the writer of this article would bring data collection up is if he were totally ignorant of Popper’s theories.
One important lesson that should be learned to “overcome bias” is to understand the theory you are criticizing before you open your mouth.
Oh, so it’s not so “1960s” is it?
Not sure how this is any kind of obstacle that would “muddy the waters” or “corrupt progress”. Is that what it’s supposed to be an example of? Popper says that the fundamental process is a series of guesses and refutations of those guesses. Your hypothesis can be anything, including “I think that X is not random but caused by something else”. In which case you are free to go out and research anything you like.
This sounds more like a complaint that they can take other people’s money via taxes to pursue whatever they feel like without some sort of justification. Boo, hoo.
No, it’s like expecting to get science, not art, when you are paying for science, not art. If the author doesn’t want oversight then perhaps he should raise his own funds privately, or use his own money, instead of trying to divert tax money into his pet project.
I can see why some “scientists” are objecting to Popper; it’s cutting into their ability to pursue non-science on the public dole. Much of the anti-Popper backlash has been in the area of the social sciences, where they’d like to pursue things in a more post-modernist way.
Not sure how the author, using his standards, would expect to reject a request by the Catholic Church for scientific research funds from the government in order to maintain lists of saints and miracles. That is, if he rejects falsification as an important breakthrough in demarcating science from non-science.
Oh, and I just now noticed this guy is from the “Department of Psychology”. How’s that for a prediction.
Yes—provided we were talking about a process that included dolphin evolution. The process of natural selection in general does not necessarily involve prediction—but from the beginning I have been discussing natural selection in evolution—which does involve prediction.
Regarding Popper, I won’t say much—since I would be preaching to the choir on this blog. I recommend you start by going and reading the bit beginning: “Previously, the most popular philosophy of science was probably Karl Popper’s falsificationism—this is the old philosophy that the Bayesian revolution is currently dethroning.” on this page.
Tim,
Bayes? To paraphrase you, the philosophy of science has moved on a bit since the 1700s.
I’ve already read Yudkowsky’s article. I’ve been familiar with Bayes since high school, which was thirty years ago. I was one of those kids who found it very easy to grasp. It’s just a simple mathematical fact and certainly nothing to build a philosophy of science around.
Yudkowsky writes:
Really? So the “experimental method”, by which I assume he means the “scientific method”, boils down to a special case of what, a probability calculation?
Come on, how gullible do you think I am? So under this view scientists have been, all along, calculating probability frequencies based on observations and just picking the theory that had the highest probability. Heck, that sounds easy. Guess I’ll just write a scientist-simulating AI program over the weekend with that. We’ll just replace all the real scientists with Bayesian decision models.
So, according to this view, if I go back and do some historical investigation of The Great Devonian Controversy, I should find that the scientists involved were calculating and comparing notes on the probabilities they were getting on all the Bayesian calculations they were doing. Right?
This, besides being ludicrous, is just not how people reason, as admitted by Yudkowsky in the very same article:
As my other comment pointed out in a quote from your own article, it’s pretty clear that scientists do not choose to believe or not to believe based on calculating Bayesian probabilities. They do not use Bayes, for good reasons: often they don’t have any underlying probabilities and have unknown distributions, and furthermore most problems don’t reduce to a Bayesian probability.
It’s hard to imagine that Darwin did an explicit Bayesian calculation in order to choose his theory over Lamarckism, let alone come up with the theory in the first place. It’s even harder to imagine that he did it implicitly in his mind when it is quite clear that a) “People do not employ Bayesian reasoning intuitively”
b) Bayes’ theorem only applies in special cases where you have known probabilities and distributions.
In the Devonian Controversy no probabilities or distributions were known, no probability calculations done, and finally the winning hypothesis was NOT picked on the basis of greatest probability. There was no greatest probability and there was no winner. All competing hypotheses were rejected.
If intelligence and understanding of the natural world were just a matter of applying Bayesian logic, don’t you think that not just human brains, but brains in general, would likely be selected for that already? We should be good at it, not bad.
The human brain has evolved lots of subunits that are good at solving lots of different kinds of problems. Heck, even crows seem to be able to count. We seem to be able to classify and model things as consistent or inconsistent, periodic, dangerous, slow, fast, level, slanted, near, far, etc. These are mental models or modules that are probably prefabricated. Yet we humans don’t seem to have any prefabricated Bayesian deduction unit built in (by the way, Bayesian induction is based on deduction, not induction; funny, that). It’s actually the other way round from what he wrote. Bayesian Induction is subsumed by a scientific method characterized by Popperian Falsification (actually pan-critical rationalism).
Don’t be confused by the name. Popperian falsification is not merely about falsification any more than the Theory of Natural Selection is merely about “survival of the fittest”. Popperian falsification is about holding beliefs tentatively and testing them. Thus one might hold as a belief that Bayesian Induction works. Although you will find that this induction is more about deducing from some assumed base probabilities. Probabilities that are founded on tentatively believing you did your measurements correctly. So on and so forth.
In his example there is plenty of room for falsification:
The above can be addressed from many Popperian angles. Are our assumptions correct? How did we come up with that number 1%? That could be falsified by pointing out that our sample was biased. Perhaps the women were lying about their ages. What measures were taken to verify their ages? Where did the 80% claim come from? Is that from a particular demographic that matches the woman we are talking about? So on and so forth.
It’s not just a matter of plugging numbers into a Bayesian equation and the answer falls out. Yeah, you can use Bayesian logic to create a decision model. That doesn’t mean you should believe or act on the result. It doesn’t mean it’s the primary method being used.
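To be clear, the “plugging numbers into a Bayesian equation” part really is trivial; the falsifiable content lives in the inputs. Here is a sketch of the calculation for this kind of example, using the 1% prior and 80% true-positive rate quoted above plus an assumed 9.6% false-positive rate (that last figure is not quoted here and is only a placeholder):

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """P(condition | positive result) via Bayes' theorem."""
    p_positive = true_positive_rate * prior + false_positive_rate * (1 - prior)
    return true_positive_rate * prior / p_positive

# 1% prior and 80% sensitivity come from the example above; the 9.6%
# false-positive rate is an assumption for illustration only.
print(posterior(prior=0.01, true_positive_rate=0.80, false_positive_rate=0.096))
# -> roughly 0.078, i.e. the posterior stays below 8% despite the positive result
```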
“Yes—provided we were talking about a process that included dolphin evolution.”
I’m flabbergasted by this response. There is nothing inherent about natural selection that is going to give you flippers. That’s dependent on the environment, the mutations, and other factors. The process itself has no “flippers”. It’s a process that works fine without flippers, yet you insist that natural selection “operates on flippers” just because of a quite contingent possible outcome of that process.
Meanwhile, even where flippers evolve they can also go extinct. Natural selection continues; it was influenced by flipper existence at one point, but now is no longer influenced by flipper existence because all such animals are extinct. Natural selection was there all along, and seems to operate just fine with or without flippers, yet you want to say that it “operates on flippers”. It seems that, using your rather strange logic, one would have to say that it “doesn’t operate on flippers” when there are none around.
Do you further claim that natural selection “operates on flippers” in the case of, for example, chimps, just because in parallel there are dolphins in the sea? How remote or close do these things have to be? If there is an alien on another planet with wingdings, does that mean that one can say about Natural Selection, while standing on Earth, “Natural Selection operates on wingdings”?
You see, to me, the term “operates on flippers” means that the flipper is an integral and mandatory requirement for the thing to work, not something unnecessary and contingent. Otherwise, the list is endless and meaningless, and the definition pointless for saying exactly what Natural Selection is or isn’t.
Boy, I can just imagine trying to teach a class of kids Natural Selection using your concept of “operates on” as a basis. This is so far out there I’m not even going to assume Yudkowsky meant this in his original quote “natural selection operates on induction”. I’m sure he’d reject this interpretation also. It would make his statement as profound as saying “natural selection operates on banana peels,” and as silly.
Re: the philosophy of science—you don’t seem to get it. Oh well, I tried.
Re: evolution and natural selection—yes there is “the class of all things which involve inheritance, variation and selection”. However, there is also the instance of evolution via natural selection running on our planet. You have consistently talked about the former. I have consistently talked about the latter. I figure that once you stop attempting to apply my comments to the former notion—which they simply do not apply to—our disagreement here will evaporate.
The point of emphasizing that the process of evolution that we witness involves foresight is because there is a widespread misunderstanding of evolution which holds that the process that produced human beings is “blind to the future”—and cannot make predictions. That idea is inaccurate—and that fact often needs pointing out. Intelligence and foresight are part of the evolutionary process on this planet—and they have been for many millions of years.
Tim,
The problem is that there is no mechanism in the process of natural selection for stuffing that foresight generated by brains back into the genome. Learn all you want, but it isn’t passed on to your kids via your genes. That’s the rub. That’s why natural selection is blind to the future. The idea that natural selection is blind is perfectly accurate.
That’s also true for the small minority of organisms on the planet that actually have brains that predict the future. So I am talking about the instance of natural selection operating on this planet. Your idea is not a valid notion even when restricted down to the minority instance of human evolution. I gave you that specific example of humans in the last comment.
We do not predict the future via natural selection, and our predictions about the future do not become translated into a form which can be transmitted via the genome for future selection. Natural selection can only select with hindsight. It can only select after a prediction works and only on the genes that produced the predictor and not the prediction.
BTW, the leaf fall that trees perform is a kind of prediction. It wasn’t generated by induction, or foresight. It was generated by simple pruning, falsification of bad hypotheses (mutations) on whether and when to drop leaves. Natural selection produced an organism that does prediction without doing any prediction on its own.
There most certainly are such mechanisms. Most obviously sexual selection and the Baldwin effect—but also via vanilla natural selection. I explain this issue clearly enough in my essay.
No, sexual selection does not determine which mutations occur. It’s merely a reinforcing feedback loop that is actually considered an example of how evolution can run off the rails, so to speak.
Sure, females might by accident happen to pick a feature in males that might prove adaptive. Unfortunately for your argument, it is not based on prediction but is happenstance. Even were the “correct” feature chosen initially, there is then the tendency of sexual selection to over-select for the feature merely because other females find the feature attractive.
So it might be that slightly longer horns might be more adaptive in fighting off predators. However, once females start mating with males based on horn length for the sake of horn length, then the horns just keep getting longer, to the point of detriment to the survival of the males. This is quite obviously a very bad example of prediction. Again, it’s all in retrospect, and if no mutations happen to occur in the direction of longer horns, then no matter how much females want longer horns it isn’t happening.
Furthermore, sexual selection operates in reverse on the females, and that’s why it also gets out of hand. Mutations that happen to drive female brains to desire longer horns even more will tend to make them more likely to bear sons that are attractive to other females. No prediction here, just a runaway process that ends up being limited by the ability of males to suffer under their oversize horns.
Notice that there is no mechanism here for the female preferences to invent new mutations for either longer horns or increased preference for longer horns. If a female happens to have an extra-heavy fetish for long horns that was environmentally driven, that cannot and will not cause any mutation that she could pass on to her offspring to make them have the same level of passion for long horns.
It’s the genes that build the female brain to prefer long horns in the first place, and not some inductive process in the brain that generated the preference. By definition there must be a preexisting gene or the trait wouldn’t be heritable, and by definition sexual selection could NOT occur.
The Baldwin effect is merely the belief that socially passed behaviors can lead to fixation of genetic traits. Perfectly possible, and again it has nothing to do with prediction. Genetic fixation could only occur given the random creation of the correct mutations, plus a static environment with regard to the trait in question. This really is no different than genetically mediated behavioral changes driving changes in other traits and behavior.
Once plants took to growing on land by minor adaptations to say drying then selection pressures on other traits change. Traits like tall stems to overshadow competitors on land are selected for. That’s not predictive. The new selective pressures are merely the result of the change in habitat. The adaptation to drying didn’t “predict” that longer stems would be needed, nor did it generate the mutations for longer stems.
Likewise, a behavioral change, say a fish like the mudskipper hunting on land, will naturally lead to new selective pressures on other traits, like the ability to withstand sun, or drying, or whatever. That doesn’t mean that the fish, by behaviorally deciding to hunt further ashore, in any way predicted the need for the other traits, nor does it mean that its brain created new mutations that were stuffed back into the genome. It’s perfectly possible that the random process of mutation just never produces the proper mutations to allow that mudskipper to fully adapt to the land.
The mutations are the random guesses as to what might work and are entirely unintentional. In fact if you’ve read Dawkins there is selective pressure against mutation. Those random mistakes however allow natural selection to explore haphazardly through different body plans and sometimes things go in the right direction, and sometimes not.
Even if females liked males with bigger brains, as evidenced by, say, singing, that doesn’t necessarily mean that males spending lots of time singing, and females listening to that singing, is predictive of anything. Big brains are just one more trait, and it might be that the selective pressures in the environment are actually changing in a way that is selecting for smaller brains just as sexual selection is operating in the opposite direction. Rich food sources needed to support big brains might be decreasing over time as the habitat becomes more arid. In which case extinction is a likely outcome.
Which is another lesson I think you need to learn from biologists. You seem to believe in some kind of inherent “progress” to all this. That’s not the case. It’s perfectly possible for organisms to be subjected to selective pressures that move them to what most people would see as regression. That’s why there are so many “backwards” parasites from what are more “advanced” lineages. Often in animals that have brains that predict.
Many a species with a predictive brain has walked the path down specialization to extinction.
To me, this is all as silly as the Chinese Room. Naive intuition says that the Chinese Room doesn’t understand Chinese in the way we do… because it doesn’t, running as it does on exponentially slower timescales. Similarly, I can close the box on my Turing machine and it can, over mega-exponential timescales and using mega-exponential amounts of tape, get just as much smarter as it could by looking around and talking to people, but there are still several reasons why the latter strategy is superior.
Of course, I may just not understand what is supposed to be being demonstrated (or refuted) here. But if you’re arguing that Godel doesn’t outlaw singularities, then this is a massive non-sequitur. What you’re saying is that Godel doesn’t outlaw singularities which are r-e-a-l-l-y s-l-o-w, which seems to me to violate any useful definition of singularities. (OK, sure, it shows the wall is breached, but don’t expect it to convince any skeptics.)
I hardly know where to begin—and I’m not sure I should bother, since we are not really very on topic, and this has gone on long enough already. Very briefly:
1. Nobody ever claimed that sexual selection determines which mutations occur. Sexual selection provides examples of selection by intelligent agents affecting the course of evolution;
2. Sexual selection is not “happenstance”. Females tend to pick traits that—among other things—reflect male health, fitness, and resistance to parasite infestations;
3. The cases where sexual selection drives species extinct does not prove that no prediction is occurring. Predictions can fail. That doesn’t mean that they were not predictions in the first place;
4. The Baldwin effect allows the fixation of characteristics which are repeatably acquired in the genome. Traits may be repeatably acquired as a result of prediction. For example an organism may become able to predict the onset of winter—and so learn that burying nuts is a good idea—and that could go on to influence nuclear evolution;
5. Regarding progress—see my essays Life’s Direction and God’s Utility Function.
Associating Kolmogorov complexity with intelligence causes brain damage: Example.
“...the Kolmogorov complexity of a pattern is the size of the program code of the shortest Turing machine that produces the pattern as an output, given unlimited tape as working memory.”
I am confused. If each pattern gets its own shortest program to describe it, why can’t complexity numbers be computed (wikipedia says they can’t be)?
I had thought that the reason they could not be was that each string has its own best program as a prefix that minimizes the number of bits needed to express the prefix plus the string, and there was no neutral way to compare between similarly complex strings since each might be shorter under its personal program than the other under that same program.
Thanks for the site!
If you could compute them, you could use that computation (in a complicated way) to produce strings that could only be produced by much longer programs. This is a contradiction, so it’s not possible to compute complexity.
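Here is a minimal sketch of that argument, assuming (for contradiction) a hypothetical computable function `K(s)` and an enumeration `binary_strings()` of all binary strings in order of length; both are placeholders, not real code:

```python
# If K were computable, this short program (plus the bits needed to write down N)
# would describe a string s with K(s) > N.  For large N that description is far
# shorter than N bits, contradicting K(s) > N.  So K cannot be computable.

def first_string_with_complexity_above(N):
    for s in binary_strings():     # enumerate "", "0", "1", "00", "01", ...
        if K(s) > N:               # assumed-computable complexity function
            return s               # s has K(s) > N, yet this program plus N describe it
```

Since the program above needs only a constant number of bits plus about log N bits to specify N, picking N large enough gives K(s) ≤ c + log N < N, contradicting the very condition K(s) > N that selected s.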
And I guess you can’t just enumerate all programs and take the length of the shortest (=first) one that produces the string because of the halting problem?
I wonder if I could turn that around and spin a computation of complexity into a halting oracle...
So, any string can be made by some programs, each program has a mathematical complexity of K, we say each string has a K of the program with the smallest K that makes that string. If knowing a string gives us a precise value for K, knowing a precise value for K lets us generate some strings that are more complex than K but no more complex than any other strings higher than K. Some strings are random, and have Ks that are strings more complex than the strings they describe. For some string the shortest program that makes it will be “The shortest string with K greater than K1”, and “The shortest string with K greater than K1” is a program with a low K. Yet that string was chosen by the program because it had a higher K than “the shortest string with complexity greater than K” does, so the string thus delineated “really” has a very small K, not a high one, so it must be unselected, and thus unselected it “really” has a high K again, falsifying that program’s selection of anything else.
The problem is that programs can be made out of self referencing words that select their components and targets based on what other programs are doing.
Can’t we still calculate K in a mathematical universe forbidding self referencing sentences? In our world we are comfortable saying K1 is greater than K2 if and only if we know it is orders of magnitude greater. What would be wrong with saying K1 is greater than K2 because K1=100 and K2=10, where K is complexity in the mathematical world with the extra axiom?
I’m not mathematician enough to say yes, but I think so. I don’t know how well/simple a no-self-reference version of some theories stacks up against the self-referencing versions (I am in the mind of lisp’s recursion features; without that ability lisp programs would be … longer and more complicated).
Is this right:
For every string there is a program that gives it its smallest k, s+p=K. For every s there is a k bigger than s but no bigger than any other k, yet “For some sn there is a Kn bigger than sn but no bigger than any other K” < pn?
So computing K for every s means there is a paradox for some s.
Can we make our lives easy by picking a s, accepting it is a paradox, and moving on to compute all other Ks? Or more intensely, by randomly picking a representative sample of strings to be paradoxes, calculate each (non-paradox) string’s K in each paradox world, and then average out the Ks?
It seems like Eliezer only affirmatively says one K is bigger than another when they are of vastly different orders of magnitude.
If this is obviously wrong, sorry but I have no math background!