A prima facie case against the likelihood of a major-impact intelligence-explosion singularity:
Firstly, the majoritarian argument. If the coming singularity is such a monumental, civilization-filtering event, why is there virtually no mention of it in the mainstream? If it is so imminent, so important, and furthermore so sensitive to initial conditions that a small group of computer programmers can bring it about, why are there not massive governmental efforts to create seed AI? If nothing else, you might think that someone could exaggerate the threat of the singularity and use it to scare people into giving them government funds. But we don’t even see that happening.
Second, a theoretical issue with self-improving AI: can a mind understand itself? If you watch a simple linear Rube Goldberg machine in action, then you can more or less understand the connection between the low- and the high-level behavior. You see all the components, and your mind contains a representation of those components and of how they interact. You see your hand, and understand how it is made of fingers. But anything more complex than an adder circuit quickly becomes impossible to understand in the same way. Sure, you might in principle be able to isolate a small component and figure out how it works, but your mind simply doesn’t have the capacity to understand the whole thing. Moreover, in order to improve the machine, you need to store a lot of information outside your own mind (in blueprints, simulations, etc.) and rely on others who understand how the other parts work.
You can probably see where this is going. The information content of a mind cannot exceed the amount of information necessary to specify a representation of that same mind. Therefore, while the AI can understand in principle that it is made up of transistors etc., its self-representation necessarily has some blank areas. I posit that the AI cannot purposefully improve itself because this would require it to understand in a deep, level-spanning way how it itself works. Of course, it could just add complexity and hope that it works, but that’s just evolution, not intelligence explosion.
So: do you know any counterarguments or articles that address either of these points?
First, it is being mentioned in the mainstream—there was a New York Times article about it recently.
Secondly, I can think of another monumental, civilisation-filtering event that took a long time to enter mainstream thought: nuclear war. I’ve been reading Bertrand Russell’s autobiography recently, and am up to the point where he begins campaigning against the possibility of nuclear destruction. In 1948 he made a speech to the House of Lords (the UK’s upper chamber), explaining that more and more nations would attempt to acquire nuclear weapons, until mutual annihilation seemed certain. His fellow Lords agreed with this, but believed the matter to be a problem for their grandchildren.
Looking back even further, for decades after the concept of a nuclear bomb was first formulated, the possibility of nuclear war was only seriously discussed amongst physicists.
I think your second point is stronger. However, I don’t think a single AI rewiring itself is the only way it can go FOOM. Assume the AI is as intelligent as a human; put it on faster hardware (or let it design its own faster hardware) and you’ve got something that’s like a human brain, but faster. Let it replicate itself, and you’ve got the equivalent of a team of humans, but which have the advantages of shared memory and instantaneous communication.
Now, if humans can design an AI, surely a team of 1,000,000 human equivalents running 1,000x faster can design an improved AI?
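Spelled out, under the comment’s implicit assumptions that the copies cooperate without coordination overhead and that each copy is as productive as one human researcher:

```latex
\underbrace{10^{6}}_{\text{copies}} \times \underbrace{10^{3}}_{\text{speedup}} = 10^{9}\ \text{human-equivalent research-years per calendar year}
```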
The information content of a mind cannot exceed the amount of information necessary to specify a representation of that same mind.
If your argument is based on information capacity alone, it can be knocked down pretty easily. An AI can understand some small part of its design and improve that, then pick another part and improve that, etc. For example, if the AI is a computer program, it has a sure-fire way of improving itself without completely understanding its own design: build faster processors. Alternatively you could imagine a population of a million identical AIs working together on the problem of improving their common design. After all, humans can build aircraft carriers that are too complex to be understood by any single human. Actually I think today’s humanity is pretty close to understanding the human mind well enough to improve it.
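A minimal sketch of the part-by-part strategy, assuming the design can be treated as a list of tunable components and scored as a whole by some black-box test (the `toy_score` function and the step sizes are hypothetical). The improver never models how the components interact; it changes one part at a time and keeps the change only if the whole design scores better.

```python
from typing import Callable, List


def improve_one_part_at_a_time(
    design: List[float],
    score: Callable[[List[float]], float],
    step: float = 0.5,
    rounds: int = 50,
) -> List[float]:
    """Coordinate-wise hill climbing: tweak a single component, evaluate the
    whole design, and keep the tweak only if the overall score improves."""
    best = list(design)
    best_score = score(best)
    for _ in range(rounds):
        improved = False
        for i in range(len(best)):          # look at one part only
            for delta in (step, -step):
                candidate = list(best)
                candidate[i] += delta
                s = score(candidate)
                if s > best_score:          # keep strict improvements
                    best, best_score = candidate, s
                    improved = True
                    break
        if not improved:
            step /= 2                       # search more finely when stuck
    return best


def toy_score(d: List[float]) -> float:
    """Hypothetical performance of a three-part design, best at (3, -1, 2)."""
    return -((d[0] - 3) ** 2 + (d[1] + 1) ** 2 + (d[2] - 2) ** 2)


print(improve_one_part_at_a_time([0.0, 0.0, 0.0], toy_score))
```

The same loop works for any scoring function, which is the point: global understanding is replaced by many cheap local experiments plus a whole-system test.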
I don’t think the number of AIs actually matters. If multiple AIs can do a job, then a single AI should be able to simulate them as though it were multiple AIs (or better yet, just figure out how to do it on its own) and then do it as well. Another thing to note is that if the AI makes a copy of its program and puts it in external storage, it doesn’t add any extra complexity to itself. It can then run its optimization process on the copy, although I do agree that it would be more practical if it only improved parts of itself at a time.
It depends on what designing a mind is like: how much minds intrinsically rely on interactions between parts, and how far those interactions reach.
In the brain, most of the interesting work, such as science, is done by culturally created components. The evidence for this is the stark variety of worldviews that exist today and have existed throughout history (among people with mostly the same genes), and the different ways those views lead the people who hold them to interact with the world.
Making a powerful AI, in this view, is not just a problem of making a system with lots of hardware or the right algorithms from birth; it is a problem of making a system with the right ideas. And ideas interact heavily in the brain. They can squash or encourage each other. If one idea goes, others that rely on it might go as well.
I suspect that we might be close to making the human mind able to store more ideas or process ideas more quickly. How much that will lead to the creation of better ideas, I don’t know. That is, will we get a feedback loop? We might just get better at storing gossip and social information.
The information content of a mind cannot exceed the amount of information necessary to specify a representation of that same mind. Therefore, while the AI can understand in principle that it is made up of transistors etc., its self-representation necessarily has some blank areas.
This is strictly true if you’re talking about the working memory that is part of a complete model of your “mind”. But a mind can access an unbounded amount of externally stored data, where a complete self-representation can be stored.
A Turing Machine of size N can run on an unbounded-size tape. A von Neumann PC with limited main memory can access an unbounded-size disk.
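To make the point concrete, a minimal Turing machine simulator, using a hypothetical binary-increment machine: the transition table is fixed at six entries, yet the tape (a dictionary that grows on demand) can hold inputs and outputs of any length.

```python
from collections import defaultdict

# Fixed, finite transition table: (state, symbol) -> (write, move, next_state).
# This hypothetical machine adds 1 to a binary number.
INCREMENT = {
    ("right", "0"): ("0", +1, "right"),
    ("right", "1"): ("1", +1, "right"),
    ("right", "_"): ("_", -1, "carry"),
    ("carry", "1"): ("0", -1, "carry"),
    ("carry", "0"): ("1", 0, "halt"),
    ("carry", "_"): ("1", 0, "halt"),
}


def run(table, tape_str, state="right", max_steps=10_000):
    """Simulate the machine; the tape grows in either direction as needed,
    while the machine's own description stays the same size."""
    tape = defaultdict(lambda: "_", enumerate(tape_str))
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        write, move, state = table[(state, tape[head])]
        tape[head] = write
        head += move
    cells = [i for i, s in tape.items() if s != "_"]
    return "".join(tape[i] for i in range(min(cells), max(cells) + 1))


print(run(INCREMENT, "1011"))  # 1100
print(run(INCREMENT, "111"))   # 1000 (the tape grew to the left)
```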
Although we can only load a part of the data into working memory at a time, we can use virtual memory to run any algorithm written in terms of the data as a whole. If we had an AI program, we could run it on today’s PCs, and while we might run out of disk space, we couldn’t run out of RAM.
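A small sketch of working memory versus external storage: a fixed-size buffer is enough to compute over data of any size. SHA-256 hashing with Python’s standard `hashlib` is just an illustrative choice of computation, and the in-memory `BytesIO` stream stands in for a file on disk.

```python
import hashlib
import io


def digest_in_chunks(stream, chunk_size=4096):
    """Hash a stream of any length while holding at most `chunk_size` bytes
    of it in memory at once: the working set is fixed, the data is not."""
    h = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        h.update(chunk)
    return h.hexdigest()


# Stand-in for a large file on disk (about 1.7 MB of repeated bytes).
data = b"self-description " * 100_000
print(digest_in_chunks(io.BytesIO(data), chunk_size=1024))
```

The result is identical to hashing all the data at once; only the peak memory use differs.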
I’d just forget the majoritarian argument altogether, it’s a distraction.
The second question does seem important to me; I too am skeptical that an AI would “obviously” have the capacity to recursively self-improve.
The counter-argument is summarized here: whereas we humans are stuck with an implementation substrate that was never designed for understandability, an AI could be endowed with both a more manageable internal representation of its own capacities and a specifically designed capacity for self-modification.
It’s possible—and I find it intuitively plausible—that there is some inherent general limit to a mind’s capacity for self-knowledge, self-understanding and self-modification. But an intuition isn’t an argument.
I see Yoreth’s version of the majoritarian argument as ahistorical. The US Government did put a lot of money into AI research and became disillusioned. Daniel Crevier wrote a book, AI: The Tumultuous History of the Search for Artificial Intelligence. It is a history book, published in 1993, 17 years ago.
There are two possible responses. One might argue that time has moved on, things are different now, and there are serious reasons to distinguish today’s belief that AI is around the corner from yesterday’s belief that AI is around the corner. Wrong then, right now, because...
Alternatively one might argue that scaling died at 90 nanometers, practical computer science is just turning out Java monkeys, the low hanging fruit has been picked, there is no road map, theoretical computer science is a tedious sub-field of pure mathematics, partial evaluation remains an esoteric backwater, theorem provers remain an esoteric backwater, the theorem proving community is building the wrong kind of theorem provers and will not rejuvenate research into partial evaluation,...
The lack of mainstream interest in explosive developments in AI is due to getting burned in the past. Noticing that the scars are not fading is very different from being unaware of AI.
There are two possible responses. One might argue that time has moved on, things are different now, and there are serious reasons to distinguish today’s belief that AI is around the corner from yesterday’s belief that AI is around the corner. Wrong then, right now, because...
I’m reminded of a historical analogy from reading Artificial Addition. Think of it this way: a society that believes addition is the result of adherence to a specific process (or a process isomorphic thereto), and understands part of that process, is closer to creating “general artificial addition” than one that tries to achieve “GAA” by cleverly avoiding the need to discover this process.
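A toy contrast in the spirit of the parable (the code is not from the Artificial Addition post): a table of memorized sums fails on anything it has not stored, while the carry procedure, once identified, handles numbers of any length.

```python
from typing import Dict, Optional, Tuple

# Memorized "addition facts" for single digits (built with Python's + only to
# save typing; imagine they were collected by observation).
MEMORIZED: Dict[Tuple[int, int], int] = {
    (a, b): a + b for a in range(10) for b in range(10)
}


def add_by_lookup(a: int, b: int) -> Optional[int]:
    """Answers only what has been stored; no generalization."""
    return MEMORIZED.get((a, b))


def add_by_process(a: str, b: str) -> str:
    """Column addition with carries over decimal digit strings: small
    digit-level facts plus a general rule for combining them."""
    result, carry = [], 0
    ra, rb = a[::-1], b[::-1]
    for i in range(max(len(ra), len(rb))):
        da = int(ra[i]) if i < len(ra) else 0
        db = int(rb[i]) if i < len(rb) else 0
        carry, digit = divmod(da + db + carry, 10)
        result.append(str(digit))
    if carry:
        result.append(str(carry))
    return "".join(reversed(result))


print(add_by_lookup(3, 4), add_by_lookup(12, 30))  # 7 None
print(add_by_process("999", "1"))                  # 1000
```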
We can judge our own distance to artificial general intelligence, then, by the extent to which we have identified constraints that intelligent processes must adhere to. And I think we’ve seen progress on this in terms of a more refined understanding of, e.g., how to apply Bayesian inference. One example is Sebastian Thrun’s work on how to seamlessly aggregate knowledge across sensors to create a coherent picture of the environment, which has produced tangible results (navigating the desert).
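For a flavor of the kind of Bayesian sensor aggregation meant here, a minimal sketch (not Thrun’s code) of the simplest case: fusing two independent Gaussian estimates of the same quantity, which is the one-dimensional form of a Kalman filter’s measurement update. The readings are hypothetical.

```python
from typing import Tuple


def fuse(mean_a: float, var_a: float, mean_b: float, var_b: float) -> Tuple[float, float]:
    """Combine two independent Gaussian estimates of one quantity. The
    posterior precision is the sum of the precisions, and the posterior
    mean is the precision-weighted average of the two means."""
    precision = 1.0 / var_a + 1.0 / var_b
    mean = (mean_a / var_a + mean_b / var_b) / precision
    return mean, 1.0 / precision


# Hypothetical readings: laser range 10.0 m (variance 4), camera 12.0 m (variance 4).
print(fuse(10.0, 4.0, 12.0, 4.0))  # (11.0, 2.0)
```

Note that the fused variance is smaller than either input variance: each sensor’s uncertainty is reduced by the other’s evidence.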
Can you point me to an overview of this understanding? I would like to apply it to the problem of detecting different types of data in a raw binary file.
I don’t know of a good one. You could try this, but it’s light on the math. I’m looking through Thrun’s papers to find a good one that gives a simple overview of the concepts, and through the CES documentation.
I was introduced to this advancement in EY’s Selling Nonapples article.
And I’m not sure how this helps for detecting file types. I mean, I understand generally how they’re related, but not how it would help with the specifics of that problem.
Thanks, I’ll have a look. I’m looking for general-purpose insights. Otherwise you could use the same sort of reasoning to argue that the technology behind Deep Blue was on the right track.
True, the demonstration of Thrun’s that I referred to was specific to navigating a terrestrial desert environment, but it was a much more general problem than chess, and had to deal with probabilistic data and uncertainty. The techniques detailed in Thrun’s papers easily generalize beyond robotics.
I’ve had a look, and I don’t see much that would make the techniques easily generalize to my problems (or to any problem with similar characteristics to mine, such as very large amounts of possibly relevant data). Oh, I am planning to use Bayesian techniques. But easy is not how I would characterize translating the problem.
Now that you mention it, one of the reasons I’m trying to get acquainted with the methods Thrun uses is to see how much they rely on advance knowledge of exactly how the sensor works (i.e. its true likelihood function). Then, I want to see if it’s possible to infer enough relevant information about the likelihood function (such as through unsupervised learning) so that I can design a program that doesn’t have to be given this information about the sensors.
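A minimal sketch of the unsupervised route, assuming the sensor’s readings come from a mixture of two Gaussian clusters (a hypothetical simplification of a real sensor model): expectation-maximization (EM) recovers each cluster’s weight, mean, and variance, i.e. an estimate of the likelihood function, without being told which reading came from which cluster.

```python
import math
from typing import List, Tuple

Component = Tuple[float, float, float]  # (weight, mean, variance)


def normal_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(xs: List[float], iters: int = 100) -> List[Component]:
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization."""
    params: List[Component] = [(0.5, min(xs), 1.0), (0.5, max(xs), 1.0)]  # crude start
    for _ in range(iters):
        # E-step: how responsible each component is for each reading.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in params]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pk / total for pk in p])
        # M-step: re-estimate each component from its weighted readings.
        new_params: List[Component] = []
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-300
            mean = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, xs)) / nk
            new_params.append((nk / len(xs), mean, max(var, 1e-6)))  # variance floor
        params = new_params
    return params


# Hypothetical unlabeled readings from a sensor with two response modes.
readings = [1.8, 2.0, 2.2, 1.9, 2.1, 7.8, 8.0, 8.2, 7.9, 8.1]
for w, m, v in em_two_gaussians(readings):
    print(f"weight={w:.2f} mean={m:.2f} var={v:.3f}")
```

With real sensors the number of components and their shapes would themselves be uncertain; two Gaussians is chosen here only to keep the sketch short.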
And that’s starting to sound more similar to what you would want to do.
That’d be interesting. More posts on the real-world use of Bayesian models would be good for LessWrong, I think.
But I’m not sure how relevant it is to my problem. I’m in the process of writing up my design deliberations, and you can judge better once you have read them.
The reason I say that our problems are related is that inferring the relevant properties of a sensor’s likelihood function looks like a standard case of finding out how the probability distribution clusters. Your problem, that of identifying a file type from its binary bitstream, is doing something similar—finding what file types have what PD clusters.
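A minimal sketch of the file-type side, assuming each type can be summarized by its byte-frequency distribution (real formats also have headers and structure that this ignores): a multinomial naive Bayes classifier with add-one smoothing. The training samples are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def train(examples: Dict[str, List[bytes]]) -> Dict[str, List[float]]:
    """Estimate a smoothed byte distribution (as log-probabilities) per type."""
    models = {}
    for label, chunks in examples.items():
        counts: Counter = Counter()
        for chunk in chunks:
            counts.update(chunk)          # iterating bytes yields ints 0..255
        total = sum(counts.values()) + 256  # Laplace (add-one) smoothing
        models[label] = [math.log((counts[b] + 1) / total) for b in range(256)]
    return models


def classify(models: Dict[str, List[float]], chunk: bytes) -> str:
    """Pick the type under which the chunk is most likely (equal priors)."""
    return max(models, key=lambda label: sum(models[label][b] for b in chunk))


models = train({
    "text": [b"the quick brown fox jumps over the lazy dog",
             b"hello world, this is plain text"],
    "binary": [bytes(range(256)) * 2],
})
print(classify(models, b"plain old words"))               # text
print(classify(models, bytes([0, 255, 17, 200, 3, 150])))  # binary
```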
I know of partial evaluation in the context of optimization, but I hadn’t previously heard of much connection between that and AI or theorem provers. What do you see as the connection?
Or, more concretely: what do you think would be the right kind of theorem provers?
I think I made a mistake in mentioning partial evaluation. It distracts from my main point. The point I’m making a mess of is that Yoreth asks two questions:
If the coming singularity is such a monumental, civilization-filtering event, why is there virtually no mention of it in the mainstream? If it is so imminent, so important, and furthermore so sensitive to initial conditions that a small group of computer programmers can bring it about, why are there not massive governmental efforts to create seed AI?
I read (mis-read?) the rhetoric here as containing assumptions that I disagree with. When I read/mis-read it I feel that I’m being slipped the idea that governments have never been interested in AI. I also pick up a whiff of “the mainstream doesn’t know, we must alert them.” But mainstream figures such as John McCarthy and Peter Norvig know and are refraining from sounding the alarm.
So partial evaluation is a distraction and I only made the mistake of mentioning it because it obsesses me. But it does! So I’ll answer anyway ;-)
Why am I obsessed? My “Is Lisp a Blub” post suggests one direction for computer programming language research. Less speculatively, three important parts of computer science are compiling (i.e., hand compiling), writing compilers, and tools such as Yacc for compiling compilers. The three Futamura projections provide a way of looking at these three topics. I suspect it is the right way to look at them.
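A hand-written illustration of the idea behind the first Futamura projection (specializing an interpreter to a fixed program yields a compiled program). A real partial evaluator derives the specialized version automatically; here the staging is done by hand, the tiny stack language is hypothetical, and only well-formed programs are handled.

```python
from typing import Callable, List, Tuple

Program = List[Tuple[str, int]]  # (op, arg); arg is used only by "push"


def interpret(program: Program, x: int) -> int:
    """Interpreter: re-dispatches on every instruction on every run."""
    stack = [x]
    for op, arg in program:
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown op {op!r}")
    return stack[-1]


def specialize(program: Program) -> Callable[[int], int]:
    """Specialize the interpreter to one fixed program: the dispatch on `op`
    happens once, here, and the output is straight-line Python source."""
    lines, depth = ["def target(x):", "    s0 = x"], 0
    for op, arg in program:
        if op == "push":
            depth += 1
            lines.append(f"    s{depth} = {arg!r}")
        elif op in ("add", "mul"):
            sym = "+" if op == "add" else "*"
            lines.append(f"    s{depth - 1} = s{depth - 1} {sym} s{depth}")
            depth -= 1
        else:
            raise ValueError(f"unknown op {op!r}")
    lines.append(f"    return s{depth}")
    namespace: dict = {}
    exec("\n".join(lines), namespace)
    return namespace["target"]


prog = [("push", 3), ("mul", 0), ("push", 4), ("add", 0)]  # computes 3x + 4
compiled = specialize(prog)
print(interpret(prog, 5), compiled(5))
```

The specialized function no longer loops over instructions, but it performs the same arithmetic: the gain is a constant factor, not a change of algorithm.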
Lambda the Ultimate had an interesting thread on the type-system feature-creep death spiral. Look for the comment by Jacques Carette at Sun, 2005-10-30 14:10 linking to Futamura’s papers. So there is the link to having a theorem prover inside a partial evaluator.
Now, partial evaluation looks like it might really help with self-improving AI. The AI might look at its source, realise that the compiler it uses to compile itself is weak because it is a Futamura-projection-based compiler with an underpowered theorem prover, prove some of the theorems itself, recompile, and start running faster.
Well, maybe, but the overviews I’ve read of the classic text by Jones, Gomard, and Sestoft make me think that the state of the art only offers linear speedups. If you write a bubble sort and use partial evaluation to compile it, it stays order n squared. The theorem prover will never transform it into an n log n algorithm.
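To make the asymptotic point concrete, a sketch that counts comparisons instead of timing: however cleverly bubble sort is compiled, it still performs n(n-1)/2 comparisons, while merge sort (used here as a stand-in n log n algorithm) needs far fewer.

```python
from typing import List, Tuple


def bubble_sort(xs: List[int]) -> Tuple[List[int], int]:
    """Plain bubble sort; returns the sorted list and the comparison count."""
    a, comparisons = list(xs), 0
    n = len(a)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a, comparisons


def merge_sort(xs: List[int]) -> Tuple[List[int], int]:
    """Top-down merge sort; returns the sorted list and the comparison count."""
    if len(xs) <= 1:
        return list(xs), 0
    mid = len(xs) // 2
    left, cl = merge_sort(xs[:mid])
    right, cr = merge_sort(xs[mid:])
    merged: List[int] = []
    comparisons, i, j = cl + cr, 0, 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


for n in (100, 1000):
    data = list(range(n, 0, -1))  # reversed input
    _, bubble = bubble_sort(data)
    _, merge = merge_sort(data)
    print(f"n={n}: bubble={bubble} merge={merge}")
```

A constant-factor speedup shrinks the cost of each comparison; it cannot change how the number of comparisons grows with n.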
I’m trying to learn ACL2. It is a theorem prover, and you can do things such as proving that quicksort and bubble sort agree. That is a nice result, and you can imagine it fitting into a bigger picture. The partial evaluator wants to transform a bubble sort into something better, and the theorem prover can anoint the transformation as correct. I see two problems.
First, the state of the art is a long way from being automatic. You have to lead the theorem prover by the hand. It is really just a proof checker. Indeed, the ACL2 book says:
You are responsible for guiding it, usually by getting it to prove the necessary lemmas. Get used to thinking that it rarely proves anything substantial by itself.
It is a long way from proving (bubble sort = quicksort) on its own.
Second, that doesn’t actually help. There is no sense of performance here. It only says that they agree, without saying which is faster. I can see a way to fix this. ACL2 can be used to prove that interpreters conform to their semantics. Perhaps it can be used to prove that an instrumented interpreter performs a calculation in fewer than n log n cycles. Thus, lifting the proofs from proofs about programs to proofs about interpreters running programs would allow ACL2 to talk about performance.
This solution to problem two strikes me as infeasible. ACL2 cannot cope with the base level without hand holding, which I have not managed to learn to give. I see no prospect of lifting the proofs to include performance without adding unmanageable complications.
Could performance issues be built into a theorem prover, so that it natively knows that quicksort is faster than bubble sort, without having to pass its proofs through a layer of interpretation? I’ve no idea. I think this is far ahead of the current state of computer science. I think it is preliminary to, and much simpler than, any kind of self-improving artificial intelligence. But that is what I had in mind as the right kind of theorem prover.
There is a research area of static analysis and performance modelling. One of my Go-playing buddies has just finished a PhD in it. I think that he hopes to use the techniques to tune up the performance of the TCP/IP stack. I think he is unaware of and uninterested in theorem provers. I see computer science breaking up into lots of little specialities, each of which takes half a lifetime to master. I cannot see the threads being pulled together until the human lifespan is 700 years instead of 70.
Ah, thanks, I see where you’re coming from now. So ACL2 is pretty much state-of-the-art from your point of view, but as you point out, it needs too much handholding to be widely useful. I agree, and I’m hoping to build something that can perform fully automatic verification of nontrivial code (though I’m not focusing on code optimization).
You are right, of course, that proving quicksort is faster than bubble sort is considerably more difficult even than proving the two equivalent.
But the good news is, there is no need! All we need to do to check which is faster is throw some sample inputs at each and run tests. To be sure, that approach is fallible, but what of it? The optimized version only needs to be probably faster than the original. A formal guarantee is only needed for equivalence.
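A sketch of the test-based approach, assuming the original and the candidate are ordinary Python functions: check that they agree on random samples, then time both on those samples. The verdict is “probably faster on this input distribution” and says nothing about inputs outside it; agreement on samples is likewise not a proof of equivalence.

```python
import random
import time
from typing import Callable, List


def bubble_sort(xs: List[int]) -> List[int]:
    a = list(xs)
    for i in range(len(a) - 1):
        for j in range(len(a) - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a


def probably_faster(candidate: Callable, original: Callable, trials: int = 20,
                    n: int = 300, seed: int = 0) -> bool:
    """True if `candidate` matches `original` on every sampled input and uses
    less total wall-clock time on those inputs."""
    rng = random.Random(seed)
    t_cand = t_orig = 0.0
    for _ in range(trials):
        xs = [rng.randint(0, 10**6) for _ in range(n)]
        start = time.perf_counter()
        out_cand = candidate(xs)
        t_cand += time.perf_counter() - start
        start = time.perf_counter()
        out_orig = original(xs)
        t_orig += time.perf_counter() - start
        if out_cand != out_orig:
            raise AssertionError("candidate disagrees with original")
    return t_cand < t_orig


print(probably_faster(sorted, bubble_sort))
```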
“But the good news is, there is no need! All we need to do to check which is faster is throw some sample inputs at each and run tests.”
“no need”? Sadly, it’s hard to use such simple methods as anything like a complete replacement for proofs. As an example which is simultaneously extreme and simple to state, naive quicksort has good expected asymptotic performance, but its (very unlikely) worst-case performance degrades to the quadratic behavior of bubble sort. Thus, if you use quicksort naively (without, e.g., randomizing the input in some way) somewhere an adversary has strong influence over the input seen by quicksort, you can create a vulnerability to a denial-of-service attack. This is easy to understand with proofs, but not easy either to detect or to quantify with random sampling. Also, the pathological input has low Kolmogorov complexity, so the universe might well happen to give it to your system accidentally, even in situations where you aren’t faced with an actual malicious intelligent “adversary.”
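A small demonstration of the point, assuming a textbook quicksort that always uses the first element as its pivot: on shuffled input the comparison count is on the order of n log n, but on already-sorted input (about the simplest input there is to describe) it is exactly n(n-1)/2, the same count as bubble sort.

```python
import random
from typing import List, Tuple


def naive_quicksort(xs: List[int]) -> Tuple[List[int], int]:
    """First-element-pivot quicksort. Returns the sorted list and the number
    of element-vs-pivot comparisons, counted once per element per partition."""
    if len(xs) <= 1:
        return list(xs), 0
    pivot, rest = xs[0], xs[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    s, cs = naive_quicksort(smaller)
    l, cl = naive_quicksort(larger)
    return s + [pivot] + l, len(rest) + cs + cl


n = 200
already_sorted = list(range(n))                     # a very simple input
shuffled = random.Random(1).sample(already_sorted, n)

print(naive_quicksort(shuffled)[1])        # on the order of n log n
print(naive_quicksort(already_sorted)[1])  # n(n-1)/2 = 19900
```

Random sampling of inputs would almost never produce the sorted case, yet it is exactly the kind of input that real systems see.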
Also, sadly, we don’t seem to have very good standard technology for performance proofs. Some years ago I made a horrendous mistake in an algorithm preprint, and later came up with a revised algorithm. I also spent more than a full-time week studying and implementing a published class of algorithms, only to conclude that I had wasted my time because the published performance claim is provably incorrect. Off and on since then I’ve put some time into looking at automated proof systems and the practicalities of proving asymptotic performance bounds. The original poster mentioned ACL2; I’ve looked mostly at HOL Light (for ordinary math proofs) and to a lesser extent Coq (for program/algorithm proofs). The state of the art for program/algorithm proofs doesn’t seem terribly encouraging. Maybe someday it will be a routine master’s thesis to, e.g., gloss Okasaki’s Purely Functional Data Structures with corresponding performance proofs, but we don’t seem to be quite there yet.
Part of the problem with these is that there are limits to how much can be proven about correctness of programs. In particular, the general question of whether two programs will give the same output on all inputs is undecidable.
Proposition: There is no Turing machine which, when given the descriptions of two Turing machines, accepts iff the two machines agree on all inputs.
Proof sketch: Suppose there were such a machine A, which accepts a pair of descriptions iff they describe two Turing machines that agree on all inputs. We shall show how to construct from A a machine H which would solve the halting problem. Note that for any given machine D and string s we can construct a machine [D, s] which, on any input, behaves as D does on s (simply add states to D so that the machine first erases everything on the tape, writes s on the tape, and then executes the normal procedure for D). Then, to determine whether a given machine T halts on a given input s, let T' be T modified to accept whenever it halts, and ask A whether [T', s] agrees with the machine that always accepts: it does iff T halts on s. Since we have now constructed a Turing machine H which solves the halting problem, our original assumption, the existence of A, must be false.
There are other theorems of a similar nature that can be proven with more work. The upshot is that in general, there are very few things that a program can say about all programs.
True. Test inputs suffice for an optimizer that on average wins more than it loses, which is good enough to be useful, but if you want guaranteed efficiency, that comes back to proof, and the current state-of-the-art is a good way short of doing that in typical cases.
Right, that’s optimization again. Basically the reason I’m asking about this is that I’m working on a theorem prover (with the intent of applying it to software verification), and if Alan Crowe considers current designs the wrong kind, I’m interested in ideas about what the right kind might be, and why. (The current state of the art does need to be extended, and I have some ideas of my own about how to do that, but I’m sure there are things I’m missing.)
Because I am not just saying it’s not obvious an AI would recursively self-improve, I’m also referring to Eliezer’s earlier claims that such recursive self-improvement (aka FOOM) is what we’d expect given our shared assumptions about intelligence. I’m sort-of quoting Eliezer as saying FOOM obviously falls out of these assumptions.
I’m worried about the “sort-of quoting” part. I get nervous when people put quote marks around things that aren’t actually quotations of specific claims.
Noted, and thanks for asking. I’m also somewhat over-fond of scare quotes to denote my using a term I’m not totally sure is appropriate. Still, I believe my clarification above is sufficient that there isn’t any ambiguity left now as to what I meant.
Stephen Hawking, Martin Rees, Max Tegmark, Nick Bostrom, Michio Kaku, David Chalmers and Robin Hanson are all smart people who broadly agree that greater-than-human AI in the next 50-100 years is reasonably likely (they’d all give p > 10% to that, with the possible exception of Rees). On the con side, who do we have? To my knowledge, no one of similarly high academic rank has come out with a negative prediction.
Edit: See Carl’s comment below. Arguing majoritarianism against a significant chance of AI this century is becoming less tenable, as a significant set of experts come down on the “yes” side.
It is notable that I can’t think of any very reputable naysayers. The ones that come to mind are Jaron Lanier and that Glenn Zorpette.
10% is a low bar: it would require a dubiously high level of confidence to rule out AI over a 90-year time frame (longer than the time since Turing and von Neumann and the like got going, with a massively expanding tech industry, improved neuroimaging and neuroscience, superabundant hardware, and perhaps biological intelligence enhancement for researchers). I would estimate the average of the group you mention as over 1/3 by 2100. Chalmers says AI is more likely than not by 2100, I think Robin and Nick are near half, and I am less certain about the others (who have said that it is important to address AI or AI risks but have not given unambiguous estimates).
Here’s Ben Goertzel’s survey. I think that Dan Dennett’s median estimate is over a century, although at the 10% level by 2100 I suspect he would agree. Dawkins has made statements that suggest similar estimates, although perhaps with somewhat shorter timelines. Likewise for Doug Hofstadter, who claimed at the Stanford Singularity Summit to have raised his estimate of the time to human-level AI from the 21st century to the mid-to-late millennium, although he weirdly claimed to have done so for non-truth-seeking reasons.
None of those people are AI theorists, so it isn’t clear that their opinions should get that much weight given that it is outside their area of expertise (incidentally, I’d be curious what citation you have for the Hawking claim). From the computer scientists I’ve talked to, the impression I get is that they see AI as such a failure that most of them just aren’t bothering to do much in the way of research in it except for narrow-purpose machine learning or expert systems. There’s also an issue of sampling bias: the people who think a technology is going to work are generally louder about it than the people who think it won’t. For example, a lot of physicists are very skeptical of tokamak fusion reactors being practical anytime in the next 50 years, but the people who talk about them a lot are the people who think they will be practical.
Note also that nothing in Yoreth’s post actually relied on or argued that there won’t be moderately smart AI, so pointing out that some experts think there will be very smart AI doesn’t contradict what he said (although certainly some people on that list, such as Chalmers and Hanson, do believe that some form of intelligence-explosion-like event will occur). Indeed, Yoreth’s second argument applies roughly to any level of intelligence. So overall, I don’t think the point about those individuals does much to address the argument.
None of those people are AI theorists so it isn’t clear that their opinions should get that much weight given that it is outside their area of expertise
I disagree with this, basically because AI is a pre-paradigm science. Having been at a big CS/AI dept, I know that the amount of accumulated wisdom about AI is virtually nonexistent compared to that for physics.
What does an average AI prof know that a physics graduate who can code doesn’t know? I’m struggling to name even one thing. If you set the two of them to code AI for some competition like controlling a robot, I doubt that there would be much advantage to the AI guy.
The only examples of genuine scientific insight in AI I have seen are in the works of Pearl, Hutter, Drew McDermott and, recently, Josh Tenenbaum.
That’s a very good point. The AI theorist presumably knows more about avenues that have not done very well (neural nets, other forms of machine learning, expert systems) but isn’t likely to have much general knowledge. However, that does mean the AI individual has a better understanding of how many different approaches to AI have failed miserably. But that’s just a comparison to your example of the physics grad student who can code. Most of the people you mentioned in your reply to Yoreth are clearly people who have knowledge bases closer to that of the AI prof than to the physics grad student. Hanson certainly has looked a lot at various failed attempts at AI. I think I’ll withdraw this argument. You are correct that these individuals on the whole are likely to have about as much relevant expertise as the AI professor.
What does an average AI prof know that a physics graduate who can code doesn’t know? I’m struggling to name even one thing. If you set the two of them to code AI for some competition like controlling a robot, I doubt that there would be much advantage to the AI guy.
So people with no experience programming robots but who know the equations governing them would just be able to, on the spot, come up with comparable code to AI profs? What do they teach in AI courses, if not the kind of thing that would make you better at this?
How to code, and rookie Bayesian stats/ML, plus some other applied stuff, like statistical Natural Language Processing (this being an application of the ML/stats stuff, but there are some domain tricks and tweaks you need).
So people with no experience programming robots but who know the equations governing them would just be able to, on the spot, come up with comparable code to AI profs?
The point is that there would only be experience, not theory, separating someone who knew Bayesian stats, coding and how to do science from an AI “specialist”. Yes, there are little shortcuts and details that a PhD in AI would know, but really there’s no massive intellectual gulf there.
Each prof will, of course, have a niche app that they do well (in fact sometimes there is too much pressure to have a “trick” you can do to justify funding), but the key question is: are they more like a software engineer masquerading as a scientist than a real scientist? Do they have a paradigm and theory that enables thousands of engineers to move into completely new design-spaces?
I think that the closest we have seen is the ML revolution, but when you look at it, it is not new science, it is just statistics correctly applied.
I have seen some instances of people trying to push forward the frontier, such as the work of Hutter, but it is very rare.
Could you clarify exactly what Hutter has done that has advanced the frontier? I used to be very nearly a “Hutter enthusiast”, but I eventually concluded that his entire work is:
“Here’s a few general algorithms that are really good, but take way too long to be of any use whatsoever.”
Am I missing something? Is there something of his I should read that will open my eyes to the ease of mechanizing intelligence?
I think that the way of looking at the problem that he introduced is the key, i.e. thinking of the agent and environment as programs. The algorithms (AIXI, etc) are just intuition pumps.
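A sketch of that framing (not Hutter’s formalism, which works with probability distributions over interaction histories): the agent and the environment are both just programs, exchanging actions and percepts in a loop, and each can see the full history. The particular environment and agents below are hypothetical toys.

```python
from typing import Callable, List, Tuple

Percept = Tuple[int, float]          # (observation, reward)
History = List[Tuple[int, Percept]]  # (action, percept) pairs so far
Agent = Callable[[History], int]
Environment = Callable[[History, int], Percept]


def run_episode(agent: Agent, environment: Environment, steps: int) -> float:
    """Alternate agent and environment for `steps` cycles; return total reward."""
    history: History = []
    total = 0.0
    for _ in range(steps):
        action = agent(history)
        percept = environment(history, action)
        history.append((action, percept))
        total += percept[1]
    return total


def alternating_env(history: History, action: int) -> Percept:
    """The rewarded action alternates 0, 1, 0, 1, ...; the observation
    reveals which action was rewarded on this step."""
    target = len(history) % 2
    return target, (1.0 if action == target else 0.0)


def constant_agent(history: History) -> int:
    return 0


def pattern_agent(history: History) -> int:
    """Guesses that the rewarded action flips every step."""
    if not history:
        return 0
    return 1 - history[-1][1][0]


print(run_episode(constant_agent, alternating_env, 10))  # 5.0
print(run_episode(pattern_agent, alternating_env, 10))   # 10.0
```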
This seems like a fairly reasonable description of the work’s impact:
“Another theme that I picked up was how central Hutter’s AIXI and my work on the universal intelligence measure has become: Marcus and I were being cited in presentations so often that by the last day many of the speakers were simply using our first names. As usual there were plenty of people who disagree with our approach, however it was clear that our work has become a major landmark in the area.”
But why does it get those numerous citations? What real-world, non-academic consequences have resulted from this massive usage of Hutter’s intelligence definition, which would distinguish it from a mere mass frenzy?
No time for a long explanation from me—but “universal intelligence” seems important partly since it shows how simple an intelligent agent can be—if you abstract away most of its complexity into a data-compression system. It is just a neat way to break down the problem.
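For reference, the universal intelligence measure mentioned above is usually written as follows, where E is the set of computable environments, K(μ) is the Kolmogorov complexity of environment μ, and V^π_μ is the expected total reward that agent π obtains in μ:

```latex
\Upsilon(\pi) := \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}
```

Simpler environments (those with shorter programs) get exponentially more weight, which is one way to see the link to compression.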
A good physics or math grad who has done Bayesian stats is at no disadvantage on the machine learning stuff, but what do you mean by “belief networks background”?
There is ton of knowledge about probabilistic processes defined by networks in various ways, numerical methods for inference in them, clustering, etc. All the fundamental stuff in this range has applications to physics, and some of it was known in physics before getting reinvented in machine learning, so in principle a really good physics grad could know that stuff, but it’s more than standard curriculum requires. On the other hand, it’s much more directly relevant to probabilistic methods in machine learning. Of course both should have good background in statistics and bayesian probability theory, but probabilistic analysis of nontrivial processes in particular adds unique intuitions that a physics grad won’t necessarily possess.
Re: “What does an average AI prof know that a physics graduate who can code doesn’t know? I’m struggling to name even one thing. If you set the two of them to code AI for some competition like controlling a robot, I doubt that there would be much advantage to the AI guy.”
A very odd opinion. We have 60 years of study of the field, and have learned quite a bit, judging by things like the state of translation and speech recognition.
The AI prof is more likely to know more things that don’t work and the difficulty of finding things that do. Which is useful knowledge when predicting the speed of AI development, no?
Dan Dennett and Douglas Hofstadter don’t think machine intelligence is coming anytime soon. Those folk actually know something about machine intelligence, too!
Another argument against the self-modeling-difficulty point: it’s possible to become more capable by having better theories rather than by having a complete model, and the former is probably more common.
It could notice inefficiencies in its own functioning, check to see if the inefficiencies are serving any purpose, and clean them up without having a complete model of itself.
Suppose a self-improving AI is too cautious to go mucking about in its own programming, and too ethical to muck about in the programming of duplicates of itself. It still isn’t trapped at its current level, even aside from the reasonable approach of improving its hardware, though that may be a more subtle problem than generally assumed.
What if it just works on having a better understanding of math, logic, and probability?
Of course, it could just add complexity and hope that it works, but that’s just evolution, not intelligence explosion.
The critical aspect of a “major-impact intelligence-explosion singularity” isn’t the method for improvement but the rate of improvement. If computer processing power continues to grow at an exponential rate, even an inefficiently improving AI will have the growth in raw computing power behind it.
So: do you know any counterarguments or articles that address either of these points?
I don’t have any articles but I’ll take a stab at counterarguments.
A Majoritarian counterargument: AI turned out to be harder and further away than originally thought. The general view is still tempered by the failure of AI to live up to those expectations. In short, the AI researchers cried “wolf!” too much 30 years ago and now their predictions aren’t given much weight because of that bad track record.
A mind can’t understand itself counterargument: Even accepting as a premise that a mind can’t completely understand itself, that’s not an argument that it can’t understand itself better than it currently does. The question then becomes which parts of the AI mind are important for reasoning/intelligence and can an AI understand and improve that capability at a faster rate than humans.
Re: “If nothing else, you might think that someone could exaggerate the threat of the singularity and use it to scare people into giving them government funds. But we don’t even see that happening.”
? I see plenty of scaremongering around machine intelligence. So far, few governments have supported it—which seems fairly sensible of them.
imminent, so important, and furthermore so sensitive to initial conditions
To a risk-neutral consequentialist or to an elected or corporate official?
Crash programs in basic science because of speculative applications are very uncommon. Decades of experimentation with nuclear fission only brought a crash program with the looming threat of the Nazis, and after a practical demonstration of a chain reaction.
Over the short time spans on which governments make their plans, the probability of big advances in AI basic science is relatively small, even if it is substantial over the longer term. So you get all the usual issues with attending to dangers that are improbable in any given short period and that no one has recent experience with. Note things like Hurricane Katrina, the Gulf oil spill, etc. The global warming effects of fossil fuel use have been seen as theoretically inevitable since at least the Eisenhower administration, and momentum for action has only been mobilized after a long period of actual warming provided pretty irrefutable (and yet widely rejected anyway!) evidence.
Two counters to the majoritarian argument:
First, it is being mentioned in the mainstream—there was a New York Times article about it recently.
Secondly, I can think of another monumental, civilisation-filtering event that took a long time to enter mainstream thought: nuclear war. I’ve been reading Bertrand Russell’s autobiography recently, and am up to the point where he begins campaigning against the possibility of nuclear destruction. In 1948 he made a speech to the House of Lords (the UK’s upper chamber), explaining that more and more nations would attempt to acquire nuclear weapons, until mutual annihilation seemed certain. His fellow Lords agreed with this, but believed the matter to be a problem for their grandchildren.
Looking back even further, for decades after the concept of a nuclear bomb was first formulated, the possibility of nuclear war was only seriously discussed amongst physicists.
I think your second point is stronger. However, I don’t think a single AI rewiring itself is the only way it can go FOOM. Assume the AI is as intelligent as a human; put it on faster hardware (or let it design its own faster hardware) and you’ve got something that’s like a human brain, but faster. Let it replicate itself, and you’ve got the equivalent of a team of humans, but one with the advantages of shared memory and instantaneous communication.
Now, if humans can design an AI, surely a team of 1,000,000 human equivalents running 1000x faster can design an improved AI?
If your argument is based on information capacity alone, it can be knocked down pretty easily. An AI can understand some small part of its design and improve that, then pick another part and improve that, etc. For example, if the AI is a computer program, it has a sure-fire way of improving itself without completely understanding its own design: build faster processors. Alternatively you could imagine a population of a million identical AIs working together on the problem of improving their common design. After all, humans can build aircraft carriers that are too complex to be understood by any single human. Actually I think today’s humanity is pretty close to understanding the human mind well enough to improve it.
I don’t think the number of AIs actually matters. If multiple AIs can do a job, then a single AI should be able to simulate them as though it were multiple AIs (or better yet, just figure out how to do it on its own) and then do it as well. Another thing to note is that if the AI makes a copy of its program and puts it in external storage, it doesn’t add any extra complexity to itself. It can then run its optimization process on the copy, although I do agree that it would be more practical if it only improved parts of itself at a time.
You’re right, I used the million AIs as an intuition pump, imitating Eliezer’s That Alien Message.
It depends on what designing a mind is like: how much minds intrinsically rely on interactions between parts, and how far those interactions reach.
In the brain most of the interesting stuff such as science and the like is done by culturally created components. The evidence for this is the stark variety of the worldviews that exist in the world and have existed in history (with most of the same genes) and the ways those views made those that hold them interact with the world.
Making a powerful AI, in this view, is not just a problem of making a system with lots of hardware or the right algorithms from birth; it is a problem of making a system with the right ideas. And ideas interact heavily in the brain. They can squash or encourage each other. If one idea goes, others that rely on it might go as well.
I suspect that we might be close to making the human mind able to store more ideas or make the ideas process more quickly. How much that will lead to the creation of better ideas I don’t know. That is, will we get a feedback loop? We might just get better at storing gossip and social information.
This is strictly true if you’re talking about the working memory that is part of a complete model of your “mind”. But a mind can access an unbounded amount of externally stored data, where a complete self-representation can be stored.
A Turing Machine of size N can run on an unbounded-size tape. A von Neumann PC with limited main memory can access an unbounded-size disk.
Although we can only load a part of the data into working memory at a time, we can use virtual memory to run any algorithm written in terms of the data as a whole. If we had an AI program, we could run it on today’s PCs and while we could run out of disk space, we couldn’t run out of RAM.
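A small illustration of the same point, using streaming rather than paging: the function below computes a property of the whole input (its SHA-256 digest) while only ever holding one fixed-size chunk in memory. The chunk size is an arbitrary assumption; any external source that can be read in pieces would do.

```python
import hashlib
import io
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # bytes held in memory at once (an arbitrary choice)


def sha256_streaming(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> str:
    """Digest arbitrarily large external data using bounded working memory."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # end of stream
            break
        digest.update(chunk)  # incremental update: only this chunk is in RAM
    return digest.hexdigest()


if __name__ == "__main__":
    data = b"x" * 1_000_000
    # Same answer as hashing everything at once, with only 1 KB of "RAM" per step.
    print(sha256_streaming(io.BytesIO(data), chunk_size=1024) == hashlib.sha256(data).hexdigest())  # True
```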
I’d just forget the majoritarian argument altogether, it’s a distraction.
The second question does seem important to me; I too am skeptical that an AI would “obviously” have the capacity to recursively self-improve.
The counter-argument is summarized here: whereas we humans are stuck with an implementation substrate which was never designed for understandability, an AI could be endowed with both a more manageable internal representation of its own capacities and a specifically designed capacity for self-modification.
It’s possible—and I find it intuitively plausible—that there is some inherent general limit to a mind’s capacity for self-knowledge, self-understanding and self-modification. But an intuition isn’t an argument.
I see Yoreth’s version of the majoritarian argument as ahistorical. The US Government did put a lot of money into AI research and became disillusioned. Daniel Crevier wrote a book, AI: The Tumultuous History of the Search for Artificial Intelligence. It is a history book. It was published in 1993, 17 years ago.
There are two possible responses. One might argue that time has moved on, things are different now, and there are serious reasons to distinguish today’s belief that AI is around the corner from yesterday’s belief that AI is around the corner. Wrong then, right now, because...
Alternatively one might argue that scaling died at 90 nanometers, practical computer science is just turning out Java monkeys, the low hanging fruit has been picked, there is no road map, theoretical computer science is a tedious sub-field of pure mathematics, partial evaluation remains an esoteric backwater, theorem provers remain an esoteric backwater, the theorem proving community is building the wrong kind of theorem provers and will not rejuvenate research into partial evaluation,...
The lack of mainstream interest in explosive developments in AI is due to getting burned in the past. Noticing that the scars are not fading is very different from being unaware of AI.
I’m reminded of a historical analogy from reading Artificial Addition. Think of it this way: a society that believes addition is the result of adherence to a specific process (or a process isomorphic thereto), and understands part of that process, is closer to creating “general artificial addition” than one that tries to achieve “GAA” by cleverly avoiding the need to discover this process.
We can judge our own distance to artificial general intelligence, then, by the extent to which we have identified constraints that intelligent processes must adhere to. And I think we’ve seen progress on this in terms of more refined understanding of e.g. how to apply Bayesian inference. For example, the work by Sebastian Thrun on how to seamlessly aggregate knowledge across sensors to create a coherent picture of the environment, which has produced tangible results (navigating the desert).
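The core of that kind of multi-sensor aggregation can be sketched with the log-odds occupancy update that is standard in the probabilistic robotics literature. This is a generic textbook technique, not the specific code behind the desert result: each reading contributes additively in log-odds space, assuming readings are conditionally independent given the true state of the map cell.

```python
import math
from typing import Sequence


def logit(p: float) -> float:
    """Probability to log-odds."""
    return math.log(p / (1.0 - p))


def from_logit(l: float) -> float:
    """Log-odds back to probability."""
    return 1.0 / (1.0 + math.exp(-l))


def fuse(prior: float, readings: Sequence[float]) -> float:
    """Posterior P(cell occupied) after several sensor readings.

    Each entry of `readings` is one sensor's inverse model P(occupied | reading);
    readings are assumed conditionally independent given the cell's true state.
    """
    l0 = logit(prior)
    l = l0
    for p in readings:
        l += logit(p) - l0  # each reading adds its evidence, net of the prior
    return from_logit(l)


if __name__ == "__main__":
    # Two sensors that each say 70% occupied agree, so confidence rises (about 0.845).
    print(fuse(0.5, [0.7, 0.7]))
```

Contradictory readings cancel: `fuse(0.5, [0.7, 0.3])` returns 0.5 (up to floating-point rounding).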
Can you point me to an overview of this understanding? I would like to apply it to the problem of detecting different types of data in a raw binary file.
I don’t know of a good one. You could try this, but it’s light on the math. I’m looking through Thrun’s papers to find a good one that gives a simple overview of the concepts, and through the CES documentation.
I was introduced to this advancement in EY’s Selling nonapples article.
And I’m not sure how this helps for detecting file types. I mean, I understand generally how they’re related, but not how it would help with the specifics of that problem.
Thanks, I’ll have a look. I’m looking for general-purpose insights. Otherwise you could use the same sort of reasoning to argue that the technology behind Deep Blue was on the right track.
True, the specific demonstration of Thrun’s that I referred to was specific to navigating a terrestrial desert environment, but it was a much more general problem than chess, and had to deal with probabilistic data and uncertainty. The techniques detailed in Thrun’s papers easily generalize beyond robotics.
I’ve had a look, and I don’t see much that will make the techniques easily generalize to my problems (or to any problem with similar characteristics to mine, such as very large amounts of possibly relevant data). Oh, I am planning to use Bayesian techniques. But easy is not how I would characterize the translation of the problem.
Now that you mention it, one of the reasons I’m trying to get acquainted with the methods Thrun uses is to see how much they rely on advance knowledge of exactly how the sensor works (i.e. its true likelihood function). Then, I want to see if it’s possible to infer enough relevant information about the likelihood function (such as through unsupervised learning) so that I can design a program that doesn’t have to be given this information about the sensors.
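One hedged way to infer such a likelihood function without labels, assuming the sensor's readings come from a small number of regimes each with Gaussian noise, is to fit a mixture model by expectation-maximization. The two-component, one-dimensional Gaussian model and the `synthetic_readings` helper are illustrative assumptions, not a description of Thrun's methods.

```python
import math
import random
from typing import List, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(xs: List[float], iters: int = 100) -> Tuple[List[float], List[float], List[float]]:
    """Fit a two-component 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    n = len(xs)
    overall_mean = sum(xs) / n
    overall_var = sum((x - overall_mean) ** 2 for x in xs) / n
    # Crude initialization: components start at the data extremes.
    weights = [0.5, 0.5]
    means = [min(xs), max(xs)]
    variances = [overall_var, overall_var]
    for _ in range(iters):
        # E-step: posterior probability that each point came from each component.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate parameters from the responsibility-weighted data.
        for k in range(2):
            rk = [r[k] for r in resp]
            nk = sum(rk)
            weights[k] = nk / n
            means[k] = sum(r * x for r, x in zip(rk, xs)) / nk
            var_k = sum(r * (x - means[k]) ** 2 for r, x in zip(rk, xs)) / nk
            variances[k] = max(var_k, 1e-6)  # guard against collapse onto one point
    return weights, means, variances


def synthetic_readings(seed: int = 0) -> List[float]:
    """Hypothetical unlabeled sensor data: two noise regimes centred at 0 and 5."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(500)] + [rng.gauss(5.0, 1.0) for _ in range(500)]


if __name__ == "__main__":
    print(em_two_gaussians(synthetic_readings()))
```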
And that’s starting to sound more similar to what you would want to do.
That’d be interesting. More posts on the real-world use of Bayesian models would be good for Less Wrong, I think.
But I’m not sure how relevant it is to my problem. I’m in the process of writing up my design deliberations, and you can judge better once you have read them.
Looking forward to it!
The reason I say that our problems are related is that inferring the relevant properties of a sensor’s likelihood function looks like a standard case of finding out how the probability distribution clusters. Your problem, that of identifying a file type from its binary bitstream, is doing something similar—finding what file types have what PD clusters.
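A sketch of the file-type version, under the strong simplifying assumption that each type is characterized by its byte-frequency distribution alone (bytes treated as independent, order and headers ignored, uniform prior over types): a Laplace-smoothed multinomial naive Bayes classifier. The function names and toy training data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def train(examples: Dict[str, Iterable[bytes]], alpha: float = 1.0) -> Dict[str, List[float]]:
    """Estimate a Laplace-smoothed log-probability for each byte value, per file type."""
    model: Dict[str, List[float]] = {}
    for label, blobs in examples.items():
        counts: Counter = Counter()
        for blob in blobs:
            counts.update(blob)  # iterating bytes yields ints 0..255
        total = sum(counts.values()) + alpha * 256
        model[label] = [math.log((counts[b] + alpha) / total) for b in range(256)]
    return model


def classify(model: Dict[str, List[float]], blob: bytes) -> str:
    """Return the type under which `blob` has the highest likelihood (uniform prior)."""
    counts = Counter(blob)

    def log_likelihood(log_probs: List[float]) -> float:
        return sum(n * log_probs[b] for b, n in counts.items())

    return max(model, key=lambda label: log_likelihood(model[label]))


if __name__ == "__main__":
    model = train({
        "text": [b"hello world, this is plain ascii text" * 10],
        "binary": [bytes(range(256)) * 4],
    })
    print(classify(model, b"another line of ordinary text"))  # text
```

Real formats would need richer features (magic numbers, n-grams, position), which is where the clustering question above comes in.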
I know of partial evaluation in the context of optimization, but I hadn’t previously heard of much connection between that and AI or theorem provers. What do you see as the connection?
Or, more concretely: what do you think would be the right kind of theorem provers?
I think I made a mistake in mentioning partial evaluation. It distracts from my main point. The point I’m making a mess of is that Yoreth asks two questions:
I read (mis-read?) the rhetoric here as containing assumptions that I disagree with. When I read/mis-read it I feel that I’m being slipped the idea that governments have never been interested in AI. I also pick up a whiff of “the mainstream doesn’t know, we must alert them.” But mainstream figures such as John McCarthy and Peter Norvig know and are refraining from sounding the alarm.
So partial evaluation is a distraction and I only made the mistake of mentioning it because it obsesses me. But it does! So I’ll answer anyway ;-)
Why am I obsessed? My Is Lisp a Blub post suggests one direction for computer programming language research. Less speculatively, three important parts of computer science are compiling (i.e., hand compiling), writing compilers, and tools such as Yacc for compiling compilers. The three Futamura projections provide a way of looking at these three topics. I suspect it is the right way to look at them.
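For readers who haven't met them, the three Futamura projections can be stated compactly. Write [[p]](x, y) for running program p on inputs x and y, mix for a partial evaluator (specializer), int for an interpreter and src for a source program. The three lines correspond to compiling, generating a compiler, and generating a compiler generator, matching the three topics above.

```latex
\begin{aligned}
\mathit{target}   &= [\![\mathit{mix}]\!](\mathit{int}, \mathit{src})
  && \text{where } [\![\mathit{target}]\!](d) = [\![\mathit{int}]\!](\mathit{src}, d)\\
\mathit{compiler} &= [\![\mathit{mix}]\!](\mathit{mix}, \mathit{int})
  && \text{where } [\![\mathit{compiler}]\!](\mathit{src}) = \mathit{target}\\
\mathit{cogen}    &= [\![\mathit{mix}]\!](\mathit{mix}, \mathit{mix})
  && \text{where } [\![\mathit{cogen}]\!](\mathit{int}) = \mathit{compiler}
\end{aligned}
```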
Lambda the Ultimate had an interesting thread on the type-system feature-creep death-spiral. Look for the comment by Jacques Carette at Sun, 2005-10-30 14:10 linking to Futamura’s papers. So there is the link to having a theorem prover inside a partial evaluator.
Now partial evaluating looks like it might really help with self-improving AI. The AI might look at its source, realise that the compiler that it is using to compile itself is weak because it is a Futamura projection based compiler with an underpowered theorem prover, prove some of the theorems itself, re-compile, and start running faster.
Well, maybe, but the overviews I’ve read of the classic text by Jones, Gomard, and Sestoft make me think that the state of the art only offers linear (constant-factor) speed-ups. If you write a bubble sort and use partial evaluation to compile it, it stays order n squared. The theorem prover will never transform it into an n log n algorithm.
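A hand-worked illustration of why partial evaluation gives only constant-factor gains. Specializing the hypothetical `power` function below on a known exponent n removes the loop and its bookkeeping, but the residual program still performs a number of multiplications linear in n; it never discovers repeated squaring, just as a specialized bubble sort stays quadratic. The `exec`-based code generation is a toy stand-in for a real partial evaluator.

```python
from typing import Callable, Dict


def power(x: int, n: int) -> int:
    """General program: the loop over the exponent runs every time it is called."""
    result = 1
    for _ in range(n):
        result *= x
    return result


def specialize_power(n: int) -> Callable[[int], int]:
    """Hand-done partial evaluation of `power` for a statically known exponent n.

    The loop is unrolled now, at specialization time. The residual program
    has no loop overhead but still performs n - 1 multiplications: the
    algorithm is unchanged, so the gain is only a constant factor.
    """
    body = " * ".join(["x"] * n) if n > 0 else "1"
    source = f"def power_{n}(x):\n    return {body}\n"
    namespace: Dict[str, object] = {}
    exec(source, namespace)  # compile the residual program
    return namespace[f"power_{n}"]  # type: ignore[return-value]


if __name__ == "__main__":
    cube = specialize_power(3)  # residual code: return x * x * x
    print(cube(4), power(4, 3))  # 64 64
```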
I’m trying to learn ACL2. It is a theorem prover and you can do things such as proving that quicksort and bubble sort agree. That is a nice result and you can imagine it fitting into a bigger picture. The partial evaluator wants to transform a bubble sort into something better, and the theorem prover can anoint the transformation as correct. I see two problems.
First, the state of the art is a long way from being automatic. You have to lead the theorem prover by the hand. It is really just a proof checker. Indeed the ACL2 book says
it is a long way from proving (bubble sort = quick sort) on its own.
Second, that doesn’t actually help. There is no sense of performance here. It only says that they agree, without saying which is faster. I can see a way to fix this. ACL2 can be used to prove that interpreters conform to their semantics. Perhaps it can be used to prove that an instrumented interpreter performs a calculation in fewer than n log n cycles. Lifting the proofs from proofs about programs to proofs about interpreters running programs would thus allow ACL2 to talk about performance.
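Measuring (as opposed to proving) such step counts is easy with instrumentation. In the sketch below, a hypothetical `Counted` wrapper counts every `<` comparison: bubble sort performs exactly n(n-1)/2 comparisons on any input of length n, while Python's built-in `sorted` (an O(n log n) sort) uses far fewer. This only establishes counts for the inputs actually tried, which is exactly the gap between measurement and the proofs discussed here.

```python
import random
from typing import Callable, List


class Counted:
    """Wraps an int and counts every `<` comparison made on wrapped values."""

    comparisons = 0

    def __init__(self, value: int) -> None:
        self.value = value

    def __lt__(self, other: "Counted") -> bool:
        Counted.comparisons += 1
        return self.value < other.value


def bubble_sort(xs: List[Counted]) -> List[Counted]:
    """Plain bubble sort (no early exit): always n(n-1)/2 comparisons."""
    xs = list(xs)
    n = len(xs)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if xs[j + 1] < xs[j]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs


def count_comparisons(sort_fn: Callable[[List[Counted]], List[Counted]], values: List[int]) -> int:
    """Run `sort_fn` on wrapped copies of `values` and return the comparison count."""
    Counted.comparisons = 0
    sort_fn([Counted(v) for v in values])
    return Counted.comparisons


if __name__ == "__main__":
    data = random.Random(1).sample(range(256), 256)
    print(count_comparisons(bubble_sort, data))  # 32640 = 256 * 255 / 2
    print(count_comparisons(sorted, data))
```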
This solution to problem two strikes me as infeasible. ACL2 cannot cope with the base level without hand holding, which I have not managed to learn to give. I see no prospect of lifting the proofs to include performance without adding unmanageable complications.
Could performance issues be built into a theorem prover, so that it natively knows that quicksort is faster than bubble sort, without having to pass its proofs through a layer of interpretation? I’ve no idea. I think this is far ahead of the current state of computer science. I think it is preliminary to, and much simpler than, any kind of self-improving artificial intelligence. But that is what I had in mind as the right kind of theorem prover.
There is a research area of static analysis and performance modelling. One of my Go playing buddies has just finished a PhD in it. I think that he hopes to use the techniques to tune up the performance of the TCP/IP stack. I think he is unaware of and uninterested in theorem provers. I see computer science breaking up into lots of little specialities, each of which takes half a life time to master. I cannot see the threads being pulled together until the human lifespan is 700 years instead of 70.
Ah, thanks, I see where you’re coming from now. So ACL2 is pretty much state-of-the-art from your point of view, but as you point out, it needs too much handholding to be widely useful. I agree, and I’m hoping to build something that can perform fully automatic verification of nontrivial code (though I’m not focusing on code optimization).
You are right, of course, that proving quicksort is faster than bubble sort is considerably more difficult even than proving that they are equivalent.
But the good news is, there is no need! All we need to do to check which is faster, is throw some sample inputs at each and run tests. To be sure, that approach is fallible, but what of it? The optimized version only needs to be probably faster than the original. A formal guarantee is only needed for equivalence.
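A sketch of that test-based approach, with hypothetical names: run the original and the optimized candidate on the same random inputs, check that their outputs agree, and total the wall-clock time of each. Agreement on samples is evidence of equivalence rather than proof, and the timings only show that the candidate is probably faster on inputs drawn from this particular distribution.

```python
import random
import time
from typing import Callable, List, Tuple

Sorter = Callable[[List[int]], List[int]]


def compare_on_samples(original: Sorter, candidate: Sorter, trials: int = 20,
                       size: int = 300, seed: int = 0) -> Tuple[bool, float, float]:
    """Return (all outputs agreed, original's total seconds, candidate's total seconds)."""
    rng = random.Random(seed)
    agree = True
    t_original = t_candidate = 0.0
    for _ in range(trials):
        data = [rng.randrange(10_000) for _ in range(size)]
        start = time.perf_counter()
        expected = original(list(data))
        t_original += time.perf_counter() - start
        start = time.perf_counter()
        actual = candidate(list(data))
        t_candidate += time.perf_counter() - start
        agree = agree and expected == actual
    return agree, t_original, t_candidate


def bubble_sort(xs: List[int]) -> List[int]:
    """The 'original': slow, but obviously correct."""
    xs = list(xs)
    for i in range(len(xs) - 1):
        for j in range(len(xs) - 1 - i):
            if xs[j + 1] < xs[j]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs


if __name__ == "__main__":
    agree, t_orig, t_cand = compare_on_samples(bubble_sort, sorted)
    print(agree, t_cand < t_orig)
```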
“But the good news is, there is no need! All we need to do to check which is faster, is throw some sample inputs at each and run tests.”
“no need”? Sadly, it’s hard to use such simple methods as anything like a complete replacement for proofs. As an example which is simultaneously extreme and simple to state, naive quicksort has good expected asymptotic performance, but its (very unlikely) worst-case performance degrades to quadratic time, as bad as bubble sort. Thus, if you use quicksort naively (without, e.g., randomizing the input in some way) somewhere an adversary has strong influence over the input seen by quicksort, you can create a vulnerability to a denial-of-service attack. This is easy to understand with proofs, but not so easy either to detect or to quantify with random sampling. Also, the pathological input has low Kolmogorov complexity, so the universe might well happen to give it to your system accidentally, even in situations where you aren’t faced by an actual malicious intelligent “adversary.”
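The failure mode described above is easy to reproduce. The sketch below counts element-versus-pivot comparisons in a naive first-element-pivot quicksort: on already-sorted input (a very low-complexity input) it makes exactly n(n-1)/2 comparisons, the same as bubble sort, while a shuffled input or a randomly chosen pivot brings the count back to roughly n log n. A random test harness that only feeds shuffled inputs would never see the bad case.

```python
import random
from typing import List, Optional, Tuple


def quicksort(xs: List[int], rng: Optional[random.Random] = None) -> Tuple[List[int], int]:
    """Return (sorted list, number of element-vs-pivot comparisons).

    With rng=None the pivot is always the first element (the naive version);
    with an rng the pivot is chosen uniformly at random.
    """
    if len(xs) <= 1:
        return list(xs), 0
    i = rng.randrange(len(xs)) if rng is not None else 0
    pivot = xs[i]
    rest = xs[:i] + xs[i + 1:]
    less: List[int] = []
    more: List[int] = []
    for x in rest:  # one comparison per remaining element
        (less if x < pivot else more).append(x)
    left, c_left = quicksort(less, rng)
    right, c_right = quicksort(more, rng)
    return left + [pivot] + right, len(rest) + c_left + c_right


if __name__ == "__main__":
    n = 400
    print(quicksort(list(range(n)))[1])                            # 79800 = n(n-1)/2
    print(quicksort(random.Random(0).sample(range(n), n))[1])      # far fewer
    print(quicksort(list(range(n)), random.Random(0))[1])          # far fewer
```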
Also sadly, we don’t seem to have very good standard technology for performance proofs. Some years ago I made a horrendous mistake in an algorithm preprint, and later came up with a revised algorithm. I also spent more than a full-time week studying and implementing a published class of algorithms and coming to the conclusion that I had wasted my time because the published claimed performance is provably incorrect. Off and on since then I’ve put some time into looking at automated proof systems and the practicalities of proving asymptotic performance bounds. The original poster mentioned ACL2; I’ve looked mostly at HOL Light (for ordinary math proofs) and to a lesser extent Coq (for program/algorithm proofs). The state of the art for program/algorithm proofs doesn’t seem terribly encouraging. Maybe someday it will be a routine master’s thesis to, e.g., gloss Okasaki’s Purely Functional Data Structures with corresponding performance proofs, but we don’t seem to be quite there yet.
Part of the problem with these is that there are limits to how much can be proven about correctness of programs. In particular, the general question of whether two programs will give the same output on all inputs is undecidable.
Proposition: There is no Turing machine which when given the description of two Turing machines accepts iff both the machines will agree on all inputs.
Proof sketch: Consider our hypothetical machine A that accepts descriptions iff they correspond to two Turing machines which agree on all inputs. We shall show how we can construct from A a machine H which would solve the halting problem. Note that for any given machine D we can construct a machine [D, s] which, on any input, does what D does on input string s (simply add states to D so that the machine first erases everything on the tape, writes out s on the tape and then executes the normal procedure for D). Then, to determine whether a given machine T accepts a given input s, ask machine A whether [T, s] agrees with the machine that always accepts. Since we’ve now constructed a Turing machine which solves the halting problem (strictly, the acceptance problem, which is undecidable by the same diagonal argument), our original assumption, the existence of A, must be false.
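The shape of that reduction in code, as a hedged illustration: programs are modelled as Python callables from an input string to accept/reject, and the impossible machine A is passed in as a parameter (`agree_on_all_inputs`), since no real implementation can exist. Any stand-in plugged in for testing, such as one that checks a few sample inputs, is only an approximation and does not decide equivalence.

```python
from typing import Callable

# Illustrative model: a "machine" maps an input string to accept (True) or reject (False).
Machine = Callable[[str], bool]
# Stand-in type for the hypothetical machine A from the proof; no real one can exist.
EquivalenceOracle = Callable[[Machine, Machine], bool]


def fix_input(machine: Machine, s: str) -> Machine:
    """Build [D, s]: ignore the actual input and run `machine` on s instead."""
    return lambda _ignored: machine(s)


def always_accept(_input: str) -> bool:
    return True


def accepts(machine: Machine, s: str, agree_on_all_inputs: EquivalenceOracle) -> bool:
    """Decide whether `machine` accepts s, given the hypothetical machine A.

    [machine, s] accepts every input if `machine` accepts s and no input otherwise,
    so it agrees with always_accept exactly when `machine` accepts s.
    """
    return agree_on_all_inputs(fix_input(machine, s), always_accept)
```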
There are other theorems of a similar nature that can be proven with more work. The upshot is that in general, there are very few things that a program can say about all programs.
Wouldn’t it have been easier to just link to Rice’s theorem?
I didn’t remember the name of the theorem, and my Google-fu is weak.
True. Test inputs suffice for an optimizer that on average wins more than it loses, which is good enough to be useful, but if you want guaranteed efficiency, that comes back to proof, and the current state-of-the-art is a good way short of doing that in typical cases.
Partial evaluation is interesting to me in an AI sense. If you haven’t already, have a look at the three Futamura projections.
But instead of compilers and language specifications you have learning systems and problem specifications. Or something along those lines.
Right, that’s optimization again. Basically, the reason I’m asking about this is that I’m working on a theorem prover (with the intent of applying it to software verification), and if Alan Crowe considers current designs the wrong kind, I’m interested in ideas about what the right kind might be, and why. (The current state of the art does need to be extended, and I have some ideas of my own about how to do that, but I’m sure there are things I’m missing.)
Why is the word obviously in quotes?
Because I am not just saying it’s not obvious an AI would recursively self-improve, I’m also referring to Eliezer’s earlier claims that such recursive self-improvement (aka FOOM) is what we’d expect given our shared assumptions about intelligence. I’m sort-of quoting Eliezer as saying FOOM obviously falls out of these assumptions.
I’m worried about the “sort-of quoting” part. I get nervous when people put quote marks around things that aren’t actually quotations of specific claims.
Noted, and thanks for asking. I’m also somewhat over-fond of scare quotes to denote my using a term I’m not totally sure is appropriate. Still, I believe my clarification above is sufficient that there isn’t any ambiguity left now as to what I meant.
Stephen Hawking, Martin Rees, Max Tegmark, Nick Bostrom, Michio Kaku, David Chalmers and Robin Hanson are all smart people who broadly agree that greater-than-human AI in the next 50-100 years is reasonably likely (they’d all give p > 10% to that, with the possible exception of Rees). On the con side, who do we have? To my knowledge, no one of similarly high academic rank has come out with a negative prediction.
Edit: See Carl’s comment below. Arguing majoritarianism against a significant chance of AI this century is becoming less tenable, as a significant set of experts come down on the “yes” side.
It is notable that I can’t think of any very reputable nos. The ones that come to mind are Jaron Lanier and that Glenn Zorpette.
10% is a low bar, it would require a dubiously high level of confidence to rule out AI over a 90 year time frame (longer than the time since Turing and Von Neumann and the like got going, with a massively expanding tech industry, improved neuroimaging and neuroscience, superabundant hardware, and perhaps biological intelligence enhancement for researchers). I would estimate the average of the group you mention as over 1/3rd by 2100. Chalmers says AI is more likely than not by 2100, I think Robin and Nick are near half, and I am less certain about the others (who have said that it is important to address AI or AI risks but not given unambiguous estimates).
Here’s Ben Goertzel’s survey. I think that Dan Dennett’s median estimate is over a century, although at the 10% level by 2100 I suspect he would agree. Dawkins has made statements that suggest similar estimates, although perhaps with somewhat shorter timelines. Likewise for Doug Hofstadter, who claimed at the Stanford Singularity Summit to have raised his estimate of time to human-level AI from the 21st century to the mid-to-late millennium, although he weirdly claimed to have done so for non-truth-seeking reasons.
None of those people are AI theorists so it isn’t clear that their opinions should get that much weight given that it is outside their area of expertise (incidentally, I’d be curious what citation you have for the Hawking claim). From the computer scientists I’ve talked to, the impression I get is that they see AI as such a failure that most of them just aren’t bothering to do much in the way of research in it except for narrow purpose machine learning or expert systems. There’s also an issue of a sampling bias: the people who think a technology is going to work are generally more loud about that than people who think it won’t. For example, a lot of physicists are very skeptical of Tokamak fusion reactors being practical anytime in the next 50 years, but the people who talk about them a lot are the people who think they will be practical.
Note also that nothing in Yoreth’s post actually relied on or argued that there won’t be moderately smart AI so it doesn’t go against what he’s said to point out that some experts think there will be very smart AI (although certainly some people on that list, such as Chalmers and Hanson do believe that some form of intelligence explosion like event will occur). Indeed, Yoreth’s second argument applies roughly to any level of intelligence. So overall, I don’t think the point about those individuals does much to address the argument.
I disagree with this, basically because AI is a pre-paradigm science. Having been at a big CS/AI dept, I know that the amount of accumulated wisdom about AI is virtually nonexistent compared to that for physics.
What does an average AI prof know that a physics graduate who can code doesn’t know? I’m struggling to name even one thing. If you set the two of them to code AI for some competition like controlling a robot, I doubt that there would be much advantage to the AI guy.
The only examples of genuine scientific insight in AI I have seen are in the works of Pearl, Hutter, Drew McDermott and, recently, Josh Tenenbaum.
That’s a very good point. The AI theorist presumably knows more about avenues that have not done very well (neural nets, other forms of machine learning, expert systems) but isn’t likely to have much general knowledge. However, that does mean the AI individual has a better understanding of how many different approaches to AI have failed miserably. But that’s just a comparison to your example of the physics grad student who can code. Most of the people you mentioned in your reply to Yoreth are clearly people who have knowledge bases closer to that of the AI prof than to the physics grad student. Hanson certainly has looked a lot at various failed attempts at AI. I think I’ll withdraw this argument. You are correct that these individuals on the whole are likely to have about as much relevant expertise as the AI professor.
Upvoted for honest debating!
So people with no experience programming robots, but who know the equations governing them, would just be able to come up on the spot with code comparable to what AI profs write? What do they teach in AI courses, if not the kind of thing that would make you better at this?
How to code, and rookie Bayesian stats/ML, plus some other applied stuff, like statistical Natural Language Processing (this being an application of the ML/stats stuff, but there are some domain tricks and tweaks you need).
The point is that there would only be experience, not theory, separating someone who knew Bayesian stats, coding and how to do science from an AI “specialist”. Yes, there are little shortcuts and details that a PhD in AI would know, but really there’s no massive intellectual gulf there.
I am gratified to find that someone else shares this opinion.
A better way to phrase the question might be: what can an average AI prof do that a physics graduate who can code can’t?
Each prof will, of course, have a niche app that they do well (in fact sometimes there is too much pressure to have a “trick” you can do to justify funding), but the key question is: are they more like a software engineer masquerading as a scientist than a real scientist? Do they have a paradigm and theory that enables thousands of engineers to move into completely new design-spaces?
I think that the closest we have seen is the ML revolution, but when you look at it, it is not new science, it is just statistics correctly applied.
I have seen some instances of people trying to push forward the frontier, such as the work of Hutter, but it is very rare.
Statistics vs machine learning: FIGHT!
Could you clarify exactly what Hutter has done that has advanced the frontier? I used to be very nearly a “Hutter enthusiast”, but I eventually concluded that his entire work is:
“Here’s a few general algorithms that are really good, but take way too long to be of any use whatsoever.”
Am I missing something? Is there something of his I should read that will open my eyes to the ease of mechanizing intelligence?
I think that the way of looking at the problem that he introduced is the key, i.e. thinking of the agent and environment as programs. The algorithms (AIXI, etc.) are just intuition pumps.
Surely everyone has been doing that from the beginning.
This seems like a fairly reasonable description of the work’s impact:
“Another theme that I picked up was how central Hutter’s AIXI and my work on the universal intelligence measure has become: Marcus and I were being cited in presentations so often that by the last day many of the speakers were simply using our first names. As usual there were plenty of people who disagree with our approach, however it was clear that our work has become a major landmark in the area.”
http://www.vetta.org/2010/03/agi-10-and-fhi/
But why does it get those numerous citations? What real-world, non-academic consequences have resulted from this massive usage of Hutter’s intelligence definition, which would distinguish it from a mere mass frenzy?
No time for a long explanation from me—but “universal intelligence” seems important partly since it shows how simple an intelligent agent can be—if you abstract away most of its complexity into a data-compression system. It is just a neat way to break down the problem.
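For reference, the universal intelligence measure mentioned above (Legg and Hutter’s definition) scores an agent π by its expected performance across all computable environments, weighted so that simpler environments count more. The simplicity weight uses Kolmogorov complexity, i.e. the length of the shortest program describing the environment, which is roughly where the link to compression comes in. Stated here only as a summary of the published definition:

$$
\Upsilon(\pi) \;:=\; \sum_{\mu \in E} 2^{-K(\mu)}\, V_{\mu}^{\pi}
$$

where E is the class of computable reward-bearing environments, K(μ) is the Kolmogorov complexity of environment μ, and V_μ^π is the expected total reward agent π obtains when interacting with μ. Agent and environment are both treated as programs, which is the framing discussed a few comments up.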
Machine learning, more math/probability theory/belief networks background?
A good physics or math grad who has done Bayesian stats is at no disadvantage on the machine learning stuff, but what do you mean by “belief networks background”?
Do you mean “deep belief networks”?
There is a ton of knowledge about probabilistic processes defined by networks in various ways, numerical methods for inference in them, clustering, etc. All the fundamental stuff in this range has applications to physics, and some of it was known in physics before getting reinvented in machine learning, so in principle a really good physics grad could know that stuff, but it’s more than the standard curriculum requires. On the other hand, it’s much more directly relevant to probabilistic methods in machine learning. Of course both should have a good background in statistics and Bayesian probability theory, but probabilistic analysis of nontrivial processes in particular adds unique intuitions that a physics grad won’t necessarily possess.
Re: “What does an average AI prof know that a physics graduate who can code doesn’t know? I’m struggling to name even one thing. If you set the two of them to code AI for some competition like controlling a robot, I doubt that there would be much advantage to the AI guy.”
A very odd opinion. We have 60 years of study of the field, and have learned quite a bit, judging by things like the state of translation and speech recognition.
The AI prof is more likely to know more things that don’t work and the difficulty of finding things that do. Which is useful knowledge when predicting the speed of AI development, no?
Which things?
Trying to model the world as crisp logical statements, à la the blocks world, for example.
That being in the “things that don’t work” category?
Yup… which things were you asking for? Examples of things that do work? You don’t actually need to find them to know that they are hard to find!
I think Hofstadter could fairly be described as an AI theorist.
So could Robin Hanson.
Dan Dennett and Douglas Hofstadter don’t think machine intelligence is coming anytime soon. Those folk actually know something about machine intelligence, too!
Re: “can a mind understand itself?”
That is no big deal: copy the mind a few billion times, and then it will probably collectively manage to grok its construction plans well enough.
Another argument against the self-modeling-difficulty point: it’s possible to become more capable by having better theories rather than by having a complete model, and the former is probably more common.
It could notice inefficiencies in its own functioning, check to see if the inefficiencies are serving any purpose, and clean them up without having a complete model of itself.
Suppose a self-improving AI is too cautious to go mucking about in its own programming, and too ethical to muck about in the programming of duplicates of itself. It still isn’t trapped at its current level, even aside from the reasonable approach of improving its hardware, though that may be a more subtle problem than generally assumed.
What if it just works on having a better understanding of math, logic, and probability?
In addition to theoretical objections, I think the majoritarian argument is factually wrong. Remember, ‘future is here, just not evenly distributed’.
http://www.google.com/trends?q=singularity shows a trend
http://www.nytimes.com/2010/06/13/business/13sing.html?pagewanted=all (this week in the NYT). Major MSFT and GOOG involvement.
http://www.acceleratingfuture.com/michael/blog/2010/04/transhumanism-has-already-won/
Re: “http://www.google.com/trends?q=singularity shows a trend”
Not much of one—and also, this is a common math term—while:
“Your terms—“technological singularity”—do not have enough search volume to show graphs.”
The critical aspect of a “major-impact intelligence-explosion singularity” isn’t the method for improvement but the rate of improvement. If computer processing power continues to grow at an exponential rate, even an inefficiently improving AI will have the growth in raw computing power behind it.
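A toy calculation can illustrate the rate point. The numbers below are made up purely for illustration: assume raw compute doubles every 2 years, and assume the AI’s own (inefficient) self-improvement adds only 1% software efficiency per year. If effective capability is modeled as hardware times software efficiency, the product still grows exponentially, and the hardware term dominates. Both parameters and the multiplicative model are assumptions, not claims about real trends.

```python
# Toy model with HYPOTHETICAL parameters:
# effective capability = raw hardware speed * software efficiency.
HARDWARE_DOUBLING_YEARS = 2.0  # assumed doubling time of raw compute
SOFTWARE_GAIN_PER_YEAR = 0.01  # assumed 1%/year gain from slow self-improvement


def capability(years: float) -> float:
    """Relative capability after `years`, normalized to 1.0 at year 0."""
    hardware = 2.0 ** (years / HARDWARE_DOUBLING_YEARS)
    software = (1.0 + SOFTWARE_GAIN_PER_YEAR) ** years
    return hardware * software


if __name__ == "__main__":
    for y in (0, 10, 20, 30):
        print(f"year {y:2d}: {capability(y):12.1f}x")
```

Under these assumptions the software term alone contributes only about 1.35x over 30 years, while the hardware term contributes 2^15 = 32768x, which is the sense in which the rate, not the method, drives the outcome.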
I don’t have any articles but I’ll take a stab at counterarguments.
A majoritarian counterargument: AI turned out to be harder and further away than originally thought. The general view is still tempered by the failure of AI to live up to those expectations. In short, the AI researchers cried “wolf!” too often 30 years ago, and now their predictions aren’t given much weight because of that bad track record.
A “mind can’t understand itself” counterargument: even accepting as a premise that a mind can’t completely understand itself, that’s not an argument that it can’t understand itself better than it currently does. The question then becomes which parts of the AI mind are important for reasoning/intelligence, and whether an AI can understand and improve that capability at a faster rate than humans.
Re: “If nothing else, you might think that someone could exaggerate the threat of the singularity and use it to scare people into giving them government funds. But we don’t even see that happening.”
? I see plenty of scaremongering around machine intelligence. So far, few governments have supported it—which seems fairly sensible of them.
How do we know that governments aren’t secretly working on AI?
Is it worth speculating about the goals which would be built into a government-designed AI?
Regarding majoritarianism:
Crash programs in basic science because of speculative applications are very uncommon. Decades of experimentation with nuclear fission only brought a crash program with the looming threat of the Nazis, and after a practical demonstration of a chain reaction.
Over the short time spans on which governments make their plans, the probability of big advances in basic AI science is relatively small, even if it is substantial over the longer term. So you get all the usual issues with attending to dangers that are improbable in any given short period and that no one has recent experience with. Note things like Hurricane Katrina, the Gulf oil spill, etc. The global warming effects of fossil fuel use have been seen as theoretically inevitable since at least the Eisenhower administration, and momentum for action has only been mobilized after a long period of actual warming provided pretty irrefutable (and yet widely rejected anyway!) evidence.