AI Foom Debate: conclusion?
I’ve been going through the AI Foom debate, and both sides make sense to me. I intend to continue, but I’m wondering if there are already insights in LW culture I can get if I just ask for them.
My understanding is as follows:
The difference between a chimp and a human is only 5 million years of evolution. That’s not enough time for many changes.
Eliezer takes this as proof that the difference in brain architecture between the two can’t be much. Thus, you can have a chimp-intelligent AI that doesn’t do much, and then, with some very small changes, suddenly get a human-intelligent AI and FOOM!
Robin takes the 5-million-year gap as proof that the significant difference between chimps and humans is only partly in the brain architecture. Evolution simply can’t be responsible for most of the relevant difference; the difference must be elsewhere.
So he concludes that when our ancestors got smart enough for language, culture became a thing. Our species stumbled across various little insights into life, and these got passed on. An increasingly massive base of cultural content, made of very many small improvements, is largely responsible for the difference between chimps and humans.
Culture assimilated new information into humans much faster than evolution could.
So he concludes that you can get a chimp-level AI, and that getting up to human level will take not a very few insights but a very great many, each one slowly improving the computer’s intelligence. So no FOOM; it’ll be a gradual thing.
So I think I’ve figured out the question. Is there a commonly known answer, or are there insights towards the same?
It is true that a young human (usually) starts with a ton of cultural knowledge that a young chimp doesn’t have.
It is also true that if you tried giving that cultural knowledge to a young chimp, it wouldn’t be able to process it.
Therefore, culture is important, but the genetic adaptation that makes culture possible is also important.
If the AI had the ability to use human culture, then after connecting to the internet it would be able to use human culture just like humans do; maybe even better, because humans are usually limited to a few cultures and subcultures, while a sufficiently powerful AI could use them all.
Also, culture is essential because humans are mortal and live relatively short lives compared with how much information is available to them. (Imagine that you are an immortal vampire with perfect memory; how many years would it take you to become an expert at everything humans know and do, assuming that the rest of humankind remains frozen in the state of 2016?) Thus culture is the only way to go beyond the capacity of the individual. Also, some experiments get you killed, and culture is a way to pass on that knowledge without repeating them. An AI with sufficiently great memory and processing speed would have less need for culture than humans do.
I don’t know if I’m saying anything that hasn’t been said before elsewhere, but looking at the massive difference in intelligence among humans seems like a strong argument for FOOM to me. Humans are basically all the same. We share roughly 99.9% of our DNA, the same brain structure, size, etc. And yet some humans have exceptional abilities.
I was just reading about Paul Erdős. He could hold three conversations at the same time with mathematicians on highly technical subjects. He was constantly having insights into mathematical research left and right. He produced more papers than any other mathematician.
I don’t think it’s a matter of culture. I don’t think an average person could “learn” to have a higher IQ, let alone be Erdős. And yet he very likely had the same brain structure as everyone else. Who knows what would be possible if you’re allowed to move far outside the space of humans.
But this isn’t the (main) argument Yudkowsky uses. He relies on an intuition that I don’t think was explicitly stated or argued for strongly enough. This one intuition is central to all the points about recursive self-improvement.
It’s that humans kind of suck. At least at engineering and solving complicated technical problems. We weren’t evolved to be good at it. There are many cases where simple genetic algorithms outperform humans. Humans outperform GAs in other cases of course, but it shows we are far from perfect. Even in the areas where we do well, we have trouble keeping track of many different things in our heads. We are very bad at prediction and pattern matching compared to small machine learning algorithms much of the time.
I think this intuition that “humans kind of suck” and “there are a lot of places we could make big improvements” is at the core of the FOOM debate and most of these AI risk debates. If you really believe this, then it seems almost obvious that AI will very rapidly become much smarter than humans. People who don’t have this intuition seem to believe that AI progress will be very slow, perhaps with steep diminishing returns.
To riff on your theme a little bit, maybe one area where genetic algorithms (or other comparably “simplistic” approaches) could shine is in the design of computer algorithms, or some important features thereof.
Well, actually, GAs aren’t that good at designing algorithms, because slightly mutating an algorithm usually breaks it or creates an entirely different algorithm, so the fitness landscape isn’t that gentle.
You can do a bit better if you work with circuits instead. And even better if you make the circuits continuous, so small mutations create small changes in output. And you can optimize these faster with gradient descent instead of GAs.
And then you have neural networks, which are quite successful.
https://en.wikipedia.org/wiki/Neuroevolution “Neuroevolution, or neuro-evolution, is a form of machine learning that uses evolutionary algorithms to train artificial neural networks. It is most commonly applied in artificial life, computer games, and evolutionary robotics. A main benefit is that neuroevolution can be applied more widely than supervised learning algorithms, which require a syllabus of correct input-output pairs. In contrast, neuroevolution requires only a measure of a network’s performance at a task. For example, the outcome of a game (i.e. whether one player won or lost) can be easily measured without providing labeled examples of desired strategies.”
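For concreteness, here is a minimal toy sketch of neuroevolution in the sense of the quoted description: a tiny network is trained from nothing but a scalar fitness score (here, negative error on XOR, standing in for a game outcome). All sizes, mutation scales, and selection rules below are arbitrary illustrative choices, not any particular published method.

```python
# Toy neuroevolution sketch: evolve the weights of a small one-hidden-layer net
# using only a fitness score, with no labeled input-output training signal.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])   # XOR, standing in for "did we win the game"

def unpack(genome):
    # 2x4 hidden weights, 4 hidden biases, 4x1 output weights, 1 output bias = 17 genes
    return genome[:8].reshape(2, 4), genome[8:12], genome[12:16].reshape(4, 1), genome[16]

def fitness(genome):
    W1, b1, W2, b2 = unpack(genome)
    hidden = np.tanh(X @ W1 + b1)
    out = (hidden @ W2).ravel() + b2
    return -np.mean((out - y) ** 2)   # higher is better

population = [rng.normal(size=17) for _ in range(50)]
for generation in range(200):
    parents = sorted(population, key=fitness, reverse=True)[:10]   # truncation selection
    population = [p + rng.normal(scale=0.1, size=17)               # small Gaussian mutations
                  for p in parents for _ in range(5)]
print("best fitness:", fitness(max(population, key=fitness)))
```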
One main thing foom depends on is the number of AIs that are fooming simultaneously. If there is only one, it is a real foom. If there are hundreds, it is just a transition to a new equilibrium in which there will be many superintelligent agents.
Whether we get one AI or many critically depends on the speed of fooming. If the fooming doubling time is milliseconds, then one AI wins. But if it is weeks or months, there will be many fooming AIs, which may result in a war between AIs or in an equilibrium.
But what is more important here is the question of how fast fooming is compared to the overall speed of progress in the AI field. If an AI is fooming with a doubling time of 3 weeks, but the field as a whole has a doubling time of 1 month, it is not real fooming.
If AI depends on one crucial insight that results in a 10,000-fold improvement, that would be real fooming.
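To put rough numbers on that comparison, a minimal back-of-envelope sketch using the hypothetical doubling times above (3 weeks for the single AI, 1 month for the field):

```python
# Back-of-envelope comparison of the hypothetical doubling times above:
# a single AI doubling every 3 weeks vs. the whole field doubling every month.
WEEKS_PER_YEAR = 52
ai_doubling_weeks = 3
field_doubling_weeks = 52 / 12          # one month, in weeks

ai_growth = 2 ** (WEEKS_PER_YEAR / ai_doubling_weeks)        # ~165,000x per year
field_growth = 2 ** (WEEKS_PER_YEAR / field_doubling_weeks)  # ~4,000x per year
print(f"AI grows ~{ai_growth:,.0f}x per year, the field ~{field_growth:,.0f}x,")
print(f"so the single AI's relative lead grows ~{ai_growth / field_growth:,.0f}x per year.")
```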
Don’t waste your time with the AI foom stuff. The opinion commonly held by experts who work with actual AI technology is that any sort of hard takeoff will have timelines that are not small on human scales, at least with the technology that exists today.
You need only do a few back-of-the-envelope calculations using various AI techniques and AGI architectures to see that learning cycles are measured in hours or days, not milliseconds.
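One illustrative calculation of that kind; every number below is a made-up placeholder chosen only for order of magnitude, not a claim about any specific architecture:

```python
# Illustrative back-of-envelope estimate of one learning cycle's wallclock time.
samples_per_cycle = 1e7    # training examples seen in one learning cycle (placeholder)
flops_per_sample  = 1e12   # forward + backward cost per example for a large net (placeholder)
epochs            = 10
hardware_flops    = 1e15   # sustained throughput of a sizable GPU cluster (placeholder)

seconds = samples_per_cycle * flops_per_sample * epochs / hardware_flops
print(f"~{seconds / 3600:.0f} hours per learning cycle")   # ~28 hours with these numbers
```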
Yeah but that’s still orders of magnitude faster than humans. AI may not be able to do everything humans can, but what they can do, they can do much faster. No human can learn to be an expert at Go in a week, babies take longer than a day to learn to see, and much longer than a day to learn the basics of language.
Did Google actually say how long it took to train AlphaGo? In any case, even if it took a week or less, that is not strong evidence that an AGI could go from knowing nothing to knowing a reasonable amount in a week. It could easily take months, even if it would learn faster than a human being. You need to learn a lot more for general intelligence than to play Go.
They did. In the methodology part they give an exact breakdown of how much wallclock time it took to train each step (I excerpted it in the original discussion here or on Reddit), which was something like 5 weeks total IIRC. Given the GPU counts on the various steps, it translated to something like 2 years on a regular laptop GPU, so the parallelization really helped; I don’t know what the limit on parallelization for reinforcement learning is, but note the recent DeepMind paper establishing that you can even throw away experience-replay entirely if you go all-in on parallelization (since at least one copy will tend to be playing something relevant while the others explore, preventing catastrophic forgetting), so who knows what one could do with 1k GPUs or a crazy setup like that?
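Taking those two recalled figures at face value (both are approximate), the implied benefit of parallelization is roughly:

```python
# Rough arithmetic on the recalled figures above (both are approximate).
wallclock_weeks = 5          # quoted total training wallclock across many GPUs
laptop_gpu_weeks = 2 * 52    # "something like 2 years" on a single laptop GPU
print(f"implied effective speedup from parallelization: ~{laptop_gpu_weeks / wallclock_weeks:.0f}x")
```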
The answer is “mine Bitcoin in the pre-FPGA days” :-)
This year Nvidia is releasing its next generation of GPUs (Pascal) which is supposed to provide a major speed-up (on the order of 10x) for neural net applications.
First, at least it establishes a minimum. If an AI can learn the basics of English in a day, then it still has that much of a head start against humans. Even if it takes longer to master the rest of language, you can at least cut 3 years off the training time, and presumably the rest can be learned at a rapid rate as well.
It also establishes that AI can teach itself specialized skills very rapidly. Today it learns the basics of language, tomorrow it learns the basics of programming, the day after it learns vision, and then it can learn to engineer nanotechnology, etc. This is an ability far above what humans can do, and would give it a huge advantage.
Finally, even if it takes months, that’s still FOOM. I don’t know where the cutoff point is, but anything that advances at a pace that rapid is dangerous. It’s very different from the alternative “slow takeoff” scenarios where AI takes years and years to advance to a superhuman level.
If an AGI manages to substantially improve itself every few days I would still call it a hard takeoff.
I would just advise caution: we do not have examples of AI improving their own architecture, so it is dangerous to assume one way or another.
We most certainly do. The field of AGI research is a thing.
We do? I wasn’t aware of them… do you know of any example?
SOAR, ACT-R, MicroPsi, OpenCog, LIDA, Numenta, NARS, … you might find this list helpful:
http://linas.org/agi.html
I suggest a different reason not to waste your time with the foom debate: even a non-fooming process may be unstoppable. Consider the institutions of the state and the corporation. Each was a long time coming. Each was hotly contested and had plenty of opponents, who failed to stop it or radically change it. Each changed human life in ways that are not obviously and uniformly for the better.
Some references would have been helpful. Having said that, the downvoting is disgraceful.
If you have any references please do provide them. I honestly don’t know if there is a good write-up anywhere, and I haven’t the time or inclination to write one myself, especially as it would require a very long tutorial overview of the inner workings of modern approaches to AGI to adequately explain why running a human-level AGI is such a resource-intensive proposition.
The tl;dr is what I wrote: learning cycles would be hours or days, and a foom would require hundreds or thousands of learning cycles at minimum. There is just no plausible way for an intelligence to magic itself to superintelligence in less than large human timescales. I don’t know how to succinctly explain that without getting knee-deep in AI theory, though.
For a dissenting view, there’s e.g. jacob_cannell’s recent comment about the implications of AlphaGo.
That’s a terrible argument. AlphaGo represents a general approach to AI, but its instantiation on the specific problem of Go tightly constrains the problem domain and solution space. Real life is far more combinatorial still, and an AGI requires much more expensive meta-level repeated cognition as well. You don’t just solve one problem, you also look at all past solved problems and think about how you could have solved those better. That’s quadratic blowup.
Tl;Dr speed of narrow AI != speed of general AI.
But what if a general AI could generate specialized narrow AIs? That is something the human brain cannot do but an AGI could. Thus speed of general AI = speed of narrow AI + time to specialize.
How is it different than a general AI solving the problems by itself?
It isn’t. At least not in my model of what an AI is. But Mark_Friedenbach seems to operate under a model where this is less clear or the consequences of the capability of an AI creating these kind of specialized sub agents seem not to be taken into account enough.
Sure, but that wasn’t my point. I was addressing key questions of training data size, sample efficiency, and learning speed. At least for Go, vision, and related domains, the sample efficiency of DL based systems appears to be approaching that of humans. The net learning efficiency of the brain is far beyond current DL systems in terms of learning per joule, but the gap in terms of learning per dollar is less, and closing quickly. Machine DL systems also easily and typically run 10x or more faster than the brain, and thus learn/train 10x faster.
Although I disagree that fooming will be slow, from what I’ve learned studying AlphaGo I would say that its approach is not easy to generalize.
AlphaGo draws its power partly from the step where an ‘intuitive’ neural net is created, using millions of self-play games from another net that was already trained by supervised learning. But the training can be accurate because the end positions and the winning player are clearly defined once the game is over. This allows a precise calculation of the outcome function that the intuitive neural net is trying to learn.
Unsupervised learners interacting with an environment that has open ontologies will have a much harder time to come up with this kind of intuition-building step.
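A schematic toy version of that labeling step (not AlphaGo’s actual training code; the “game” and the linear “value net” below are made-up stand-ins) shows why a clearly defined end-of-game outcome gives every recorded position a clean training target:

```python
# Toy sketch: every position in a finished self-play game inherits the known
# final outcome as its training label.  The game and net are stand-ins only.
import numpy as np

rng = np.random.default_rng(0)

def play_toy_game():
    """Toy game: a sequence of random feature vectors; the winner is only known at the end."""
    states = [rng.normal(size=4) for _ in range(20)]
    outcome = 1.0 if states[-1].sum() > 0 else -1.0   # clearly defined once the game is over
    return states, outcome

def train_value_net(weights, games, lr=0.01):
    for states, outcome in games:
        for s in states:                         # every recorded state gets the final label
            prediction = np.tanh(weights @ s)
            # gradient of 0.5 * (prediction - outcome)^2 with respect to the weights
            weights = weights - lr * (prediction - outcome) * (1 - prediction ** 2) * s
    return weights

weights = train_value_net(np.zeros(4), [play_toy_game() for _ in range(1000)])
print("trained value-net weights:", np.round(weights, 2))
```

In an open-ontology environment there is no such unambiguous end-of-episode label to propagate backward, which is the difficulty the comment above points at.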
I tried to explain in my recent post that at the current level of technology human-level AGI is possible, but foom is not yet, in particular because of problems with size, speed, and the way neural nets learn.
Also, a human-level AGI is not powerful enough to foam. Human science is developing, but it includes millions of scientists; a foaming AI would need to be of the same complexity but run 1000 times quicker. We don’t have such hardware. http://lesswrong.com/lw/n8z/ai_safety_in_the_age_of_neural_networks_and/
But the field of AI research is foaming with a doubling time of 1 year right now.
foom, not foam, right?
Doubling time of 1 year is not a FOOM. But thank you for taking the time to write up a post on AI safety pulling from modern AI research.
It is not foom, but in 10-20 years its results will be superintelligence. I am now writing a post that will give more details about how I see it; the main idea will be that AI speed improvement will follow a hyperbolic law, but it will evolve as a whole environment, not as a single fooming AI agent.
I’m asking for references because I don’t have them. It’s a shame that the people who are able, ability-wise, to explain the flaws in the MIRI/FHI approach, actual AI researchers, aren’t able, time-wise, to do so. It leads to MIRI’s views dominating in a way that they should not. It’s anomalous that a bunch of amateurs should become the de facto experts in a field, just because they have funding, publicity, and spare time.
It’s not a unique circumstance. I work in Bitcoin and I assure you we are seeing the same thing right now. I suspect it is a general phenomenon.
MIRI/FHI arguments essentially boil down to “you can’t prove that AI FOOM is impossible”.
Arguments of this form, e.g. “You can’t prove that [snake oil/cryonics/cold fusion] doesn’t work” , “You can’t prove there is no God”, etc. can’t be conclusively refuted.
Various AI experts have expressed skepticism about an imminent super-human AI FOOM, pointing out that the capabilities required for such a scenario, if it is even possible, are far beyond what they see in their daily cutting-edge research on AI, and that there are still lots of problems that need to be solved before even approaching human-level AGI. I doubt that these experts would have much to gain from continuing to argue over all the countless variations of the same argument that MIRI/FHI can generate.
I don’t agree.
That’s a 741-page book; can you summarize a specific argument?
For almost any goal an AI had, the AI would make more progress towards this goal if it became smarter. As an AI became smarter it would become better at making itself smarter. This process continues. Imagine if it were possible to quickly make a copy of yourself that had a slightly different brain. You could then test the new self and see if it was an improvement. If it was you could make this new self the permanent you. You could do this to quickly become much, much smarter. An AI could do this.
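A minimal sketch of that copy-test-replace loop, with a toy “mind” as a parameter vector and a made-up “smartness” score standing in for whatever the AI actually optimizes (both are hypothetical stand-ins, not a claim about how a real AGI would work):

```python
# Minimal copy-test-replace loop: mutate a copy, keep it only if it scores better.
import numpy as np

rng = np.random.default_rng(0)

def smartness(mind):
    # Toy benchmark: how well the parameters fit some fixed target ability.
    target = np.arange(10, dtype=float)
    return -np.sum((mind - target) ** 2)

mind = np.zeros(10)
for step in range(10_000):
    candidate = mind + rng.normal(scale=0.05, size=10)   # copy with a slightly different "brain"
    if smartness(candidate) > smartness(mind):           # test the copy
        mind = candidate                                 # make the improved copy the permanent "you"
print("final smartness:", smartness(mind))
```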
True, but it is likely that there are diminishing returns in how much adding more intelligence can help with other goals, including the instrumental goal of becoming smarter.
Nope, doesn’t follow.
Eventual diminishing returns, perhaps, but probably long after it was smart enough to do what it wanted with Earth.
A drug that raised the IQ of human programmers would make the programmers better programmers. Also, intelligence is the ability to solve complex problems in complex environments so it does (tautologically) follow.
Why?
The proper analogy is with a drug that raised the IQ of researchers who invent the drugs that increase IQ. Does this lead to an intelligence explosion? Probably not. If the number of IQ points that you need to discover the next drug in a constant time increases faster than the number of IQ points that the next drug gives you, then you will run into diminishing returns.
It doesn’t seem to be much different with computers.
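A toy numerical version of that argument, with made-up gain functions, shows how the outcome hinges entirely on whether the gain per step grows or shrinks with current intelligence:

```python
# Iterate "use current IQ to discover the next IQ-raising drug" under two
# assumed regimes for the gain per step (both gain functions are arbitrary).
def run(gain_per_step, iq=100.0, steps=50):
    history = [iq]
    for _ in range(steps):
        iq += gain_per_step(iq)
        history.append(iq)
    return history

explosive   = run(lambda iq: 0.05 * iq)    # gain grows with current IQ -> exponential blow-up
diminishing = run(lambda iq: 500.0 / iq)   # gain shrinks as IQ rises   -> slow, flattening growth

print("explosive  :", [round(x) for x in explosive[::10]])
print("diminishing:", [round(x) for x in diminishing[::10]])
```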
Algorithmic efficiency is bounded: for any given computational problem, once you have the best algorithm for it, for whatever performance measure you care for, you can’t improve on it anymore. And in fact long before you reached the perfect algorithm you’ll already have run into diminishing returns in terms of effort vs. improvement: past some point you are tweaking low-level details in order to get small performance improvements.
Once you have maxed out algorithmic efficiency, you can only improve by increasing hardware resources, but this 1) requires significant interaction with the physical world, and 2) runs into asymptotic complexity issues: for most AI problems worst-case complexity is at least exponential, average case complexity is more difficult to estimate but most likely super-linear. Take a look at the AlphaGo paper for instance, figure 4c shows how ELO rating increases with the number of CPUs/GPUs/machines. The trend is logarithmic at best, logistic at worst.
Now of course you could insist that it can’t be disproved that significant diminishing returns will kick in before AGI reaches strongly super-human level, but, as I said, this is an unfalsifiable argument from ignorance.
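Assuming the roughly logarithmic fit that figure suggests, the diminishing-returns point can be stated in one line:

```latex
% Assumed logarithmic fit of rating R against hardware n (number of machines):
R(n) \approx a + b \ln n
\quad\Longrightarrow\quad
R(2n) - R(n) = b \ln 2
\quad\text{and}\quad
R(1000\,n) - R(n) = b \ln 1000 \approx 10\, b \ln 2 .
```

So under that fit, a thousand-fold increase in hardware buys only about ten “hardware doubling” increments of rating, not a thousand-fold improvement in strength.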
Cute. Now try quantifying that argument. How much data needs to be considered / collected to make each incremental improvement? Does that grow over time, and how fast? What is the failure rate (chance a change makes you dumber not smarter)? What is the critical failure rate (chance a change makes you permanently incapacitated)? How much testing and analysis is required to confidently have a low critical error rate?
When you look at it as an engineer not a philosopher, the answers are not so obvious.
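As one possible starting point, a crude Monte Carlo sketch with placeholder parameters (testing time, failure rate, critical failure rate, gain per accepted change) at least makes the dependence on those quantities explicit; none of the numbers below are claims about real systems:

```python
# Crude Monte Carlo: each attempted self-modification costs testing time,
# usually helps a little, sometimes is rejected, and occasionally is fatal.
import random

random.seed(0)

def takeoff_time(test_hours=24, p_worse=0.3, p_fatal=0.001, gain=0.01, target=10.0):
    """Hours until capability multiplies by `target`, or None if a fatal change slips through."""
    capability, hours = 1.0, 0.0
    while capability < target:
        hours += test_hours
        r = random.random()
        if r < p_fatal:
            return None                   # critical failure: permanently incapacitated
        elif r < p_fatal + p_worse:
            continue                      # testing catches a bad change; no progress this round
        capability *= 1.0 + gain
    return hours

runs = [takeoff_time() for _ in range(100)]
survivors = sorted(h for h in runs if h is not None)
print(f"{len(survivors)}/100 runs survive; median ~{survivors[len(survivors) // 2] / 24:.0f} days")
```

With these placeholder values a 10x capability gain takes on the order of a year of testing time, and changing any one parameter by an order of magnitude changes the answer dramatically, which is exactly the point about needing to quantify.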
Much depends on what you mean by “learning cycle”—do you mean a complete training iteration (essentially a lifetime) of an AGI? Grown from seed to adult?
I’m not sure where you got the ‘hundreds to thousands’ of learning cycles from either. If you want to estimate the full experimental iteration cycle count, it would probably be better to estimate from smaller domains. Like take vision—how many full experimental cycles did it take to get to current roughly human-level DL vision?
It’s hard to say exactly, but it is roughly on the order of ‘not many’ - we achieved human-level vision with DL very soon after the hardware capability arrived.
If we look in the brain, we see that vision is at least 10% of the total computational cost of the entire brain, and the brain uses the same learning mechanisms and circuit patterns to solve vision as it uses to solve essentially everything else.
Likewise, once we (roughly, kind of) solved vision in the very general way the brain does, we saw that the same general techniques essentially work for all other domains.
Oh, that’s easy: as soon as you get one adult, human-level AGI running compactly on a single GPU, you can then trivially run it 100x faster on a supercomputer, and/or replicate it a million-fold or more. That generation of AGI then quickly produces the next, and then singularity.
It’s slow going until we get up to that key threshold of brain compute parity, but once you pass that we probably go through a phase transition in history.
Citation on plausibility severely needed, which is the point.
While that particular discussion is quite interesting, it’s irrelevant to my point above—which is simply that once you achieve parity, it’s trivially easy to get at least weak superhuman performance through speed.
The whole issue is whether a hard takeoff is possible and/or plausible, presumably with currently available computing technology. Certainly with Landauer-limit computing technology it would be trivial to simulate billions of human minds in the space and energy usage of a single biological brain. If such technology existed, yes a hard takeoff as measured from biological-human scale would be an inevitability.
But what about today’s technology? The largest supercomputers in existence can maaaaybe simulate a single human mind at highly reduced speed and with heavy approximation. A single GPU wouldn’t even come close in either storage or processing capacity. The human brain has about 100 billion neurons and operates at 100 Hz. The NVIDIA Tesla K80 has 8.73 TFLOPS single-precision performance with 24 GB of memory. That’s 1.92 bits per neuron and 0.87 floating point operations per neuron-cycle. Sorry, no matter how you slice it, neurons are complex things that interact in complex ways. There is just no possible way to do a full simulation with ~2 bits per neuron and ~1 flop per neuron-cycle. More reasonable assumptions about simulation speed and resource requirements demand supercomputers on the order of approximately the largest we as a species have in order to do real-time whole-brain emulations. And if such a thing did exist, it’s not “trivially easy” to expand its own computation power: it’s already running on the fastest stuff in existence!
So with today’s technology, any AI takeoff is likely to be a prolonged affair. This is absolutely certain to be the case if whole-brain emulation is used. So should hard-takeoffs be a concern? Not in the next couple of decades at least.
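For reference, the arithmetic behind the per-neuron figures quoted above:

```python
# Arithmetic behind the quoted figures: K80 specs divided by a naive
# one-state/one-op-per-neuron-per-cycle simulation budget.
neurons         = 100e9        # ~100 billion neurons
brain_rate_hz   = 100          # assumed 100 Hz update rate
k80_flops       = 8.73e12      # single-precision FLOPS of a Tesla K80
k80_memory_bits = 24e9 * 8     # 24 GB of memory, in bits

print(f"{k80_memory_bits / neurons:.2f} bits of GPU memory per neuron")        # ~1.92
print(f"{k80_flops / (neurons * brain_rate_hz):.2f} flops per neuron-cycle")   # ~0.87
```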
You are assuming enormously suboptimal/naive simulation. Sure if you use a stupid simulation algorithm, the brain seems powerful.
As a sanity check, apply your same simulation algorithm to simulating the GPU itself.
It has 8 billion transistors that cycle at 1 GHz, with a typical fanout of 2 to 4. So that’s more than 10^19 gate ops/second! Far more than the brain...
The brain has about 100 trillion synapses, and the average spike rate is around 0.25hz (yes, really). So that’s only about 25 trillion synaptic events/second. Furthermore, the vast majority of those synapses are tiny and activate on an incoming spike with low probability around 25% to 30% or so (stochastic connection dropout). The average synapse has an SNR equivalent of 4 bits or less. All of these numbers are well-supported from the neuroscience lit.
Thus the brain as a circuit computes with < 10 trillion low bit ops/second. That’s nothing, even if it’s off by 10x.
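The same style of arithmetic, spelled out with the numbers above (fanout and activation probability taken as mid-range values from the quoted ranges):

```python
# Raw switching events in the GPU vs. sparse synaptic events in the brain,
# using the figures quoted above (fanout and dropout taken as mid-range values).
gpu_transistors   = 8e9
gpu_clock_hz      = 1e9
fanout            = 3                      # "typical fanout of 2 to 4"
gpu_gate_ops      = gpu_transistors * gpu_clock_hz * fanout        # ~2.4e19 per second

synapses          = 100e12
avg_spike_rate_hz = 0.25
activation_prob   = 0.3                    # stochastic connection dropout, ~25-30%
synaptic_ops      = synapses * avg_spike_rate_hz * activation_prob  # ~7.5e12 per second

print(f"GPU: ~{gpu_gate_ops:.1e} gate ops/s, brain: ~{synaptic_ops:.1e} effective synaptic ops/s")
```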
Also, synapse memory isn’t so much an issue for ANNs, as weights are easily compressed 1000x or more by various schemes, from simple weight sharing to more complex techniques such as tensorization.
As we now approach the end of Moore’s law, our low-level circuit efficiency has already caught up to the brain, or is close to it. The remaining gap is almost entirely algorithmic-level efficiency.
If you are assuming that a neuron contributes less than 2 bits of state (or 1 bit per 500 synapses) and 1 computation per cycle, then you know more about neurobiology than anyone alive.
I don’t understand your statement.
I didn’t say anything in my post above about the per-neuron state, because it’s not important. Each neuron is a low-precision analog accumulator, roughly 8-10 bits or so, and there are 20 billion neurons in the cortex. There are another 80 billion in the cerebellum, but they are unimportant.
The memory cost of storing the state for an equivalent ANN is far less than 20 billion bytes or so, because of compression; most of that state is just zero most of the time.
In terms of computation per neuron per cycle, when a neuron fires it does #fanout computations. Counting from the total synapse numbers is easier than estimating neurons * avg fanout, but gives the same results.
When a neuron doesn’t fire... it doesn’t compute anything of significance. This is true in the brain and in all spiking ANNs, as it’s equivalent to sparse matrix operations, where the computational cost depends on the number of nonzeros, not the raw size.
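A minimal sketch of that sparsity point, with toy sizes chosen for illustration: the cost of event-driven propagation scales with spikes times fanout, not with the total synapse count.

```python
# Event-driven spike propagation: work is proportional to (spikes x fanout),
# not to the total number of neurons or synapses.  Toy sizes only.
import numpy as np

rng = np.random.default_rng(0)
n_neurons, fanout = 100_000, 50

# Each neuron projects to `fanout` random targets with a random weight.
targets = rng.integers(0, n_neurons, size=(n_neurons, fanout))
weights = rng.normal(scale=0.1, size=(n_neurons, fanout))

def propagate(spiking_neurons):
    """Deliver spikes; cost is O(len(spiking_neurons) * fanout)."""
    inputs = np.zeros(n_neurons)
    for neuron in spiking_neurons:
        np.add.at(inputs, targets[neuron], weights[neuron])
    return inputs

# At a 0.25 Hz average rate and a ~100 ms timestep, only ~2.5% of neurons spike per step.
spiking = rng.choice(n_neurons, size=n_neurons // 40, replace=False)
inputs = propagate(spiking)
print(f"{len(spiking)} spikes -> {len(spiking) * fanout:,} synaptic updates "
      f"(vs {n_neurons * fanout:,} synapses total)")
```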