If you have any references please do provide them. I honestly don’t know if there is a good write up anywhere, and I haven’t the time or inclination to write one myself. Especially as it would require a very long tutorial overview of the inner workings of modern approaches to AGI to adequately explain why running a human level AGI is such a resource intensive proposal.
The tl;dr is what I wrote: learning cycles would be hours or days, and a foom would require hundreds or thousands of learning cycles at minimum. There is just no plausible way for an intelligence to magic itself to superintelligence in less than long human timescales. I don’t know how to explain that succinctly without getting knee-deep in AI theory, though.
For a dissenting view, there’s e.g. jacob_cannell’s recent comment about the implications of AlphaGo.
Critics like to point out that DL requires tons of data, but so does the human brain. A more accurate comparison requires quantifying the dataset human pro go players train on.
A 30-year-old Asian pro will have perhaps 40,000 hours of playing experience (20 years × 50 weeks/year × 40 hrs/week). The average game duration is perhaps an hour and consists of about 200 moves. In addition, pros (and even fans) study published games. Reading a game takes less time, perhaps as little as 5 minutes or so.
So we can estimate very roughly that a top pro will have absorbed between 100,000 and 1 million games, and between 20 and 200 million individual positions (around 200 moves per game).
AlphaGo was trained on the KGS dataset: 160,000 games and 29 million positions. So it did not train on significantly more data than a human pro. The data quantities are actually very similar.
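A quick back-of-envelope version of that comparison, using only the figures quoted in this comment (all of them rough estimates rather than measurements):

```python
# Human pro's lifetime Go "dataset", per the estimates above.
hours = 20 * 50 * 40                         # 20 years * 50 weeks/yr * 40 hrs/week = 40,000 hours
games_low, games_high = 100_000, 1_000_000   # played plus studied games (rough range)
moves_per_game = 200

positions_low = games_low * moves_per_game    # 20 million positions
positions_high = games_high * moves_per_game  # 200 million positions

# AlphaGo's supervised-learning corpus (KGS): ~160,000 games, ~29 million positions,
# i.e. squarely inside the human range estimated above.
kgs_games, kgs_positions = 160_000, 29_000_000
print(positions_low, kgs_positions, positions_high)
```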
Furthermore, the human pro’s dataset is arguably of higher quality, as it consists mainly of pro-level games, whereas the AlphaGo dataset is mostly amateur level.
The main difference is speed. The human brain’s ‘clockrate’ or equivalent is about 100 Hz, whereas AlphaGo’s various CNNs can run at roughly 1,000 Hz during training on a single machine, and perhaps 10,000 Hz equivalent distributed across hundreds of machines. 40,000 hours—a lifetime of experience—can be compressed 100x or more into just a couple of weeks for a machine. This is the key lesson here.
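In other words, taking the ~100x effective speedup above at face value:

$$\frac{40{,}000\ \text{hours of experience}}{100} = 400\ \text{hours} \approx 2.4\ \text{weeks of wall-clock training.}$$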
That’s a terrible argument. AlphaGo represents a general approach to AI, but its instantiation on the specific problem of Go tightly constrains the problem domain and solution space. Real life is far more combinatorial still, and an AGI requires much more expensive meta-level repeated cognition as well. You don’t just solve one problem, you also look at all past solved problems and think about how you could have solved those better. That’s quadratic blowup.
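One way to make the “quadratic blowup” concrete (my gloss on the claim, nothing from the AlphaGo work itself): if after solving problem $n$ the system revisits each of the $n-1$ problems it has already solved, the cumulative meta-level work over $N$ problems is

$$\sum_{n=1}^{N} (n-1) = \frac{N(N-1)}{2} = O(N^2),$$

even if each individual review is cheap.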
Tl;Dr speed of narrow AI != speed of general AI.
But what if a general AI could generate specialized narrow AIs? That is something the human brain cannot do but an AGI could. Thus speed of general AI = speed of narrow AI + time to specialize.
How is it different than a general AI solving the problems by itself?
It isn’t. At least not in my model of what an AI is. But Mark_Friedenbach seems to operate under a model where this is less clear, or where the consequences of an AI being able to create these kinds of specialized sub-agents are not sufficiently taken into account.
Sure, but that wasn’t my point. I was addressing key questions of training data size, sample efficiency, and learning speed. At least for Go, vision, and related domains, the sample efficiency of DL based systems appears to be approaching that of humans. The net learning efficiency of the brain is far beyond current DL systems in terms of learning per joule, but the gap in terms of learning per dollar is less, and closing quickly. Machine DL systems also easily and typically run 10x or more faster than the brain, and thus learn/train 10x faster.
Although I disagree that fooming will be slow, from what I’ve learned studying AlphaGo I would say that its approach is not easy to generalize.
AlphaGo draws its power partly from the step where an ‘intuitive’ neural net is created, using millions of self-play games from a net that was already trained by supervised learning. But the training can be accurate because the end positions and the winning player are clearly defined once the game is over. This allows a precise calculation of the outcome function that the intuitive neural net is trying to learn.
Unsupervised learners interacting with an environment that has open ontologies will have a much harder time coming up with this kind of intuition-building step.
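A minimal sketch of why that clean outcome signal matters, loosely modeled on the self-play policy-gradient stage described for AlphaGo (PyTorch assumed; the network shape and names are illustrative, not DeepMind’s code). The update is only well-defined because the game result z is exact once the game ends:

```python
import torch
import torch.nn as nn

# Toy "intuition" policy: flattened board features in, a distribution over 361 moves out.
policy = nn.Sequential(nn.Linear(361 * 3, 256), nn.ReLU(), nn.Linear(256, 361))
opt = torch.optim.SGD(policy.parameters(), lr=1e-3)

def update_from_game(states, moves, z):
    """states: [T, 1083] float tensor, moves: [T] long tensor of chosen moves,
    z: +1.0 for a win, -1.0 for a loss -- unambiguous only because a Go game ends
    with a well-defined winner. An open-ended environment provides no such z."""
    log_probs = torch.log_softmax(policy(states), dim=-1)
    chosen = log_probs[torch.arange(len(moves)), moves]
    loss = -(z * chosen).mean()   # reinforce moves from won games, suppress moves from lost ones
    opt.zero_grad()
    loss.backward()
    opt.step()
```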
I tried to explain in my recent post that, at the current level of technology, human-level AGI is possible but foom is not yet, in particular because of problems with size, speed, and the way neural nets learn.
Also, human-level AGI is not powerful enough to foam. Human science is developing, but it includes millions of scientists; a foaming AI would have to be of the same complexity but run 1000 times quicker. We don’t have such hardware. http://lesswrong.com/lw/n8z/ai_safety_in_the_age_of_neural_networks_and/
But the field of AI research is foaming with a doubling time of 1 year now.
foom, not foam, right?
Doubling time of 1 year is not a FOOM. But thank you for taking the time to write up a post on AI safety pulling from modern AI research.
It is not foom, but in 10-20 years its results will be superintelligence. I am now writing a post that will give more details about how I see it—the main idea will be that AI speed improvement will follow a hyperbolic law, but it will evolve as a whole environment, not as a single fooming AI agent.
I’m asking for references because I don’t have them. It’s a shame that the people who are able, ability-wise, to explain the flaws in the MIRI/FHI approach, actual AI researchers, aren’t able, time-wise, to do so. It leads to MIRI’s views dominating in a way that they should not. It’s anomalous that a bunch of amateurs should become the de facto experts in a field, just because they have funding, publicity, and spare time.
It’s not a unique circumstance. I work in Bitcoin and I assure you we are seeing the same thing right now. I suspect it is a general phenomenon.
MIRI/FHI arguments essentially boil down to “you can’t prove that AI FOOM is impossible”.
Arguments of this form, e.g. “You can’t prove that [snake oil/cryonics/cold fusion] doesn’t work”, “You can’t prove there is no God”, etc., can’t be conclusively refuted.
Various AI experts have expressed skepticism about an imminent super-human AI FOOM, pointing out that the capabilities required for such a scenario, if it is even possible, are far beyond what they see in their daily cutting-edge research on AI, and that there are still lots of problems that need to be solved before even approaching human-level AGI. I doubt that these experts would have much to gain from continuing to argue over all the countless variations of the same argument that MIRI/FHI can generate.
I don’t agree.
That’s a 741-page book, can you summarize a specific argument?
For almost any goal an AI had, the AI would make more progress towards this goal if it became smarter. As an AI became smarter it would become better at making itself smarter. This process continues. Imagine if it were possible to quickly make a copy of yourself that had a slightly different brain. You could then test the new self and see if it was an improvement. If it was you could make this new self the permanent you. You could do this to quickly become much, much smarter. An AI could do this.
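A toy version of that copy-and-test loop, just to make the shape of the argument explicit (mutate and benchmark here are hypothetical placeholders; they are precisely the parts the later replies argue are hard):

```python
import copy

def self_improvement_loop(agent, benchmark, mutate, generations=1000):
    """'Copy yourself with a slightly different brain, keep the copy if it is better.'
    benchmark(agent) -> score, mutate(agent) -> perturbed agent; both are stand-ins."""
    best_score = benchmark(agent)
    for _ in range(generations):
        candidate = mutate(copy.deepcopy(agent))
        score = benchmark(candidate)
        if score > best_score:       # adopt the improved self and iterate from there
            agent, best_score = candidate, score
    return agent, best_score
```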
True, but it is likely that there are diminishing returns in how much adding more intelligence can help with other goals, including the instrumental goal of becoming smarter.
Nope, doesn’t follow.
Eventual diminishing returns, perhaps, but probably long after it was smart enough to do what it wanted with Earth.
A drug that raised the IQ of human programmers would make the programmers better programmers. Also, intelligence is the ability to solve complex problems in complex environments so it does (tautologically) follow.
Why?
The proper analogy is with a drug that raised the IQ of researchers who invent the drugs that increase IQ. Does this lead to an intelligence explosion? Probably not. If the number of IQ points that you need to discover the next drug in a constant time increases faster than the number of IQ points that the next drug gives you, then you will run into diminishing returns.
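That condition is easy to see in a toy recurrence (the specific numbers are arbitrary illustrations, not empirical claims): if each drug adds a fixed 10 IQ points but the IQ needed for the next discovery grows by 10% per step, the effort per discovery keeps rising rather than collapsing toward zero.

```python
# Toy model of the drug-discovery recursion described above.
iq, required = 130.0, 130.0
for n in range(1, 16):
    relative_effort = required / iq   # crude proxy for time to make discovery n
    print(f"discovery {n}: relative effort {relative_effort:.2f}")
    iq += 10.0                        # each drug adds a fixed number of IQ points...
    required *= 1.10                  # ...but the bar for the next one grows faster
# the printed effort climbs steadily: diminishing returns, not an explosion
```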
It doesn’t seem to be much different with computers.
Algorithmic efficiency is bounded: for any given computational problem, once you have the best algorithm for it, for whatever performance measure you care about, you can’t improve on it anymore. And in fact, long before you reach the perfect algorithm you’ll already have run into diminishing returns in terms of effort vs. improvement: past some point you are tweaking low-level details in order to get small performance improvements.
Once you have maxed out algorithmic efficiency, you can only improve by increasing hardware resources, but this 1) requires significant interaction with the physical world, and 2) runs into asymptotic complexity issues: for most AI problems worst-case complexity is at least exponential, and average-case complexity is more difficult to estimate but most likely super-linear. Take a look at the AlphaGo paper, for instance: figure 4c shows how Elo rating increases with the number of CPUs/GPUs/machines. The trend is logarithmic at best, logistic at worst.
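To see what a logarithmic trend implies, here is an illustrative fit of the same shape (the coefficients are made up, not read off the paper): doubling the hardware buys a roughly constant Elo increment, so the gain per additional machine keeps shrinking.

```python
import math

a, b = 2000.0, 230.0                      # hypothetical fit: elo(n) = a + b * ln(n)

def elo(machines: int) -> float:
    return a + b * math.log(machines)

for n in (1, 2, 4, 8, 16, 32, 64):
    print(n, round(elo(n)), round(elo(2 * n) - elo(n)))   # constant ~160 Elo per doubling
# gain per extra machine falls off like b / n -- strongly diminishing hardware returns
```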
Now of course you could insist that it can’t be proved that significant diminishing returns will kick in before AGI reaches a strongly super-human level, but, as I said, this is an unfalsifiable argument from ignorance.
Cute. Now try quantifying that argument. How much data needs to be considered/collected to make each incremental improvement? Does that grow over time, and how fast? What is the failure rate (the chance a change makes you dumber, not smarter)? What is the critical failure rate (the chance a change makes you permanently incapacitated)? How much testing and analysis is required to confidently keep the critical error rate low?
When you look at it as an engineer, not a philosopher, the answers are not so obvious.
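A crude way to put numbers on those questions (every parameter below is a hypothetical placeholder, chosen only to show how the quantities interact): the tolerable number of self-modification attempts is capped by the critical-failure rate that slips past testing, and the testing itself eats calendar time.

```python
# Toy model of an iterative self-modification campaign.
p_improve      = 0.3    # chance a proposed change is actually an improvement
p_critical     = 0.01   # chance a change is permanently incapacitating
test_catch     = 0.9    # fraction of critical changes that testing catches
hours_per_test = 24     # evaluation time per candidate change

p_slip = p_critical * (1 - test_catch)          # critical failures that survive testing
attempts_before_disaster = 1 / p_slip           # expected ~1,000 attempted changes
useful_improvements = attempts_before_disaster * p_improve                # ~300 real gains
testing_years = attempts_before_disaster * hours_per_test / (24 * 365)    # ~2.7 years

print(useful_improvements, round(testing_years, 1))
```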
Much depends on what you mean by “learning cycle”—do you mean a complete training iteration (essentially a lifetime) of an AGI? Grown from seed to adult?
I’m not sure where you got the ‘hundreds to thousands’ of learning cycles from either. If you want to estimate the full experimental iteration cycle count, it would probably be better to estimate from smaller domains. Like take vision—how many full experimental cycles did it take to get to current roughly human-level DL vision?
It’s hard to say exactly, but it is roughly on the order of ‘not many’: we achieved human-level vision with DL very soon after the hardware capability arrived.
If we look at the brain, we see that vision is at least 10% of the total computational cost of the entire brain, and that the brain uses the same learning mechanisms and circuit patterns to solve vision as it uses to solve essentially everything else.
Likewise, once we (roughly, kind of) solved vision in the very general way the brain does, we found that the same general techniques essentially work for all other domains.
Oh that’s easy—as soon as you get one adult, human-level AGI running compactly on a single GPU, you can then trivially run it 100x faster on a supercomputer, and/or replicate it a million-fold or more. That generation of AGI then quickly produces the next, and then singularity.
It’s slow going until we get up to that key threshold of brain compute parity, but once you pass that we probably go through a phase transition in history.
Citation on plausibility severely needed, which is the point.
While that particular discussion is quite interesting, it’s irrelevant to my point above—which is simply that once you achieve parity, it’s trivially easy to get at least weak superhuman performance through speed.
The whole issue is whether a hard takeoff is possible and/or plausible, presumably with currently available computing technology. Certainly with Landauer-limit computing technology it would be trivial to simulate billions of human minds in the space and energy usage of a single biological brain. If such technology existed, yes a hard takeoff as measured from biological-human scale would be an inevitability.
But what about today’s technology? The largest supercomputers in existence can maaaaybe simulate a single human mind at highly reduced speed and with heavy approximation. A single GPU wouldn’t even come close in either storage or processing capacity. The human brain has about 100bn neurons and operates at 100Hz. The NVIDIA Tesla K80 has 8.73 TFLOPS single-precision performance with 24GB of memory. That’s 1.92 bits per neuron and 0.87 floating-point operations per neuron-cycle. Sorry, no matter how you slice it, neurons are complex things that interact in complex ways. There is just no possible way to do a full simulation with ~2 bits per neuron and ~1 flop per neuron-cycle. More reasonable assumptions about simulation speed and resource requirements demand supercomputers on the order of the largest we as a species have in order to do real-time whole-brain emulation. And if such a thing did exist, it’s not “trivially easy” to expand its own computation power—it’s already running on the fastest stuff in existence!
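For reference, the arithmetic behind those two figures (same numbers as in the comment):

```python
neurons  = 100e9      # ~100 billion neurons
brain_hz = 100        # assumed brain 'clock rate'

k80_flops = 8.73e12   # Tesla K80, single precision
k80_bits  = 24e9 * 8  # 24 GB of memory, in bits

bits_per_neuron = k80_bits / neurons                        # ~1.92 bits
flops_per_neuron_cycle = k80_flops / (neurons * brain_hz)   # ~0.87 FLOPs
print(round(bits_per_neuron, 2), round(flops_per_neuron_cycle, 2))
```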
So with today’s technology, any AI takeoff is likely to be a prolonged affair. This is absolutely certain to be the case if whole-brain emulation is used. So should hard-takeoffs be a concern? Not in the next couple of decades at least.
You are assuming an enormously suboptimal/naive simulation. Sure, if you use a stupid simulation algorithm, the brain seems powerful.
As a sanity check, apply your same simulation algorithm to simulating the GPU itself.
It has 8 billion transistors that cycle at 1 GHz, with a typical fanout of 2 to 4. So that’s more than 10^19 gate ops/second! Far more than the brain...
The brain has about 100 trillion synapses, and the average spike rate is around 0.25 Hz (yes, really). So that’s only about 25 trillion synaptic events/second. Furthermore, the vast majority of those synapses are tiny and activate on an incoming spike with low probability, around 25% to 30% or so (stochastic connection dropout). The average synapse has an SNR equivalent of 4 bits or less. All of these numbers are well supported in the neuroscience literature.
Thus the brain as a circuit computes with < 10 trillion low-bit ops/second. That’s nothing, even if it’s off by 10x.
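The same counting exercise, spelled out with the numbers from these two comments (the inputs are the commenter’s estimates, not measurements of mine):

```python
# GPU, counted naively as 'every gate event is an op'.
transistors, gpu_hz, fanout = 8e9, 1e9, 2
gpu_gate_ops = transistors * gpu_hz * fanout          # ~1.6e19 gate ops/s

# Brain, counted the same event-driven way.
synapses, avg_spike_hz, transmit_prob = 100e12, 0.25, 0.3
synaptic_events = synapses * avg_spike_hz             # ~2.5e13 events/s
effective_ops = synaptic_events * transmit_prob       # ~7.5e12 low-precision ops/s

print(f"{gpu_gate_ops:.1e} vs {effective_ops:.1e}")
```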
Also, synapse memory isn’t so much an issue for ANNs, as weights are easily compressed 1000x or more by various schemes, from simple weight sharing to more complex techniques such as tensorization.
As we now approach the end of Moore’s law, our low-level circuit efficiency has already caught up to the brain, or is close. The remaining gap is almost entirely algorithmic-level efficiency.
If you are assuming that a neuron contributes less than 2 bits of state (or 1 bit per 500 synapses) and 1 computation per cycle, then you know more about neurobiology than anyone alive.
I don’t understand your statement.
I didn’t say anything in my post above about the per-neuron state—because it’s not important. Each neuron is a low-precision analog accumulator, roughly 8-10 bits or so, and there are 20 billion neurons in the cortex. There are another 80 billion in the cerebellum, but they are unimportant.
The memory cost of storing the state for an equivalent ANN is far less than 20 billion bytes or so, because of compression—most of that state is just zero most of the time.
In terms of computation per neuron per cycle, when a neuron fires it does #fanout computations. Counting from the total synapse numbers is easier than estimating neurons * avg fanout, but gives the same results.
When a neuron doesn’t fire, it doesn’t compute anything of significance. This is true in the brain and in all spiking ANNs, as it’s equivalent to sparse matrix operations—where the computational cost depends on the number of nonzeros, not the raw size.
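A small illustration of that sparse-operations point, using SciPy’s generic sparse matrices as a stand-in for one timestep of a spiking layer (the sizes, density, and spike count below are arbitrary):

```python
import numpy as np
from scipy import sparse

n = 100_000                                              # toy 'neuron' count
weights = sparse.random(n, n, density=1e-4, format="csc", dtype=np.float32)

firing = np.random.choice(n, 25, replace=False)          # only a handful of neurons spike

# Event-driven update: only the outgoing weights of the firing neurons are touched,
# so the work scales with (#spikes * average fanout), not with the n*n connectivity.
out = weights[:, firing] @ np.ones(len(firing), dtype=np.float32)
```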