Why AI may not foom
Summary
There’s a decent chance that the intelligence of a self-improving AGI will grow in a relatively smooth exponential or sub-exponential way, not super-exponentially or with large jump discontinuities.
If this is the case, then an AGI whose effective intelligence matched that of the world’s combined AI researchers would make AI progress at the rate they do, taking decades to double its own intelligence.
The risk that the first successful AGI will quickly monopolize many industries, or quickly hack many of the computers connected to the internet, seems worth worrying about. In either case, the AGI would likely end up using the additional computing power it gained to self-modify so it was superintelligent.
AI boxing could mitigate both of these risks greatly.
If hard takeoff might be impossible, it could be best to assume as much and concentrate our resources on ensuring a safe soft takeoff, given that the prospects for a safe hard takeoff look grim.
Takeoff models discussed in the Hanson-Yudkowsky debate
The supercritical nuclear chain reaction model
Yudkowsky alludes to this model repeatedly, starting in this post:
When a uranium atom splits, it releases neutrons—some right away, some after delay while byproducts decay further. Some neutrons escape the pile, some neutrons strike another uranium atom and cause an additional fission. The effective neutron multiplication factor, denoted k, is the average number of neutrons from a single fissioning uranium atom that cause another fission...
It might seem that a cycle, with the same thing happening over and over again, ought to exhibit continuous behavior. In one sense it does. But if you pile on one more uranium brick, or pull out the control rod another twelve inches, there’s one hell of a big difference between k of 0.9994 and k of 1.0006.
I don’t like this model much for the following reasons:
The model doesn’t offer much insight into the time scale over which an AI might self-improve. The “mean generation time” (the time for the next “generation” of neutrons to be released) of a nuclear chain reaction is short, and the doubling time for neutron activity in Fermi’s experiment was just two minutes, but it hardly seems reasonable to generalize this to self-improving AIs.
A flurry of insights that either dies out or expands exponentially doesn’t seem like a very good description of how human minds work, and I don’t think it would describe an AGI well either. Many people report that taking time to think about problems is key to their problem-solving process. It seems likely that an AGI unable to immediately generate insight into some problem would have a slower and more exhaustive “fallback” search process that would allow it to continue making progress. (Insight could also work via a search process in the first place—over the space of permutations in one’s mental model, say.)
The “differential equations folded on themselves” model
This is another model Eliezer alludes to, albeit in a somewhat handwavey fashion:
When you fold a whole chain of differential equations in on itself like this, it should either peter out rapidly as improvements fail to yield further improvements, or else go FOOM.
It’s not exactly clear to me what the “whole chain of differential equations” is supposed to refer to… there’s only one differential equation in the preceding paragraph, and it’s a standard exponential (which could be scary or not, depending on the multiplier in the exponent; rabbit populations and bank account balances both grow exponentially in a way that’s slow enough for humans to understand and control).
Maybe he’s referring to the levels he describes here: metacognitive, cognitive, metaknowledge, knowledge, and object. How might we parameterize this system?
Let’s say c is our AGI’s cognitive ability, dc/dt is the rate of change in our AGI’s cognitive ability, m is our AGI’s “metaknowledge” (about cognition and metaknowledge), and dm/dt is the rate of change in metaknowledge. What I’ve got in mind is:

dc/dt = p · c · m
dm/dt = q · c · m

where p and q are constants.

In other words, the change in cognitive ability and the change in metaknowledge are each directly proportional to both cognitive ability and metaknowledge.
I don’t know much about analyzing systems of differential equations, so if you do, please comment! I put the above system into Wolfram Alpha, but I’m not exactly sure how to interpret the solution provided. In any case, fooling around with this script suggests sudden, extremely sharp takeoff for a variety of different test parameters.
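To get a feel for the dynamics, here’s a minimal numerical sketch of my own (it assumes the system dc/dt = p·c·m, dm/dt = q·c·m described verbally above, with arbitrary illustrative parameter values). It integrates the system with Euler steps and records when c doubles:

```python
def simulate(p=0.1, q=0.1, c=1.0, m=1.0, dt=0.001, t_max=20.0):
    """Euler-integrate dc/dt = p*c*m, dm/dt = q*c*m and return the
    times at which c passes successive doublings."""
    t = 0.0
    doubling_times = []
    next_target = 2.0 * c
    while t < t_max and c < 1e9:
        dc = p * c * m * dt
        dm = q * c * m * dt
        c += dc
        m += dm
        t += dt
        while c >= next_target:
            doubling_times.append(t)
            next_target *= 2.0
    return doubling_times

times = simulate()
gaps = [b - a for a, b in zip(times, times[1:])]
# Early doublings take progressively less time: the system heads toward
# a finite-time blow-up rather than steady exponential growth.
```

With p = q and c = m, the system reduces to dc/dt = p·c², whose exact solution blows up in finite time, which matches the “sudden, extremely sharp takeoff” behavior mentioned above.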
The straight exponential model
To me, the “proportionality thesis” described by David Chalmers in his singularity paper, “increases in intelligence (or increases of a certain sort) always lead to proportionate increases in the capacity to design intelligent systems”, suggests a single differential equation that looks like

du/dt = s · u

where u represents the number of upgrades that have been made to an AGI’s source code, and s is some constant. The solution to this differential equation is going to look like

u(t) = c1 · e^(s·t)

where the constant c1 is determined by our initial conditions.
(In Recursive Self-Improvement, Eliezer calls this a “too-obvious mathematical idiom”. I’m inclined to favor it for its obviousness, or at least use it as a jumping-off point for further analysis.)
Under this model, the constant s is pretty important… if u(t) was the amount of money in a bank account, s would be the rate of return it was receiving. The parameter s will effectively determine the “doubling time” of an AGI’s intelligence. It matters a lot whether this “doubling time” is on the scale of minutes or years.
So what’s going to determine s? Well, if the AGI’s hardware is twice as fast, we’d expect it to come up with upgrades twice as fast. If the AGI had twice as much hardware, and it could parallelize the search for upgrades perfectly (which seems like a reasonable approximation to me), we’d expect the same thing. So let’s decompose s and make it the product of two parameters: h representing the hardware available to the AGI, and r representing the ease of finding additional improvements. The AGI’s intelligence will be on the order of u * h, i.e. the product of the AGI’s software quality and hardware capability.
Considerations affecting our choice of model
Diminishing returns
The consideration here is that the initial improvements implemented by an AGI will tend to be those that are especially easy to implement and/or especially fruitful to implement, with subsequent improvements tending to deliver less intelligence bang for the implementation buck. Chalmers calls this “perhaps the most serious structural obstacle” to the proportionality thesis.
To think about this consideration, one could imagine representing a given improvement as a pair of values (u, d). u represents a factor by which existing performance will be multiplied, e.g. if u is 1.1, then implementing this improvement will improve performance by a factor of 1.1. d represents the cognitive difficulty, or amount of intellectual labor, required to implement a given improvement. If d is doubled, then at any given level of intelligence, implementing this improvement will take twice as long (because it will be harder to discover and/or harder to translate into code).
Now let’s imagine ordering our improvements in order from highest to lowest u to d ratio, so we implement those improvements that deliver the greatest bang for the buck first.
Thus ordered, let’s imagine separating groups of consecutive improvements into “tiers”. Each tier’s worth of improvements, when taken together, will represent a doubling of the AGI’s software quality, i.e. the product of the u’s in that tier will be roughly 2. For a steady doubling time, each tier’s total difficulty will need to sum to approximately twice the difficulty of the tier before it. If tier difficulty tends to more than double, we’re likely to see sub-exponential growth. If tier difficulty tends to less than double, we’re likely to see super-exponential growth. If a single improvement delivers a more-than-2x improvement, it will span multiple “tiers”.
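As a toy version of this tier model (my own sketch, not anything from the debate): suppose tier k has total difficulty g^k for some growth factor g, the AGI works through a tier at a rate equal to its current intelligence u·h, and u doubles when a tier completes. Then g = 2 yields a steady doubling time, g > 2 yields lengthening doublings (sub-exponential growth), and g < 2 yields shrinking ones (super-exponential growth):

```python
def doubling_schedule(g, tiers=15, h=1.0):
    """Wall-clock time spent on each tier, where tier k has total
    difficulty g**k and work proceeds at a rate equal to current
    intelligence u*h (u doubles after each completed tier)."""
    u = 1.0
    times = []
    for k in range(tiers):
        times.append(g ** k / (u * h))
        u *= 2.0
    return times

steady = doubling_schedule(2.0)    # constant time per doubling
slowing = doubling_schedule(3.0)   # each doubling takes 1.5x longer
speeding = doubling_schedule(1.5)  # each doubling takes 0.75x as long
```

The knife-edge at g = 2 is the same observation as in the text: whether takeoff is soft or hard hinges on whether tier difficulty tends to more or less than double.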
It seems to me that the quality of fruit available at each tier represents a kind of logical uncertainty, similar to asking whether an efficient algorithm exists for some task, and if so, how efficient.
On this diminishing returns consideration, Chalmers writes:
If anything, 10% increases in intelligence-related capacities are likely to lead to all sorts of intellectual breakthroughs, leading to next-generation increases in intelligence that are significantly greater than 10%. Even among humans, relatively small differences in design capacities (say, the difference between Turing and an average human) seem to lead to large differences in the systems that are designed (say, the difference between a computer and nothing of importance).
Eliezer Yudkowsky’s objection is similar:
...human intelligence does not require a hundred times as much computing power as chimpanzee intelligence. Human brains are merely three times too large, and our prefrontal cortices six times too large, for a primate with our body size.
Or again: It does not seem to require 1000 times as many genes to build a human brain as to build a chimpanzee brain, even though human brains can build toys that are a thousand times as neat.
Why is this important? Because it shows that with constant optimization pressure from natural selection and no intelligent insight, there were no diminishing returns to a search for better brain designs up to at least the human level. There were probably accelerating returns (with a low acceleration factor). There are no visible speedbumps, so far as I know.
First, hunter-gatherers can’t design toys that are a thousand times as neat as the ones chimps design—they aren’t programmed with the software modern humans get through education (some may be unable to count), and educating apes has produced interesting results.
Speaking as someone who’s basically clueless about neuroscience, I can think of many different factors that might contribute to intelligence differences within the human race or between humans and other apes:
Processing speed.
Cubic centimeters of brain hardware devoted to abstract thinking. (Gifted technical thinkers often seem to suffer from poor social intuition—perhaps a result of reallocation of brain hardware from social to technical processing.)
Average number of connections per neuron within that brain hardware.
Average neuron density within that brain hardware. This author seems to think that the human brain’s remarkableness comes largely from the fact that it’s the largest primate brain, and primate brains maintain the same neuron density when enlarged while other types of brains don’t. “If absolute brain size is the best predictor of cognitive abilities in a primate (13), and absolute brain size is proportional to number of neurons across primates (24, 26), our superior cognitive abilities might be accounted for simply by the total number of neurons in our brain, which, based on the similar scaling of neuronal densities in rodents, elephants, and cetaceans, we predict to be the largest of any animal on Earth (28).”
Propensity to actually use your capacity for deliberate System 2 reasoning. Richard Feynman’s second wife on why she divorced him: “He begins working calculus problems in his head as soon as he awakens. He did calculus while driving in his car, while sitting in the living room, and while lying in bed at night.” (By the way, does anyone know of research that’s been done on getting people to use System 2 more? Seems like it could be really low-hanging fruit for improving intellectual output. Sometimes I wonder if the reason intelligent people tend to like math is because they were reinforced for the behaviour of thinking abstractly as kids (via praise, good grades, etc.) while those not at the top of the class were not so reinforced.)
Increased calories to think with due to the invention of cooking.
And finally, mental algorithms (“software”). Which are probably at least somewhat important.
It seems to me like these factors (or ones like them) may multiply together to produce intelligence, i.e. the “intelligence equation”, as it were, could be something like intelligence = processing_speed * cc_abstract_hardware * neuron_density * connections_per_neuron * propensity_for_abstraction * mental_algorithms. If the ancestral environment rewarded intelligence, we should expect all of these characteristics to be selected for, and this could explain the “low acceleration factor” in human intelligence increase. (Increasing your processing speed by a factor of 1.2 does more when you’re already pretty smart, so all these sources of intelligence increase would feed in to one another.)
In other words, it’s not that clear what relevance the evolution of human intelligence has to the ease and quality of the upgrades at different “tiers” of software improvements, since evolution operates on many non-software factors, but a self-improving AI (properly boxed) can only improve its software.
Bottlenecks
In the Hanson/Yudkowsky debate, Yudkowsky declares Douglas Engelbart’s plan to radically bootstrap his team’s productivity through improving their computer and software tools “insufficiently recursive”. I agree with this assessment. Here’s my model of this phenomenon.
When a programmer makes an improvement to their code, their work of making the improvement requires the completion of many subtasks:
choosing a feature to add
reminding themselves of how the relevant part of the code works and loading that information into their memory
identifying ways to implement the feature
evaluating different methods of implementing the feature according to simplicity, efficiency, and correctness
coding their chosen implementation
testing their chosen implementation, identifying bugs
identifying the cause of a given bug
figuring out how to fix the given bug
Each of those subtasks will consist of further subtasks like poking through their code, staring off into space, typing, and talking to their rubber duck.
Now the programmer improves their development environment so they can poke through their code slightly faster. But if poking through their code takes up only 5% of their development time, even an extremely large improvement in code-poking abilities is not going to result in an especially large increase in their development speed… in the best case, where code-poking time is reduced to zero, the programmer will only work about 5% faster.
This is a reflection of Amdahl’s Law-type thinking. The amount you can gain through speeding something up depends on how much it’s slowing you down.
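Amdahl’s Law gives the exact formula: if a fraction f of the work is sped up by a factor s, the overall speedup is 1 / ((1 − f) + f/s). A quick sketch using the 5% code-poking example above:

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of total work is sped up by
    `factor` and the remaining work is untouched (Amdahl's Law)."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

# A 10x improvement to code-poking (5% of dev time) helps very little:
modest = amdahl_speedup(0.05, 10.0)             # about 1.047x overall
# Even infinitely fast code-poking caps out at roughly 5% faster:
best_case = amdahl_speedup(0.05, float("inf"))  # about 1.053x overall
```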
Relatedly, if intelligence is a complicated, heterogeneous process where computation is spread relatively evenly among many modules, then improving the performance of an AGI gets tougher, because upgrading an individual module does little to improve the performance of the system as a whole.
And to see orders-of-magnitude performance improvement in such a process, almost all of your AGI’s components will need to be improved radically. If even a few prove troublesome, improving your AGI’s thinking speed becomes difficult.
Case studies in technological development speed
Moore’s Law
It has famously been noted that if the automotive industry had achieved similar improvements in performance [to the semiconductor industry] in the last 30 years, a Rolls-Royce would cost only $40 and could circle the globe eight times on one gallon of gas—with a top speed of 2.4 million miles per hour.
From this McKinsey report. So Moore’s Law is an outlier where technological development is concerned. I suspect that making transistors smaller and faster doesn’t require finding ways to improve dozens of heterogeneous components. And when you zoom out to view a computer system as a whole, other bottlenecks typically appear.
(It’s also worth noting that research budgets in the semiconductor industry have risen greatly since its inception, but obviously not following the same curve that chip speeds have.)
Compiler technology
This paper on “Proebsting’s Law” suggests that the end result of all the compiler research done between 1970 or so and 2001 was that a typical integer-intensive program was compiled to run 3.3 times faster, and a typical floating-point-intensive program was compiled to run 8.1 times faster. When it comes to making programs run quickly, it seems that software-level compiler improvements are swamped by hardware-level chip improvements—perhaps because, like an AGI, a compiler has to deal with a huge variety of different scenarios, so improving it in the average case is tough. (This represents supertask heterogeneity, rather than subtask heterogeneity, so it’s a different objection than the one mentioned above.)
Database technology
According to two analyses (full paper for that second one), it seems that improvement in database performance benchmarks has largely been due to Moore’s Law.
AI (so far)
Robin Hanson’s blog post “AI Progress Estimate” was the best resource I could find on this.
Why smooth exponential growth implies soft takeoff
Let’s suppose we consider all of the above, deciding that the exponential model is the best, and we agree with Robin Hanson that there are few deep, chunky, undiscovered AI insights.
Under the straight exponential model, if you recall, we had

du/dt = r · h · u

where u is the degree of software quality, h is the hardware available, and r is a parameter representing the ease of finding additional upgrades. Our AGI’s overall intelligence is given by u · h—the quality of the software times the amount of hardware.
Now we can solve for r by substituting in human intelligence for u * h, and substituting in the rate of human AI progress for du/dt. Another way of saying this is: When the AI is as smart as all the world’s AI researchers working together, it will produce new AI insights at the rate that all the world’s AI researchers working together produce new insights. At some point our AGI will be just as smart as the world’s AI researchers, but we can hardly expect to start seeing super-fast AI progress at that point, because the world’s AI researchers haven’t produced super-fast AI progress.
Let’s assume AGI that’s on par with the world AI research community is reached in 2080 (LW’s median “singularity” estimate in 2011). We’ll pretend AI research has only been going on since 2000, meaning 80 “standard research years” of progress have gone into the AGI’s software. So at the moment our shiny new AGI is fired up, u = 80, and it’s doing research at the rate of one “human AGI community research year” per year, so du/dt = 1. That’s an effective rate of return on AI software progress of 1 / 80 = 1.25%, giving a software quality doubling time of around 55 years.
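For what it’s worth, here’s that arithmetic spelled out (using the exact ln-2 doubling time, which comes out a bit under a rule-of-72 estimate):

```python
import math

u0 = 80.0        # "standard research years" accumulated by takeoff
du_dt = 1.0      # research years produced per year at human-community parity
s = du_dt / u0   # effective rate of return: 0.0125, i.e. 1.25% per year
doubling_time = math.log(2) / s   # roughly 55.5 years to double software quality
```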
You could also apply this kind of thinking to individual AI projects. For example, it’s possible that at some point EURISKO was improving itself about as fast as Doug Lenat was improving it. You might be able to do a similar calculation to take a stab at EURISKO’s insight level doubling time.
The importance of hardware
According to my model, you double your AGI’s intelligence, and thereby the speed with which your AGI improves itself, by doubling the hardware available for your AGI. So if you had an AGI that was interesting, you could make it 4x as smart by giving it 4x the hardware. If an AGI that was 4x as smart could get you 4x as much money (through impressing investors, or playing the stock market, or monopolizing additional industries), that’d be a nice feedback loop. For maximum explosivity, put half your AGI’s mind to the task of improving its software, and the other half to the task of making more money with which to buy more hardware.
But it seems pretty straightforward to prevent a non-superintelligent AI from gaining access to additional hardware with careful planning. (Note: One problem with AI boxing experiments thus far is that all of the AIs have been played by human beings. Human beings have innate understanding of human psychology and possess specialized capabilities for running emulations of one another. It seems pretty easy to prevent an AGI from acquiring such understanding. But there may exist box-breaking techniques that don’t rely on understanding human psychology. Another note about boxing: FAI requires getting everything perfect, which is a conjunctive calculation. Given multiple safeguards, only one has to work for the box as a whole to work, which is a disjunctive calculation.)
AGI’s impact on the economy
Is it possible that the first group to create a successful AGI might begin monopolizing different sections of the economy? Robin Hanson argues that technology insights typically leak between different companies, due to conferences and employee poaching. But we can’t be confident these factors would affect the research an AGI does on itself. And if an AGI is still dumb enough that a significant portion of its software upgrades are coming from human researchers, it can hardly be considered superintelligent.
Given what looks like a winner-take-all dynamic, an important factor may be the number of serious AGI competitors. If there are only two, the #1 company may not wish to trade insights with the #2 company for fear of losing its lead. If there are more than two, all but the leading company might ally against the leading company in trading insights. If their alliance is significantly stronger than the leading company, perhaps the leading company would wish to join their alliance.
But if AI is about getting lots of details right, as Hanson suggests, improvements may not even transfer between different AI architectures.
What should we do?
I’ve argued that soft takeoff is a strong possibility. Should that change our strategy as people concerned with x-risk?
If we are basically screwed in the event that hard takeoff is possible, it may be that preparing for a soft takeoff is a better use of resources on the margin. Shane Legg has proposed that people concerned with friendliness become investors in AGI projects so they can affect the outcome of any that seem to be succeeding.
Concluding thoughts
Expert forecasts are famously unreliable even in the relatively well-understood field of political forecasting. So given the number of unknowns involved in the emergence of smarter-than-human intelligence, it’s hard to say much with certainty. Picture a few Greek scholars speculating on the industrial revolution.
I don’t have a strong background in these topics, so I fully expect that the above essay will reveal my ignorance, which I’d appreciate your pointing out in the comments. This essay should be taken as an attempt to hack away at the edges, not come to definitive conclusions. As always, I reserve the right to change my mind about anything ;)
I think there is a fundamental misunderstanding of the nature of software performance in this kind of argument.
Software performance, according to any metric of your choice (speed, memory usage, energy consumption, etc.), is fundamentally a measure of efficiency.
For any given task, and any given hardware architecture, there is one program that maximizes the performance metric: that’s 100% efficiency.
The fact that efficiency is bounded means that you can’t keep doubling it. If your program is 25% efficient, then the best you can hope for is to double its efficiency twice and then you are done.
In practice, when you try to improve the efficiency of a program, you quickly run into diminishing returns: you get the biggest gains from choosing the proper general forms of the algorithms and data structures; then the more you fiddle with the details, down to the machine-code level, the smaller the gains you get, despite the effort.
In fact, it can be shown that obtaining the most efficient program for a given problem is uncomputable in the general case.
Therefore, self-improving AI or not, you only get so far with software improvements. So you are left with hardware improvements, which brings us to another misunderstanding.
This misunderstanding is very common among non-computer scientists, and in fact it was common even among computer scientists before computational complexity was understood. It rests on the implicit assumption that performance scales essentially linearly with hardware resources. Typically, it doesn’t.
Problems which admit algorithms of linear complexity are only a small, lucky subset of all the interesting problems.
Many problems have superlinear polynomial complexity, meaning that as you increase the problem instance size, the amount of hardware resources required scales as a superlinear polynomial of the problem instance size.
It gets worse:
Many problems, including many optimization problems relevant to AI, fall in the NP-hard class, which is strongly conjectured to have super-polynomial, in particular exponential, complexity.
There are some details missing from this picture, namely that this classification refers to worst-case complexity, while average-case complexity may differ. Some NP-hard problems admit approximation schemes or heuristics which allow one to feasibly compute solutions for problem instances of reasonable size, at least on average.
But the main point stands. For any such problem, for any probability distribution over the instances, there will be an algorithm with the best average-case complexity. In general, this average-case complexity will not be linear; probably it will not even be polynomial. Doubling your hardware will not double the performance of this algorithm.
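A toy illustration of this scaling point: for a brute-force search over n boolean variables (cost 2^n operations), doubling the available hardware buys just one more variable, not twice the problem size (hypothetical operation counts, sketched in Python):

```python
def max_solvable_n(ops):
    """Largest n such that a 2**n brute-force search fits within
    `ops` operations."""
    n = 0
    while 2 ** (n + 1) <= ops:
        n += 1
    return n

base = max_solvable_n(10**6)         # n = 19 (2**19 = 524288 operations)
doubled = max_solvable_n(2 * 10**6)  # n = 20: double the hardware, +1 variable
```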
Anecdotally, I’m under the impression that this reflects observed gains in AI performance: hardware resources have been growing exponentially for decades, while AI performance increased perhaps linearly or even sublinearly with time. Algorithms got better, but it seems to me that AI is fundamentally an exponential complexity problem.
In full generality, yes, but possibly correct-ish for a part of the powering-up curve, depending on the algorithms involved. If it Amdahl’ed out only once the AGI had already reached superintelligence, that wouldn’t be very comforting.
Thanks for your comments. How do you think human intelligence works? Perhaps by doing a massive parallel search to approximate the best solution?
I’m confused… if time required is a polynomial or exponential function of your problem size, then hardware that runs twice as fast will still solve your problem twice as fast, won’t it? (How could it not?) And if the algorithm you’re using to solve the problem is perfectly parallelizable (which I grant to AI foom proponents ’cause it seems plausible to me), then throwing twice the hardware at any given problem will solve it twice as fast. (Although yes, it will not solve problems that are twice as big.)
The brain’s architecture is highly parallel; however, how it forms high-level thoughts is not known.
My guess is that’s some sort of parallel Monte Carlo search driven by complex, partially innate and partially learned, heuristics.
Yes, but it wouldn’t be twice as smart. If you were to speed up a chicken brain by a factor of 10,000 you wouldn’t get a super-human intelligence.
Perfect parallelizability (linear speedup in the number of processors) is physically impossible due to the fact that information propagates at finite speed, though depending on hardware details, as long as your computer doesn’t get too big, you can obtain close to linear speedups on certain problems.
NP-complete problems can be solved by brute-force exhaustive search, in principle, which is highly parallelizable. But exhaustive search has a very fast growing exponential complexity, hence it doesn’t get you very far from toy problem instances even on parallel hardware. The more complex heuristics and approximation schemes you use, the less parallelizability you get, in general.
Anyway, 10,000 chickens won’t make a super-human intelligence, even if you found some way to wire their brains together.
One of the cooler papers I’ve seen connecting MC with thinking is http://www.stanford.edu/~ngoodman/papers/LiederGriffithsGoodman2012NIPS.pdf which claims that MCMC can even explain some cognitive biases. (I don’t know as much about MCMC as I would like, so I can’t evaluate it.)
Sure, but if we assume we manage to have a human-level AI, how powerful should we expect it to be if we speed that up by a factor of 10, 100, or more?
Personally, I’m pretty sure such a thing is still powerful enough to take over the world (assuming it is the only such AI), and in any case dangerous enough to lock us all in a future we really don’t want.
At that point, I don’t really care if it’s “superhuman” or not.
It won’t be any smarter at all, actually; it will just have more relative time.
Basically, if you take someone, and give them 100 days to do something, they will have 100 times as much time to do it as they would if it takes 1 day, but if it is beyond their capabilities, then it will remain beyond their capabilities, and running at 100x speed is only helpful for projects for which mental time is the major factor—if you have to run experiments and wait for results, all you’re really doing is decreasing the lag time between experiments, and even then only potentially.
It’s not even as good as having 100 slaves work on a project (as someone else posited), because you’re really just having ONE slave work on the project for 100 days; copying them 100 times likely won’t help that issue.
This is one of the fundamental problems with the idea of the singularity in the first place; the truth is that designing more intelligent intelligences is probably HARDER than designing simpler ones, possibly by orders of magnitude, and it may not be scalar at all. If you look at rodent brains and human brains, there are numerous differences between them—scaling up a rodent brain to the same EQ as a human brain would NOT give you something as smart as a human, or even sapient.
You are very likely to see declining returns, not accelerating returns, which is exactly what we see in all other fields of technology—the higher you get, the harder it is to go further.
Moreover, it isn’t even clear what a “superhuman” intelligence even means. We don’t even have any way of measuring intelligence absolutely that I am aware of—IQ is a statistical measure, as are standardized tests. We can’t say that human A is twice as smart as human B, and without such a metric it may be difficult to determine just how much smarter anything is than a human in the first place. If four geniuses can work together and get the same result as a computer which takes 1000 times as much energy to do the same task, then the computer is inefficient no matter how smart it is.
This efficiency is ANOTHER major barrier as well—human brains run off of Cheerios, whereas any AI we build is going to be massively less efficient in terms of energy usage per cycle, at least for the foreseeable future.
Another question is whether there is some sort of effective cap to intelligence given energy, heat dissipation, proximity of processing centers, etc. Given that we’re only going to see microchips 256 times as dense on a plane as we have presently available, and given the various issues with heat dissipation of 3D chips (not to mention expense), we may well run into some barriers here.
I was looking at some stuff last night and while people claim we may be able to model the brain using an exascale computer, I am actually rather skeptical after reading up on it—while 150 trillion connections between 86 billion neurons doesn’t sound like that much on the exascale, we have a lot of other things, such as glial cells, which appear to play a role in intelligence, and it is not unlikely that their function is completely vital in a proper simulation. Indeed, our utter lack of understanding of how the human brain works is a major barrier for even thinking about how we can make something more intelligent than a human which is not a human—it’s pretty much pure fantasy at this point. It may be that ridiculous parallelization with low latency is absolutely vital for sapience, and that could very well put a major crimp on silicon-based intelligences at all, due to their more linear nature, even with things like GPUs and multicore processors, because the human brain is sending out trillions of signals with each step.
Some possibilities for simulating the human brain could easily take 10^22 FLOPS or more, and given the limitations of transistor-based computing, that looks like it is about the level of supercomputer we’d have in 2030 or so—but I wouldn’t expect much better than that beyond that point because the only way to make better processors at that point is going up or out, and to what extent we can continue doing that… well, we’ll have to see, but it would very likely eat up even more power and I would have to question the ROI at some point. We DO need to figure out how intelligence works, if only because it might make enhancing humans easier—indeed, unless intelligence is highly computationally efficient, organic intelligences may well be the optimal solution from the standpoint of efficiency, and no sort of exponential takeoff is really possible, or even likely, with such.
In many fields of technology, we see sigmoid curves, where initial advancements lead to accelerating returns until it becomes difficult to move further ahead without running up against hard problems or fundamental limits, and returns diminish.
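As a toy illustration of that sigmoid pattern (the logistic curve and its parameters here are generic, not a model of any particular technology):

```python
import math

def logistic(t, ceiling=1.0, rate=1.0, midpoint=0.0):
    """Logistic (sigmoid) curve: near-exponential growth early on,
    diminishing returns as output approaches a hard ceiling."""
    return ceiling / (1 + math.exp(-rate * (t - midpoint)))

# Far below the ceiling, each unit step multiplies output by roughly e
# (accelerating returns); near the ceiling, a step gains almost nothing.
early_gain = logistic(-4) / logistic(-5)  # ~2.7x per step
late_gain = logistic(5) / logistic(4)     # ~1.01x per step
```

The same curve looks like runaway exponential growth if you only observe its early half, which is one reason extrapolating current progress forward is hazardous.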
Making an artificial intelligence as capable as a human intelligence may be difficult, but that doesn’t mean that if we reach that point, we’ll be facing major barriers to further progression. I would say we don’t have much evidence to suggest humans are even near the ceiling of what’s strictly possible with a purely biological intelligence; we’ve had very little opportunity for further biological development since the point when cultural developments started accounting for most of our environmental viability, plus we face engineering challenges such as only being able to shove so large a cranium through a bipedal pelvis.
We have no way to even measure intelligence, let alone determine how close to capacity we are. We could be 90% there, or 1%, and we have no way, presently, of distinguishing between the two.
We are the smartest creatures ever to have lived on the planet Earth as far as we can tell, and given that we have seen no signs of extraterrestrial civilization, we could very well be the most intelligent creatures in the galaxy for all we know.
As for shoving out humans, isn’t the simplest solution to that simply growing them in artificial wombs?
We already have a simpler solution than that, namely the Cesarean section. It hasn’t been a safe option long enough to have had a significant impact as an evolutionary force, though. Plus, there hasn’t been a lot of evolutionary pressure for increased intelligence since the advent of agriculture.
We might be the most intelligent creatures in the galaxy, but that’s a very different matter from being near the most intelligent things that could be constructed out of a comparable amount of matter. Natural selection isn’t that great a process for optimizing intelligence, it’s backpedaled on hominids before given the right niche to fill, so while we don’t have a process for measuring how close we are to the ceiling, I think the reasonable prior on our being close to it is pretty low.
As powerful as a team of 10 or 100 human slaves, or a little more, but within the same order of magnitude.
100 slaves are not going to take over the world.
One 10,000 year old human might be able to do it, though.
Without any legal protection?
At first. If the “100 slaves” AI ever gets out of the box, you can multiply the initial number by the amount of hardware it can copy itself to. It can hack computers, earn (or steal) money, buy hardware…
And suddenly we’re talking about a highly coordinated team of millions.
That’s the plot of the Terminator movies, but it doesn’t seem to be a likely scenario.
During their regime, the Nazis locked up, used as slave labor, and eventually killed millions of people. Most of them were Ashkenazi Jews, perhaps the smartest of all ethnic groups, with a language difficult for outsiders to comprehend, living in close-knit communities with transnational range and strong inter-community ties.
Did they get “out of the box” and take over the Third Reich? Nope.
AIs might have some advantages for being digital, but also disadvantages.
I think you miss the part where the team of millions continues its self-copying until it eats up all available computing power. If there’s any significant computing overhang, the AI could easily seize control of way more computing power than all the human brains put together.
Also, I think you underestimate the “highly coordinated” part. Any copy of the AI will likely share the exact same goals, and the exact same beliefs. Its instances will have common knowledge of this fact. This would create an unprecedented level of trust. (The only possible exception I can think of is twins. And even so…)
So, let’s recap:
Thinks 100 times faster than a human, though no better.
Can copy itself over many times (the exact amount depends on computing power available).
The resulting team forms a nearly perfectly coordinated group.
Do you at least concede that this is potentially more dangerous than a whole country armed up with nukes? Would you rely on it being less dangerous than that?
When I imagine that I could make my copy which would be identical to me, sharing my goals, able to copy its experiences back to me, and willing to die for me (something like Naruto’s clones), taking over the society seems rather easy. (Assuming that no one else has this ability, and no one suspects me of having it. In real life it would probably help if all the clones looked different, but had an ability to recognize each other.)
Research: For each interesting topic I could make dozen clones which would study the topic in libraries and universities, and discuss their findings with each other. I don’t suppose it would make me an expert on everything, but I could get at least all the university-level education on most things.
Resources: If I can make more money than I spend, and if I don’t make too many copies and imbalance the economy, I can let a few dozen clones work and produce the money for the rest of them. At least in the starting phase, until my research groups discover better ways to make money.
Contacts: Different clones could go to different places making useful contacts with different kinds of people. Sometimes you find a person who can help your goals significantly. With many clones I could make contacts in many different social groups, and overcome language or religious barriers (I can have a few clones learn the language or join the religion).
Multiple “first impressions”: If I need a help of a given person or organization, I could in many cases gain their trust by sending multiple different clones to them, using different strategies to befriend them, until I find one that works.
Taking over democratic organizations: Any organization with low barriers to entry and democratic voting can be taken over by sending enough clones there, and then voting some of the clones in as new leaders. A typical non-governmental organization or even a smaller political party could be captured this way. I don’t even need a majority of clones there: two potential leaders competing with each other, half a dozen experts openly supporting each of them, and a dozen people befriending random voters and explaining to them why leader X or leader Y is the perfect choice; then most of the voting would be done by other people.
Assassination: If someone is too much of a problem, I can create a clone which kills them and then disappears. This should be used very rarely, so as not to draw attention to my abilities.
Safety: To protect myself, I would send my different clones to different countries over the world.
Joining all the winning sides: If there is an important group of people, I could join them, even the groups fighting against each other. Whoever wins, some of my clones are on the winning side.
There are a lot of “ifs”, though.
If that AI runs on expensive or specialized hardware, it can’t necessarily expand much. For instance, if it runs on hardware worth millions of dollars, it can’t exactly copy itself just anywhere yet. Assuming that the first AI of that level will be cutting edge research and won’t be cheap, that gives a certain time window to study it safely.
The AI may be dangerous if it appeared now, but if it appears in, say, fifty years, then it will have to deal with the state of the art fifty years from now. Expanding without getting caught might be considerably more difficult then than it is now—weak AI will be all over the place, for one.
Last, but not least, the AI must have access to its own source code in order to copy it. That’s far from a given, especially if it’s a neural architecture. A human-level AI would not know how it works any more than we know how we work, so if it has no read access to itself or no way to probe its own circuitry, it won’t be able to copy itself at all. I doubt the first AI would actually have fine-grained access to its own inner workings, and I doubt it would have anywhere close to the amount of resources required to reverse engineer itself. Of course, that point is moot if some fool does give it access...
I agree with your first point, though it gets worse for us as hardware gets cheaper and cheaper.
I like your second point even more: it’s actionable. We could work on the security of personal computers.
That last one is incorrect, however. The AI only has to access its object code in order to copy itself. That’s something even current computer viruses can do. And we’re back to boxing it.
If the AI is a learning system such as a neural network, and I believe that’s quite likely to be the case, there is no source/object dichotomy at all and the code may very well be unreadable outside of simple local update procedures that are completely out of the AI’s control. In other words, it might be physically impossible for both the AI and ourselves to access the AI’s object code—it would be locked in a hardware box with no physical wires to probe its contents, basically.
I mean, think of a physical hardware circuit implementing a kind of neural network—in order for the network to be “copiable”, you need to be able to read the values of all neurons. However, that requires a global clock (to ensure synchronization, though the AI might tolerate being a bit out of phase) and a large number of extra wires connecting each component to buses going out of the system. Of course, all that extra fluff inflates the cost of the system, makes it bigger, slower and probably less energy efficient. Since the first human-level AI won’t just come out of nowhere, it will probably use off-the-shelf digital neural components, and for cost and speed reasons, these components might not actually offer any way to copy their contents.
This being said, even if the AI runs on conventional hardware, locking it out of its own object code isn’t exactly rocket science. The specification of some programming languages already guarantee that this cannot happen, and type/proof theory is an active research field that may very well be able to prove the conformance of implementation to specification. If the AI is a neural network emulated on conventional hardware, the risks that it can read itself without permission are basically zilch.
What we usually mean by intelligence doesn’t include the skills necessary for getting to be in charge.
There are various notions of intelligence, social intelligence includes the skills for getting in charge.
My point is that human-level intelligence, even replicated or sped up, is generally not enough.
I’m not sure if that’s a good comparison. Compare the following cases:
A. 1 smart human, given 100 days to solve some problem
B. 100 smart humans, given 1 day to solve some problem.
C. 1,000 smart humans, given 1 day to solve some problem.
A would outperform B on most tasks, and probably even C. Most problems just aren’t that parallelizable.
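That intuition is essentially Amdahl’s law: once any fraction of the work is inherently serial, extra workers help less and less. A minimal sketch, assuming (arbitrarily) that 10% of the work is serial:

```python
def speedup(workers, serial_fraction):
    """Amdahl's law: best-case speedup from parallelizing a task
    when serial_fraction of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)

# With just 10% of the work inherently serial, 100 workers in one day
# achieve only ~9.2x the output of one worker in one day, and 1000
# workers only ~9.9x, while one worker given 100 days gets the full 100x.
print(speedup(100, 0.10))   # ~9.17
print(speedup(1000, 0.10))  # ~9.91
```

The 10% figure is just for illustration; the point is that the speedup is capped at 1/serial_fraction no matter how many workers you add.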
That’s why I wrote “or a little more, but within the same order of magnitude”
Right. You’re definitely gonna be able to get the same solution to the same problem twice as fast. The point of labels like “NP-hard” is that doubling your hardware doesn’t let you solve problems that are twice as complicated in your unit of time. So your dumb robot can do dumb things twice as fast, but it can’t do things twice as smart :P
There’s one more consideration, which is that if you’re approximating and you keep the problem the same, doubling your hardware won’t always let you find a solution that’s twice as good. But I think this can reasonably be either sublinear or superlinear, until you get up to really large amounts of computing power.
Right, the problem is that “twice as fast” doesn’t help you much for most problems. For example, if you are solving the Traveling Salesman Problem, then doubling your hardware will allow you to add one more city to the map (under the worst-case scenario). So, now your AI could solve the problem for 1001 cities, instead of 1000. Yay.
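A quick check of that arithmetic, keeping only the dominant 2^n term of the exact dynamic-programming cost (Held-Karp is O(n² · 2ⁿ); the polynomial factor is dropped here for simplicity):

```python
import math

def tsp_ops(n):
    """Cost model for exact TSP, keeping only the dominant 2^n term
    (Held-Karp dynamic programming is O(n^2 * 2^n))."""
    return 2 ** n

# If current hardware can just barely handle 1000 cities, doubling it
# handles exactly one more:
assert 2 * tsp_ops(1000) == tsp_ops(1001)

# Even a millionfold increase in hardware buys only ~20 extra cities:
extra_cities = math.log2(1_000_000)  # ~19.9
```

So for exponential-time problems, hardware growth translates into only additive gains in problem size.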
But given the right approximation algorithm...
No problem is perfectly parallelizable in a physical sense. If you build a circuit to solve a problem, and the circuit is one light year across in size, you’re probably not going to solve it in under a year—technically, any decision problem implemented by a circuit is at least Ω(n), because that’s how the length of the wires scales.
Now, there are a few ways you might want to parallelize intelligence. The first way is by throwing many independent intelligent entities at the problem, but that requires a lot of redundancy, so the returns on that will not be linear. A second way is to build a team of intelligent entities collaborating to solve the problem, each specializing on an aspect—but since each of these specialized intelligent entities is much farther from each other than the respective modules of a single general intelligence, part of the gains will be offset by massive increases in communication costs. A third way would be to grow an AI from within, interleaving various modules so that significant intelligence is available in all locations of the AI’s brain. Unfortunately, doing so requires internal scaffolding (which is going to reduce packing efficiency and slow it down) and it still expands in space, with internal communication costs increasing in proportion of its size.
I mean, ultimately, even if you want to do some kind of parallel search, you’re likely to use some kind of divide and conquer technique with a logarithmic-ish depth. But since you still have to pack data in a 3D space, each level is going to take longer to explore than the previous one, so past a certain point, communication costs might outweigh intelligence gains and parallelization might become somewhat of a pipe dream.
That is a pretty cool idea.
There are a few like it. For example: All problems are at least Ω(max(N,M)), in the size of the problem description and output description.
It’s not usually the limiting factor. ;)
Actually, only the output; sometimes you only need the first few bits. Your equation holds if you know you need to read the end of the input.
And technically you can lower that to sqrt(M) if you organize the inputs and outputs on a surface.
When we talk about the complexity of an algorithm, we have to decide what resources we are going to measure. Time used by a multi-tape Turing machine is the most common measurement, since it’s easy to define and generally matches up with physical time. If you change the model of computation, you can lower (or raise) this to pretty much anything by constructing your clock the right way.
Ah, sorry, I might not have been clear. I was referring to what may be physically feasible, e.g. a 3D circuit in a box with inputs coming in from the top plane and outputs coming out of the bottom plane. If you have one output that depends on all N inputs and pack everything as tightly as possible, the signal would still take Ω(sqrt(N)) time to reach. From all the physically doable models of computation, I think that’s likely as good as it gets.
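A back-of-the-envelope version of that bound, for elements packed into a 3D cube (the one-micron element size is an arbitrary assumption; only the scaling matters):

```python
SIGNAL_SPEED = 3.0e8   # m/s, upper bound (speed of light)
ELEMENT_SIZE = 1e-6    # meters; assumed linear size of one computing element

def min_crossing_time_3d(n_elements):
    """n elements packed into a cube give a side ~ n^(1/3) elements long,
    so a signal crossing the machine needs Omega(n^(1/3)) time.
    (With I/O restricted to a 2D surface, the bound is Omega(sqrt(n)).)"""
    side_meters = (n_elements ** (1.0 / 3.0)) * ELEMENT_SIZE
    return side_meters / SIGNAL_SPEED

# Growing the machine from 10^18 to 10^24 elements multiplies the minimum
# crossing time by ~100, no matter how cleverly it is organized.
slowdown = min_crossing_time_3d(1e24) / min_crossing_time_3d(1e18)
```

The cube-root exponent is the best physically achievable case here; any flatter layout only makes the latency scaling worse.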
Oh I see, we want physically possible computers. In that case, I can get it down to log(n) with general relativity, assuming I’m allowed to set up wormholes. (This whole thing is a bit badly defined since it’s not clear what you’re allowed to prepare in advance. Any necessary setup would presumably take Ω(n) time anyways.)
This is just an educated guess, but to me massively parallel search feels very unlikely for human intelligence. To do something “massively parallel”, you need a lot of (almost) identical hardware. If you want to run the same algorithm 100 times in parallel, you need 100 instances of (almost) the same hardware. Otherwise—how can you run that in parallel?
Some parts of human brain work like that, as far as I know. The visual part of the brain, specifically. There are many neurons implementing the same task: scanning an input from a part of retina, detecting lines, edges, and whatever. This is why image recognition is extremely fast and requires a large part of the brain dedicated to this task.
Seems to me (but I am not an expert) that most of the brain functionality is not like this. Especially the parts related to thinking. Thinking is usually slow and needs to be learned—which is the exact opposite of how the massively parallelized parts work.
EDIT: Unless by massive parallel human intelligence you meant multiple people working on the same problem.
I’m not an expert either, but from what I’ve read on the subject, most of the neocortex does work like this. The architecture used in the visual cortex is largely the same as that used in the rest of the cortex, with some minor variations. This is suggested by the fact that people who lose an area of their neocortex are often able to recover, with another area filling in for it. I’m on a phone, so I can’t go into as much detail as I’d like, but I recommend investigating the work of Mountcastle, and more recently Markram.
Edit: On Intelligence by Jeff Hawkins explains this principle in more depth, it’s an interesting read.
I find the use of schematic differential equations, as if they actually meant something, to be horrifically bad. Yudkowsky’s original point in Hard Takeoff was that there is no a priori reason to expect that an agent that can RSI should improve at a rate that humans can react to.
Even naive dimensional analysis is enough to show that these equations don’t mean anything.
I think use of equations is fine as long as you don’t put more weight into them than into words. Ultimately, as I said, it’s all very speculative. Equations represent model thinking, not association-based reasoning or reasoning by analogy. I tend to think that model thinking is typically more useful than the other two, but yes, if you’re the sort of person who says “if it’s an equation, it must be right” then you shouldn’t do that here.
Go on...
Saying that “compiler technology” has only made floating point programs 8 times faster is somewhat too much of an apples-to-apples comparison. Sure, if you take the exact same Fortran program and recompile it decades later you may only see an 8x speedup (I’d have guessed 2x or 4x, myself, depending on how much the hardware benefits from vectorized instructions). But if you instead take a more modern program designed to solve the same higher-level problem, you are more likely to see a three-order-of-magnitude speedup. Graph from the SCaLeS Report, Vol. 2, D. Keyes et al., eds., 2004; it specifically refers to magnetohydrodynamics simulations but it’s pretty typical of a wide class of computational mathematics problems.
An AGI will presumably be able to optimize not only its own source code compilation, but also its own algorithm choices. That process will also eventually hit diminishing returns, but who knows how many orders of magnitude it could get before things plateau? The first AGI is likely to be using a lot of relatively new and suboptimal algorithms almost by definition of “first”.
Why?
Because optimality isn’t actually required, and humans are bad at perfection.
Yes, but non-perfect doesn’t imply that there is much room for improvement
I was handwaving a bit there, huh? “Some relatively new algorithm(s)” would have been true by definition; everything else needs a bit more justification:
“A lot of relatively new”: whatever makes the difference between problem-specific AI and general unknown-problem-handling AGI is going to be new. The harder these subproblems are (and I’d say they’re likely to be hard), the more difficult new algorithms are going to be required.
“suboptimal”: just by induction, what percentage of the time do we immediately hit on an optimal algorithm to solve a complicated problem, and do we expect the problems in AGI to be harder or easier than most of this reference class? Even superficially simple problems with exact solutions like sorting have a hundred algorithms whose optimality varies depending on the exact application and hardware. Hard problems with approximate solutions like uncertainty quantification are even worse. The people I know doing state-of-the-art work with Bayesian inverse problems are still mostly using (accelerated variants of) Monte Carlo, despite general agreement with that old quote about how Monte Carlo is the way you solve problems when you don’t yet know the right way to solve them.
I’ve wondered about the possibility of FOOM-FLOP. Eventually, the AI is exploring unknown territory as it tries to improve itself, and it seems at least possible that it tries something plausible which breaks itself. Backups are no guarantee of safety—the AI could have “don’t use the backups” as part of the FLOP.
In effect the AI would need to be provably friendly to its past self.
Why would an AI try an upgrade it couldn’t prove would work?
I believe David Wolpert (of the No Free Lunch Theorems fame) had a paper asserting the impossibility of perfect self referential modeling.
Doesn’t need to be perfect. Just be smarter and have the same goals.
“Prove would work” is a much more stringent standard than “just be smarter and have the same goals.” The import of the paper is that the budding FOOMster couldn’t prove that a change would work.
Lacking a proof, if the imperfection lies in the modeling of intelligence improvement, it could be in error when believing the update will make it smarter and have the same goals.
I believe Nancy’s point is a very good one. Intelligence has generally improved in evolutionary fashion. Some things work, some don’t. People seem to picture the AI as a fully self referential optimizer, which per Wolpert would be a mistake, and more generally a mistake as I think fully recursive self reference explodes without bounds in complexity as you iterate your self referencing.
Instead, you have some simple rules applied to some subset self referentially, which seems to improve some local issue, but may fail in a wider context. In the end, you make a guess based on your current functioning and reality tells you if you were right.
I suppose it depends what you mean by ‘smarter’. I mean, code optimizations are provable, and if Löb’s theorem says you can’t safely trim a million consecutive no-ops that somehow snuck into your inner loop, then it’s a dumb theorem to use.
Developing new heuristics is a whole different kettle of fish and yes it’s a rough-and-tumble world out there.
Upon further reflection, it seems to me that the real upgrades are either going to be heuristics adopted in a continuous fashion on a Bayesian basis (software), or hardware.
And hardware contract proving is a much smaller thing altogether. Basically, when DOES this theorem apply?
If the expected gain from the upgrade, assuming it worked, outweighed the cost of the upgrade failing.
That, and there’s also the possibility that the AI’s proof might have a serious mistake.
A handful of the many, many problems here:
It would be trivial for even a Watson-level AI, specialized to the task, to hack into pretty much every existing computer system; almost all software is full of holes and is routinely hacked by bacterium-complexity viruses
“The world’s AI researchers” aren’t remotely close to a single entity working towards a single goal; a human (appropriately trained) is much more like that than Apple, which is much more like that than the US government, which is much more like that than a nebulous cluster of people who sometimes kinda know each other
Human abilities and AI abilities are not “equivalent”, even if their medians are the same AIs will be much stronger in some areas (eg. arithmetic, to pick an obvious one); AIs have no particular need for our level of visual modeling or face recognition, but will have other strengths, both obvious and not
There is already a huge body of literature, formal and informal, on when humans use System 1 vs. System 2 reasoning
A huge amount of progress has been made in compilers, in terms of designing languages that implement powerful features in reasonable amounts of computing time; just try taking any modern Python or Ruby or C++ program and porting it to Altair BASIC
Large sections of the economy are already being monopolized by AI (Google is the most obvious example)
I’m not going to bother going farther, as in previous conversations you haven’t updated your position at all (http://lesswrong.com/lw/i9/the_importance_of_saying_oops/) regardless of how much evidence I’ve given you.
Did I suggest otherwise?
Interesting point. As I wrote, I think that an AGI monopolizing larger and larger sections of the economy is a strong possibility.
(Feel free to read things before commenting on them!)
...
I agree there are important differences. Why do you feel they’re important for my argument? Quantifying ability to make AI progress with a single number is indeed a coarse approximation, but coarse approximations are all we have.
That’s not especially important for my argument, because I treat “intelligence” as “the ability to do AI research and program AIs”. (Could I have made that more clear?)
Well, if you’re familiar with that literature, feel free to share whatever’s relevant ;)
Good point.
If you want to know, over the course of thinking about this topic, I changed from leaning towards Yudkowsky’s position to leaning towards Hanson’s. Anyway, if you think you have useful things to say, it might be worth saying them for the sake of bystanders.
I think you’re failing to account for how dramatically a relatively slight difference in intelligence within such a metric is liable to compound itself. A single really intelligent human can come up with insights in seconds that a thousand dimwitted humans can’t come with in hours. Even within human scales, you can get intelligence differences that mean the difference between problems being insurmountable and trivial. In the grand scheme of things, an average human may have most of the intelligence that a brilliant one does, but that doesn’t mean that they’ll be able to do intellectual work at nearly the same rate, or even that they’d ever be able to accomplish what the brilliant one does. To suppose that the work of a self modifying AI and the human community would compound on a comparable timescale, I think presupposes that the advancement of the AI would remain within an extremely narrow window.
Well, by definition, their intelligence varies wildly according to the metric of making important discoveries. So surely you mean a relatively small difference in human biology. And this fact, while interesting, doesn’t obviously say (to me) that the smart people have some kind of killer algorithm that the less intelligent folks lack… which is the only means by which an AGI could compound its intelligence. It just says that small biological variations create large intelligence variations.
Well, there certainly don’t seem to be major hardware differences between smart and not so smart humans. But it wouldn’t take a strong AI access to a lot of resources before it would be in a position to start acquiring more hardware and computing resources.
The “powerful features” of Python and Ruby are only barely catching up to Lisp, and as far as I know Lisp is still faster than both of them.
Except then you have to program in Lisp.
A fate worse than death.
Actually, SLIME is still the best debugger and IDE I’m aware of. Honorable mention to OSD, but that’d involve working in Java.
The economy is a complicated heterogeneous process, yet it has improved by orders of magnitude.
The more a component is a bottleneck, the stronger the incentive to find any workaround. If the price of some material or component is starting to bottleneck the economy, there is a huge incentive to reduce your use of it, to find any kind of new way of making it, or other workaround.
In an economy, there is sufficient fungibility. Expensive components can be replaced by cheaper ones often enough, new methods of production can be discovered.
Just a sidenote:
There is no strictly linear correlation between neuron density and brain power:
An impressive abstract showcasing synaptic pruning:
Human cognition is fundamentally limited by biological drives, tiredness, boredom, limited working memory, and low precision. Humans can’t recursively improve their own minds and so our exponential growth rate is constant. An AI’s improvement rate will not be constant and so I think it is unreasonable to estimate the rate of exponential growth of an AI based on how long it takes human researchers to develop an AI with an equivalent level of ability.
For instance, let’s say that in 2080 we develop an AI capable of designing itself from scratch in exactly 80 years. However, the AI does not have to recreate itself from scratch and presumably it does not need to wait 80 years to improve itself. For example let’s just assume that the AI can upgrade itself once per year and the effects are cumulative. Let’s also assume that it can direct its entire output into improving itself. This means that after 1 year of 1.25% growth in capability it is fundamentally more capable at improving itself (101.25% as capable, in fact). Assuming that the growth rate is directly proportional to its current capability in the next year instead of 1.25% growth it will experience 2.5% growth. The year after that, 6%. The AI would double in capability in 5 years. Practically, the development of hardware will be a hard limit on the rapidity that an AI can improve itself, so 5 years may be a stretch.
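The exact yearly figures depend entirely on how the compounding assumption is formalized. One toy recursion, taking only the 1/80-per-year baseline from the scenario above (note it yields different numbers than the rough ones quoted here):

```python
def simulate(rate_of, years=200):
    """Track capability year by year; rate_of(c) is that year's growth rate."""
    c = 1.0
    history = [c]
    for _ in range(years):
        c *= 1.0 + rate_of(c)
        history.append(c)
    return history

BASE = 1.0 / 80.0  # from the scenario: 80 AI-years of work to rebuild itself

constant = simulate(lambda c: BASE)          # growth rate never improves
compounding = simulate(lambda c: BASE * c)   # growth rate scales with capability

def doubling_year(history):
    return next(y for y, c in enumerate(history) if c >= 2.0)

print(doubling_year(constant))     # 56: plain ~1.25%/year exponential
print(doubling_year(compounding))  # ~41: the rate itself keeps improving
```

Under the proportional model the gap versus plain exponential growth widens every year, and in the continuous limit it diverges in finite time, so the qualitative conclusion (self-improvement changes the growth regime) survives even when the specific year-by-year percentages don’t.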
If we factor in Moore’s Law, we could talk about an AI that reaches a point in 2080 such that, even with Moore’s Law, it would take another 80 years to reproduce itself from scratch, i.e. nearly half of the entire workload would be done in 2158 and 2159. The growth of such an AI would be much slower, because it would have exponentially fewer resources available in 2080 than in 2160 (and in fact by 2161 it would be capable of doubling every year). Such an AI would have to be very weak compared to human researchers to need 80 years of computing power doubling every 18 months, so I don’t think it’s very meaningful to try to scale the problem this way. After all, the AI research field has not been doubling in capability every 18 months since the 1950s. So it makes more sense to talk about an AI in 2080 that, if run on the hardware of 2080, would take another 80 years to develop itself from scratch. I am fairly confident that allowing it to self-improve on improving hardware would lead to hard takeoff within a few years.
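A back-of-the-envelope check of the "nearly half the workload at the very end" arithmetic, assuming available compute doubles every 18 months between 2080 and 2160 (the dates and doubling time are just the ones from the example):

```python
# Integrate total compute delivered between 2080 and 2160 when the
# compute available per year doubles every 18 months.
def compute_rate(year, start=2080, doubling_time=1.5):
    """Relative compute available in a given year."""
    return 2 ** ((year - start) / doubling_time)

step = 0.01  # years per integration step
years = [2080 + step * i for i in range(int(80 / step))]
total = sum(compute_rate(y) * step for y in years)
last_18_months = sum(compute_rate(y) * step for y in years if y >= 2158.5)
print(f"share of all 2080-2160 compute in the final 18 months: "
      f"{last_18_months / total:.2f}")
```

With an 18-month doubling time, the final 18 months deliver about as much total compute as all the preceding years combined, which is what puts nearly half of the workload at the very end.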
Not all of the human thought process goes on inside the head. An engineer with a computer is far more productive in terms of designs generated than one with a pad of paper (and in turn more productive than one without any tools whatsoever).
We’ve merely gotten all of the obvious low-hanging recursive improvements. From exporting calculations out of our heads (abacus, paper and pencil, slide rule, computer) to better organizational systems, we’ve improved our ability to turn our thoughts into useful work.
If we find another big improvement, it will seem obvious in retrospect.
You are right, and it’s interesting to consider this quote from the article in that light:
What would a group of human AI researchers capable of completely reimplementing a copy of themselves be able to do? I’m assuming, for the example’s sake, that if an AGI could do it, so could the human researchers it is on par with. That’s actually a tremendous amount of power for either an AGI or a group of humans. As it is, we’ve been lucky to discover modern medicine, farming techniques, and fossil fuels just to boost the total population and skim the tiny percentage of scientists and engineers off of it. We won’t be able to keep doubling the number of high-quality AI researchers every 50 years on this rock without an actual improvement in the rate of growth of AI research. The point where any system acquires the ability to be self-sustaining seems like it would have to be an inflection point of greatly increased growth.
I’m confused. If “growth rate is directly proportional to current capability”, then why would you ever stop having 1.25% growth? You’d just be seeing 1.25% of an increasingly larger number.
You’re right, I stated that incorrectly. In my example the growth rate and the capability were both increasing, with the justification that an improvement in the ability to improve itself would lead to an increasing growth rate over time. For instance, if each (u, d) pair of improvement and difficulty is ordered favorably (luckily?), then solving enough of the initial problems will decrease the difficulty of future improvements and lead to an increasing growth rate. Instead of leading to diminishing returns as the low-hanging fruit is picked, the low-hanging fruit will turn previously hard problems into new low-hanging fruit.
Right, there are two competing forces here… diminishing returns, and the fact that early wins may help with later wins. I don’t think it’s obvious that one predominates.
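The two forces can be collapsed into one toy parameter. If capability grows as dc/dt = r * c**alpha, then alpha < 1 means diminishing returns win (sub-exponential growth), alpha = 1 gives steady exponential growth, and alpha > 1 means early wins compound into a finite-time blowup. A minimal sketch, with all numbers purely illustrative:

```python
def grow(alpha, base_rate=0.0125, years=300):
    """Integrate dc/dt = base_rate * c**alpha with yearly Euler steps."""
    c = 1.0
    for _ in range(years):
        c += base_rate * c ** alpha
        if c > 1e9:  # runaway growth: call it a foom
            return float("inf")
    return c

for alpha in (0.5, 1.0, 1.5):
    print(f"alpha={alpha}: capability after 300 years = {grow(alpha)}")
```

Which force predominates is exactly the question of whether alpha sits above or below 1, and nothing in the toy model settles that.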
I wanted to talk a bit more about what biology may or may not tell us about the ease of AGI.
This OB post discusses the importance of brain hardware differences in intelligence. One of the papers mentioned writes:
It seems plausible to me that the key software innovations for general intelligence appeared long before the evolution of humans, and humans mainly put a record-breaking number of densely packed neurons behind them. Speaking extremely speculatively, it might be that the algorithms used in human cognition get additional layers of abstraction capability (in some form or another) from additional brain hardware. This has interesting implications for throwing more hardware behind a working AGI if the AGI’s algorithms share this characteristic.
The ancient Greeks discovered a lot of the maths and logic that is the distant precursor of what is needed to build an AI. Drawing your starting point at the year 2000 is arbitrary. It might roughly correspond to when researchers started to put the word “AI” in their titles, but that has little to do with anything. If we assume that AI output is linear in research time, with some arbitrary date chosen as zero, then the rate of progress depends entirely on the choice of zero. Many computer technologies have shown dramatic progress in far less than 58 years. Often doing things well is only slightly harder than doing them at all. Often the first piece of software that can do a task at all can do it far faster than a human.
(e.g. image recognition to a particular accuracy.) If we consider the task of an AI researcher trying to make improvements in a piece of AI code, there could be only a small change in quality between the regime where almost all changes make the system worse and the regime where almost all changes are improvements. And such a system could be very fast on human timescales.
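To make the "choice of zero" point concrete: if cumulative progress is modeled as linear in time since some start year t0, the time for progress to double from today is simply today minus t0, so the apparent doubling time is entirely an artifact of where t0 is placed. A toy illustration (the years are arbitrary):

```python
# If cumulative progress P(t) = k * (t - t0) for some start year t0,
# then P doubles from year `today` exactly (today - t0) years later,
# regardless of the slope k.
def doubling_time(today, t0):
    return today - t0

# The same 2008-era field looks very different depending on the zero:
for t0 in (2000, 1950, -300):  # "AI"-era starting points vs. the ancient Greeks
    print(f"t0={t0}: next doubling takes {doubling_time(2008, t0)} years")
```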
The economy is complicated and heterogeneous, but it has doubled repeatedly.
If you have radically improved all your processes except X, then you have far more tools and resources to use to improve the production of X. You also have a strong incentive to substitute X with an easier-to-make Y. In the real world there are many different ways to get a job done, so we can route around bottlenecks (e.g. by replacing whale oil with mineral oil).
Edit: yes, a near repeat. Computer glitch. I thought I had deleted the comment, but it got posted instead.
I haven’t read this paper in detail, but it seems to suggest that Moore’s Law-style exponential growth may not be that far off for most technologies:
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0052669
(Which probably counts as a point for Foom proponents.)
That’s how technological evolution works, though. If you’re in olden-days-Tasmania, you get devolution. Otherwise, you get progress. There’s a threshold effect involved. We have to reproduce the progress seen in cultural evolution—not just make a mind.
You can’t know the difficulty of a problem until you’ve solved it. Look at Hilbert’s problems: some were solved immediately, while others are still open today. Proving that you can color a map with five colors is easy and takes only half a page; proving that you can color a map with four colors is hard and takes hundreds of pages. The same is true of science. A century ago physics was thought to be nearly a dead field, until the minor glitches with blackbody radiation and Mercury’s orbit turned out to be more than minor, dictated instead by mathematically complex theories whose interaction with each other is still well beyond our best minds today. That’s why trying to predict the growth of intelligence is exactly as silly as trying to predict the number of Hilbert’s problems that will be solved over time. It has much less to do with how smart we are and much more to do with how hard the problems are, and we won’t know that until we solve them.
Contrary to everything said in (1), I think the software problem of AI is already solved. Simply note that:
(a) When people think programming an AI is impossible, it’s because they think of hardcoding, and no one understands the mind even remotely well enough to do that. But do we hardcode neural nets? No; in fact neural nets are remarkable in that no one can hardcode a facial recognition program as effective as a trained neural net. Suppose a sufficiently large neural net can be as smart as a human. Then what we would expect from smaller neural nets is exactly what we see now, namely non-rigid intelligence similar to our own but more limited. It would be absurd to expect more of them given our current hardware.
(b) There are two forms of signalling in the body: electrical via action potentials, and chemical via diffusion. Since the chemical sets up the electrical, and diffusion is rather imprecise, there are fundamental limits on how refined the brain’s macroscopic architecture can be. At the molecular scale biology is extremely complex; enzymatic proteins are machines of profound sophistication. But none of that matters for understanding how the brain computes in real time, because the only fast form of signalling is between neurons through electrical signals (chemical at the synapses, but that’s a tiny distance). So the issue comes down to how the neurons are arranged to give rise to intelligence, and how they are arranged is relatively rough in its precision, because that’s how chemical diffusion works.
With (1) and (2) in mind, let’s address what the AI problem is really about: hardware. Moore’s law is going to hit the atomic barrier much earlier than even Kurzweil expects computers to facilitate AI. The simple fact of the matter is that there is no clear way past this point. Neither parallel programming nor quantum computing is going to save the day without massive, unprecedented breakthroughs. It’s a hardware problem, and we won’t know how hard until we solve it.
~ a bioinformatics student and ex-singularitarian
We don’t use parallel systems efficiently today because we don’t have software systems that provide typical programmers with a human-comprehensible interface to program them. Writing efficient, correct parallel code in traditional programming languages is very difficult; and some of the research languages which promise automatic parallelization are on the high end of difficulty for humans to learn.
Sorry, the way I worded it makes me look silly. I just meant that even if we had the perfect software, we simply wouldn’t get a big enough speedup to bridge the gap.
A point against there being important, chunky undiscovered insights into intelligence: if such insights existed, they would likely be simple, and if they were simple, they would likely have been discovered already. So the fact that no one has yet discovered such a brilliant, simple idea is evidence against its existence. (I’m not the first to point out the increasing difficulty of making new contributions in math/science; gwern says anything referencing Jones on this page may be relevant. Compare the breadth and applicability of the discoveries made by a genius from centuries ago like Gauss or Euler, whose results are taught to engineering undergraduates, with those of a modern genius like Grothendieck.)
We can’t posit “chunky undiscovered insights into intelligence” and then argue that they don’t exist because they would already have been discovered; by definition we can’t have already discovered “undiscovered insights”. And we have certainly discovered plenty of big insights already. The problem is not so much that big insights don’t exist as that we seem to have discovered a lot of them already.
Sure, it doesn’t sound like we disagree on anything.
Might be interesting to try to plot the number of big insights made per year to see if they were trailing off yet or not. One potential problem would be figuring out whether a recent insight was going to end up being “big” or not.