Sustained Strong Recursion
Followup to: Cascades, Cycles, Insight, Recursion, Magic
We seem to have a sticking point at the concept of “recursion”, so I’ll zoom in.
You have a friend who, even though he makes plenty of money, just spends all that money every month. You try to persuade your friend to invest a little—making valiant attempts to explain the wonders of compound interest by pointing to analogous processes in nature, like fission chain reactions.
“All right,” says your friend, and buys a ten-year bond for $10,000, with an annual coupon of $500. Then he sits back, satisfied. “There!” he says. “Now I’ll have an extra $500 to spend every year, without my needing to do any work! And when the bond comes due, I’ll just roll it over, so this can go on indefinitely. Surely, now I’m taking advantage of the power of recursion!”
“Um, no,” you say. “That’s not exactly what I had in mind when I talked about ‘recursion’.”
“But I used some of my cumulative money earned, to increase my very earning rate,” your friend points out, quite logically. “If that’s not ‘recursion’, what is? My earning power has been ‘folded in on itself’, just like you talked about!”
“Well,” you say, “not exactly. Before, you were earning $100,000 per year, so your cumulative earnings went as 100000 * t. Now, your cumulative earnings are going as 100500 * t. That’s not really much of a change. What we want is for your cumulative earnings to go as B * e^(A*t) for some constants A and B: to grow exponentially.”
“Exponentially!” says your friend, shocked.
“Yes,” you say, “recursification has an amazing power to transform growth curves. In this case, it can turn a linear process into an exponential one. But to get that effect, you have to reinvest the coupon payments you get on your bonds—or at least reinvest some of them, instead of just spending them all. And you must be able to do this over and over again. Only then will you get the ‘folding in’ transformation, so that instead of your cumulative earnings going as y = F(t) = A*t, your earnings will go as the differential equation dy/dt = F(y) = A*y whose solution is y = e^(A*t).”
(I’m going to go ahead and leave out various constants of integration; feel free to add them back in.)
“Hold on,” says your friend. “I don’t understand the justification for what you just did there.”
“Right now,” you explain, “you’re earning a steady income at your job, and you also have $500/year from the bond you bought. These are just things that go on generating money at a constant rate per unit time, in the background. So your cumulative earnings are the integral of that constant rate. If your earnings are y, then dy/dt = A, which resolves to y = At. But now, suppose that instead of having these constant earning forces operating in the background, we introduce a strong feedback loop from your cumulative earnings to your earning power.”
“But I bought this one bond here—” says your friend.
“That’s not enough for a strong feedback loop,” you say. “Future increases in your cumulative earnings aren’t going to increase the value of this one bond, or your salary, any further. One unit of force transmitted back is not a feedback loop—it has to be repeatable. You need a sustained recursion, not a one-off event.”
“Okay,” says your friend, “how about if I buy a $100 bond every year, then? Will that satisfy the strange requirements of this ritual?”
“Still not a strong feedback loop,” you say. “Suppose that next year your salary went up $10,000/year—no, an even simpler example; suppose $10,000 fell in your lap out of the sky. If you only buy $100/year of bonds, that extra $10,000 isn’t going to make any long-term difference to the earning curve. But if you’re in the habit of investing 50% of found money, then there’s a strong feedback loop from your cumulative earnings back to your earning power—we can pump up the cumulative earnings and watch the earning power rise as a direct result.”
“How about if I just invest 0.1% of all my earnings, including the coupons on my bonds?” asks your friend.
“Well...” you say slowly. “That would be a sustained feedback loop but an extremely weak one, where marginal changes to your earnings have relatively small marginal effects on future earning power. I guess it would genuinely be a recursified process, but it would take a long time for the effects to become apparent, and any stronger recursions would easily outrun it.”
“Okay,” says your friend, “I’ll start by investing a dollar, and I’ll fully reinvest all the earnings from it, and the earnings on those earnings as well—”
“I’m not really sure there are any good investments that will let you invest just a dollar without it being eaten up in transaction costs,” you say, “and it might not make a difference to anything on the timescales we have in mind—though there’s an old story about a king, and grains of wheat placed on a chessboard… But realistically, a dollar isn’t enough to get started.”
“All right,” says your friend, “suppose I start with $100,000 in bonds, and reinvest 80% of the coupons on those bonds plus rolling over all the principal, at a 5% interest rate, and we ignore inflation for now.”
“Then,” you reply, “we have the differential equation dy/dt = 0.8 * 0.05 * y, with the initial condition y = $100,000 at t=0, which works out to y = $100,000 * e^(.04*t). Or if you’re reinvesting discretely rather than continuously, y = $100,000 * (1.04)^t.”
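The arithmetic in this exchange can be checked with a minimal Python sketch. It assumes exactly the friend's numbers (a $100,000 stake, a 5% coupon, 80% of coupons reinvested, no inflation); the function names are hypothetical and chosen only for illustration.

```python
import math

PRINCIPAL = 100_000.0    # starting stake in bonds, from the dialogue
COUPON_RATE = 0.05       # 5% annual coupon
REINVEST_FRACTION = 0.8  # 80% of each coupon goes back into bonds

# Effective growth rate of the invested stake: 0.8 * 0.05 = 0.04
RATE = REINVEST_FRACTION * COUPON_RATE


def continuous_balance(t_years: float) -> float:
    """Solution of dy/dt = RATE * y with y(0) = PRINCIPAL."""
    return PRINCIPAL * math.exp(RATE * t_years)


def discrete_balance(t_years: int) -> float:
    """Stake after reinvesting once per year: PRINCIPAL * (1 + RATE)^t."""
    return PRINCIPAL * (1.0 + RATE) ** t_years


def unreinvested_coupons(t_years: float) -> float:
    """Cumulative coupons when nothing is reinvested (the single-bond case).

    The coupon is a constant PRINCIPAL * COUPON_RATE per year, so this
    total grows linearly in t.
    """
    return PRINCIPAL * COUPON_RATE * t_years


if __name__ == "__main__":
    for t in (0, 10, 25, 50):
        print(t, round(continuous_balance(t)), round(discrete_balance(t)),
              round(unreinvested_coupons(t)))
```

The first two functions are the two formulas quoted above. The third is the spend-everything baseline, whose yearly increment never changes however much has accumulated.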
We can similarly view the self-optimizing compiler in this light—it speeds itself up once, but never makes any further improvements, like buying a single bond; it’s not a sustained recursion.
And now let us turn our attention to Moore’s Law.
I am not a fan of Moore’s Law. I think it’s a red herring. I don’t think you can forecast AI arrival times by using it, I don’t think that AI (especially the good kind of AI) depends on Moore’s Law continuing. I am agnostic about how long Moore’s Law can continue—I simply leave the question to those better qualified, because it doesn’t interest me very much...
But for our next simpler illustration of a strong recursification, we shall consider Moore’s Law.
Tim Tyler serves us the duty of representing our strawman, repeatedly telling us, “But chip engineers use computers now, so Moore’s Law is already recursive!”
To test this, we perform the equivalent of the thought experiment where we drop $10,000 out of the sky—push on the cumulative “wealth”, and see what happens to the output rate.
Suppose that Intel’s engineers could only work using computers of the sort available in 1998. How much would the next generation of computers be slowed down?
Suppose we gave Intel’s engineers computers from 2018, in sealed black boxes (not transmitting any of 2018’s knowledge). How much would Moore’s Law speed up?
I don’t work at Intel, so I can’t actually answer those questions. I think, though, that if you said in the first case, “Moore’s Law would drop way down, to something like 1998’s level of improvement measured linearly in additional transistors per unit time,” you would be way off base. And if you said in the second case, “I think Moore’s Law would speed up by an order of magnitude, doubling every 1.8 months, until they caught up to the ‘2018’ level,” you would be equally way off base.
In both cases, I would expect the actual answer to be “not all that much happens”. Seventeen instead of eighteen months, nineteen instead of eighteen months, something like that.
Yes, Intel’s engineers have computers on their desks. But the serial speed or per-unit price of computing power is not, so far as I know, the limiting resource that bounds their research velocity. You’d probably have to ask someone at Intel to find out how much of their corporate income they spend on computing clusters / supercomputers, but I would guess it’s not much compared to how much they spend on salaries or fab plants.
If anyone from Intel reads this, and wishes to explain to me how it would be unbelievably difficult to do their jobs using computers from ten years earlier, so that Moore’s Law would slow to a crawl—then I stand ready to be corrected. But relative to my present state of partial knowledge, I would say that this does not look like a strong feedback loop.
However...
Suppose that the researchers themselves are running as uploads, software on the computer chips produced by their own factories.
Mind you, this is not the tiniest bit realistic. By my standards it’s not even a very interesting way of looking at the Singularity, because it does not deal with smarter minds but merely faster ones—it dodges the really difficult and interesting part of the problem.
Just as nine women cannot gestate a baby in one month; just as ten thousand researchers cannot do in one year what a hundred researchers can do in a hundred years; so too, a chimpanzee cannot do in four years what a human can do in one year, even though the chimp has around one-fourth the human’s cranial capacity. And likewise a chimp cannot do in 100 years what a human does in 95 years, even though they share 95% of our genetic material.
Better-designed minds don’t scale the same way as larger minds, and larger minds don’t scale the same way as faster minds, any more than faster minds scale the same way as more numerous minds. So the notion of merely faster researchers, in my book, fails to address the interesting part of the “intelligence explosion”.
Nonetheless, for the sake of illustrating this matter in a relatively simple case...
Suppose the researchers and engineers themselves—and the rest of the humans on the planet, providing a market for the chips and investment for the factories—are all running on the same computer chips that are the product of these selfsame factories. Suppose also that robotics technology stays on the same curve and provides these researchers with fast manipulators and fast sensors. We also suppose that the technology feeding Moore’s Law has not yet hit physical limits. And that, as human brains are already highly parallel, we can speed them up even if Moore’s Law is manifesting in increased parallelism instead of faster serial speeds—we suppose the uploads aren’t yet being run on a fully parallelized machine, and so their actual serial speed goes up with Moore’s Law. Etcetera.
In a fully naive fashion, we just take the economy the way it is today, and run it on the computer chips that the economy itself produces.
In our world where human brains run at constant speed (and eyes and hands work at constant speed), Moore’s Law for computing power s is:
s = R(t) = e^t
The function R is the Research curve that relates the amount of Time t passed, to the current Speed of computers s.
To understand what happens when the researchers themselves are running on computers, we simply suppose that R does not relate computing technology to sidereal time—the orbits of the planets, the motion of the stars—but, rather, relates computing technology to the amount of subjective time spent researching it.
Since in our world, subjective time is a linear function of sidereal time, this hypothesis fits exactly the same curve R to observed human history so far.
Our direct measurements of observables do not constrain between the two hypotheses:
Moore’s Law is exponential in the number of orbits of Mars around the Sun
and
Moore’s Law is exponential in the amount of subjective time that researchers spend thinking, and experimenting and building using a proportional amount of sensorimotor bandwidth.
But our prior knowledge of causality may lead us to prefer the second hypothesis.
So to understand what happens when the Intel engineers themselves run on computers (and use robotics) subject to Moore’s Law, we recursify and get:
dy/dt = s = R(y) = e^y
Here y is the total amount of elapsed subjective time, which at any given point is increasing according to the computer speed s given by Moore’s Law, which is determined by the same function R that describes how Research converts elapsed subjective time into faster computers. Observed human history to date roughly matches the hypothesis that R is exponential with a doubling time of eighteen subjective months (or whatever).
Solving
dy/dt = e^y
yields
y = -ln(C - t)
One observes that this function goes to +infinity at a finite time C.
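For readers who want the intermediate steps, the solution follows by separation of variables. This is only a sketch, with C as the constant of integration; the restriction t < C is what keeps the logarithm defined.

```latex
\begin{align*}
\frac{dy}{dt} &= e^{y} \\
e^{-y}\,dy &= dt \\
-e^{-y} &= t - C \\
e^{-y} &= C - t \\
y &= -\ln(C - t), \qquad t < C
\end{align*}
```

Differentiating gives dy/dt = 1/(C - t), which equals e^y, so the solution checks out. As t approaches C from below, C - t shrinks to zero and y grows without bound.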
This is only to be expected, given our assumptions. After eighteen sidereal months, computing speeds double; after another eighteen subjective months, or nine sidereal months, computing speeds double again; etc.
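The stepwise description above can be tabulated with a short Python sketch. It makes one simplifying assumption that is mine, not the model's: speed is held constant within each doubling period and jumps by a factor of two at the end. The function name is hypothetical.

```python
SUBJECTIVE_MONTHS_PER_DOUBLING = 18.0  # the figure used in the text


def sidereal_doubling_schedule(n_doublings: int) -> list[float]:
    """Sidereal months elapsed at the end of each successive doubling.

    Assumption (for illustration only): speed stays constant during a
    doubling period, then doubles all at once when the period ends.
    """
    elapsed = 0.0
    speed = 1.0  # subjective months per sidereal month; 1.0 means today
    schedule = []
    for _ in range(n_doublings):
        # Eighteen subjective months take 18 / speed sidereal months.
        elapsed += SUBJECTIVE_MONTHS_PER_DOUBLING / speed
        schedule.append(elapsed)
        speed *= 2.0
    return schedule


if __name__ == "__main__":
    for k, months in enumerate(sidereal_doubling_schedule(10), start=1):
        print(k, months)
```

The gaps between doublings halve each time (18, 9, 4.5, ...), so the total is a geometric series that approaches 36 sidereal months and never exceeds it. A continuous treatment, where speed also rises within each period, would reach its limit sooner.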
Now, unless the physical universe works in a way that is not only different from the current standard model, but has a different character of physical law than the current standard model, you can’t actually do infinite computation in finite time.
Let us suppose that if our biological world had no Singularity, and Intel just kept on running as a company, populated by humans, forever, that Moore’s Law would start to run into trouble around 2020. Say, after 2020 there would be a ten-year gap where chips simply stagnated, until the next doubling occurred after a hard-won breakthrough in 2030.
This just says that R(y) is not an indefinite exponential curve. By hypothesis, from subjective years 2020 to 2030, R(y) is flat, corresponding to a constant computer speed s. So dy/dt is constant over this same time period: Total elapsed subjective time y grows at a linear rate, and as y grows, R(y) and computing speeds remain flat, until ten subjective years have passed. So the sidereal bottleneck lasts ten subjective years times the current sidereal/subjective conversion rate at 2020’s computing speeds.
In short, the whole scenario behaves exactly like what you would expect—the simple transform really does describe the naive scenario of “drop the economy into the timescale of its own computers”.
After subjective year 2030, things pick up again, maybe—there are ultimate physical limits on computation, but they’re pretty damned high, and we’ve got a ways to go until there. But maybe Moore’s Law is slowing down—going subexponential, and then as the physical limits are approached, logarithmic, and then simply giving out.
But whatever your beliefs about where Moore’s Law ultimately goes, you can just map out the way you would expect the research function R to work as a function of sidereal time in our own world, and then apply the transformation dy/dt = R(y) to get the progress of the uploaded civilization over sidereal time t. (Its progress over subjective time is simply given by R.)
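To make "apply the transformation" concrete, here is a minimal Python sketch that integrates dy/dt = R(y) numerically for whatever research curve you supply. Everything in it is an illustrative assumption: the forward-Euler method, the step size, the cap used to detect blow-up, and especially `stalled_r`, a made-up curve that is flat for one unit of subjective time as a stand-in for the stagnation scenario.

```python
import math
from typing import Callable


def integrate_recursified(
    research_curve: Callable[[float], float],
    t_end: float,
    dt: float = 1e-3,
    y0: float = 0.0,
    y_cap: float = 50.0,
) -> list[tuple[float, float]]:
    """Forward-Euler integration of dy/dt = R(y).

    Returns (sidereal time t, subjective time y) samples. Stops early
    once y reaches y_cap, which is how a finite-time blow-up shows up
    in a numerical run.
    """
    t, y = 0.0, y0
    path = [(t, y)]
    while t < t_end and y < y_cap:
        y += research_curve(y) * dt
        t += dt
        path.append((t, y))
    return path


def exponential_r(y: float) -> float:
    """R(y) = e^y, the unbounded curve used in the text."""
    return math.exp(y)


def stalled_r(y: float) -> float:
    """Hypothetical curve: exponential, flat for 1 <= y < 2, then resuming.

    The breakpoints are arbitrary; they only mimic a research curve
    that stagnates for a stretch of subjective time.
    """
    if y < 1.0:
        return math.exp(y)
    if y < 2.0:
        return math.e
    return math.exp(y - 1.0)


if __name__ == "__main__":
    for name, curve in (("exponential", exponential_r), ("stalled", stalled_r)):
        t_stop, y_stop = integrate_recursified(curve, t_end=5.0)[-1]
        print(name, "stopped at sidereal t =", round(t_stop, 3))
```

With y(0) = 0 the exact solution for R(y) = e^y blows up at t = 1. In the stalled curve the flat stretch covers one unit of subjective time at speed e, so it should cost about 1/e units of sidereal time, and the numerical run pushes the blow-up back by roughly that much.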
If sensorimotor bandwidth is the critical limiting resource, then we instead care about R&D on fast sensors and fast manipulators. We want R_sm(y) instead of R(y), where R_sm is the progress rate of sensors and manipulators, as a function of elapsed sensorimotor time. And then we write dy/dt = R_sm(y) and crank on the equation again to find out what the world looks like from a sidereal perspective.
We can verify that the Moore’s Researchers scenario is a strong positive feedback loop by performing the “drop $10,000” thought experiment. Say, we drop in chips from another six doublings down the road—letting the researchers run on those faster chips, while holding constant their state of technological knowledge.
Lo and behold, this drop has a rather large impact, much larger than the impact of giving faster computers to our own biological world’s Intel. Subjectively the impact may be unnoticeable—as a citizen, you just see the planets slow down again in the sky. But sidereal growth rates increase by a factor of 64.
So this is indeed deserving of the names, “strong positive feedback loop” and “sustained recursion”.
As disclaimed before, all this isn’t really going to happen. There would be effects like those Robin Hanson prefers to analyze, from being able to spawn new researchers as the cost of computing power decreased. You might be able to pay more to get researchers twice as fast. Above all, someone’s bound to try hacking the uploads for increased intelligence… and then those uploads will hack themselves even further… Not to mention that it’s not clear how this civilization cleanly dropped into computer time in the first place.
So no, this is not supposed to be a realistic vision of the future.
But, alongside our earlier parable of compound interest, it is supposed to be an illustration of how strong, sustained recursion has much more drastic effects on the shape of a growth curve, than a one-off case of one thing leading to another thing. Intel’s engineers running on computers is not like Intel’s engineers using computers.
“If anyone from Intel reads this, and wishes to explain to me how it would be unbelievably difficult to do their jobs using computers from ten years earlier, so that Moore’s Law would slow to a crawl—then I stand ready to be corrected. But relative to my present state of partial knowledge, I would say that this does not look like a strong feedback loop, compared to what happens to a compound interest investor when we bound their coupon income at 1998 levels for a while.”
This is simple to disprove, whether one is part of Intel or not. The issue is that current processors, with multiple cores and millions (now billions) of transistors, are getting so complex that the actual design has to be done on a computer. What is more, before fabrication the design needs to be simulated to check for logic errors and to ensure good performance. It would be impossible to simulate a Tera-scale research chip on 1998 hardware. The issue is that simulating a computer’s design requires a lot of computational power. The advances made in going from 65nm to 45nm, and now moving to 32nm, were enabled by computers that could better simulate the designs; without today’s computers it would be hard to design the fabrication systems or run the fabrication system for the future processors. Since you admit partial knowledge I won’t bore you with the details of all this; suffice it to say that your claim as stated is incorrect.
I would however like to point out a misconception about Moore’s law: the law never says speed increases, merely that the number of transistors doubles every 18 months. There are a lot of factors apart from the number of transistors that play into computer speed. While more transistors are useful, one has to match them with an architecture that takes advantage of them; otherwise you would not necessarily get the speed increase.
You can define “recursive” as accelerating growth, in which case it remains an open question whether any particular scenario, such as sped up folks researching how to speed up, is in fact recursive. Or you can, as I had thought you did, define “recursive” as a situation with a loop of growth factors, each encouraging the next one in the loop, in which case it is an open question if that results in accelerating growth. I was pointing out before that there exist loops of encouraging growth factors that do not result in accelerating growth. If you choose the other definition strategy, I’ll note that your model is extremely stark, and leaves out the usual items in even the simplest standard growth models.
GenericThinker, please stop posing as an authority on things you know very little about (e.g. the halting problem). If you don’t actually work at Intel or another chip fab, I’m not particularly interested in your overestimates of how much you know about the field.
Simulating a hundred-million transistor chip design, using a smaller slower chip with a few gigabytes of RAM, or a clustered computer, would certainly be possible; and if stuck with 1998 hardware that’s exactly what Intel would do, and I doubt it would slow their rate of technological progress by very much. They’d spend more on computers but I’d expect it to still be an insignificant fraction of corporate income. This is the obvious; if anyone actually works at Intel, they can describe how computationally intensive their work actually is, and whether using clustered chips from 1998 would be infeasible. Neither proofs nor chip simulations would likely be a problem, but if they’re simulating the physics of potential new chip technologies, that might be.
GT: Sources?
Robin, like I say, most AIs won’t hockey-stick, and when you fold a function in on itself this way, it can bottleneck for a billion years if its current output is flat or bounded. That’s why self-optimizing compilers don’t go FOOM.
“Recursion” is not accelerating growth. It is not a loop of growth factors. “Adding a recursion” describes situations where you might naively be tempted to take an existing function
y = F(t)
and rewrite it as
dy/dt = F(y)
Does that make it any clearer?
Eliezer, if “adding a recursion” means adding one more power to the derivative in the growth equation, then it is an open question what sorts of AIs would do that. And then it isn’t clear why you would say Engelbart was “not recursive enough”, since this is a discrete definition without some parameter you can have not enough of.
I like to think of life as being a Pe^rt equation: P = you and your skills; r = your investment ability/luck; t = invest early, take advantage of tax laws.
Robin, how is the transition
y = e^t → dy/dt = e^t
=>
dy/dt = e^y → y = -ln(C - t) → dy/dt = 1/(C - t)
“adding one more power to the derivative in the growth equation”?
I’m not sure what that phrase you used means, exactly, but I wonder if you may be mis-visualizing the general effect of what I call “recursion”.
Or what about y = t^2 ⇒ dy/dt = y^2, etc. Or y = log t ⇒ dy/dt = log y, etc.
Like I said, this doesn’t necessarily hockey-stick; if you get sublinear returns the recursified version will be slower than the original.
Engelbart was “not recursive enough” in the sense that he didn’t have a strong, sustained recursion; his tech improvements did not yield an increase in engineering velocity which was sufficient to produce tech improvements that would further improve his engineering velocity. He wasn’t running on his own chips. Like Eurisko, he used his scientific prowess to buy some bonds (computer tech) that paid a relatively low coupon on further scientific prowess, and the interest payments didn’t let him buy all that many more bonds.
“The issue is that simulating a computer’s design requires a lot of computational power. The advances made in going from 65nm to 45nm, and now moving to 32nm, were enabled by computers that could better simulate the designs; without today’s computers it would be hard to design the fabrication systems or run the fabrication system for the future processors.”
I believe (strongly) that the bottleneck is figuring out how to make 45nm and 32nm circuits work reliably. If you learn how to do 32nm, you can probably get speedup just by re-using the same design you used at 45nm.
OK, well, I didn’t actually say the quoted bit next to my name—and in the referenced comment I was talking about something a bit different—but I’ll happily adopt the mantle of someone claiming that Moore’s law is a manifestation of an iterative self-improvement process where the developments in each generation go on to cumulatively accelerate the rate of progress that leads to the next.
I wouldn’t expect so either—that’s the wrong sum.
Again that doesn’t tell us much of interest.
We seem to be agreed that the genetics of the human brain are for all intents and purposes in a state of stasis.
However, I see culture and machines educating and augmenting human intelligence (respectively) in an iterative fashion, leading to cumulative improvements over time, whereas you seem to barely acknowledge such effects.
How much is down to culture and machines—and how much is raw, native brain power? Well, the culture and machines make the difference between skyscrapers and spaceships and mud huts and horses—i.e. quite a bit of difference.
Give an augmented human an IQ test, and watch as she photographs the test with her cellphone, forwards the snapshots to an Indian IQ-test-solving sweat shop, and completes the test inside twenty minutes with a ridiculous score.
Sure Intel’s chips have only minor direct effects on productivity within Intel. However, they have effects all over the planet—they help the people who write software tools (that are then subsequently used by Intel). They help people on the internet—creating resources that are then accessed by Intel employees. They help other people designing other components of the computer systems Intel uses, monitors, screens, keyboards, etc. Intel CPUs are just one organ in the self-improving ecosystem that is human civilisation.
Essentially, this type of self-improvement cycle is the reason we observe steady progress over time. If we were not part of a self-improving system, we would not be observing a steady technological march forwards as time passes.
I did work at Intel, and two years of that was in the process engineering area (running the AI lab, perhaps ironically).
The short answer is that more computing power leads to more rapid progress. Probably the relationship is close to linear, and the multiplier is not small.
Two examples:
The speed of a chip is limited by critical paths. Finding these and verifying fixes depends on physically realistic simulations (though they make simplifying assumptions, which sometimes fail). Generally the better the simulation the tighter one can cut corners. The limit on simulation quality is typically computer power available (though it can also be understanding the physics well enough to cheat correctly).
Specifically with reference to Phil Goetz’s comment about scaling, the physics is not invariant under scaling (obviously) and the critical paths change in not entirely predictable ways. So again optimal “shrinks” are hostage to simulation performance.
The second example is more exotic. Shortly before I arrived in the process world, one of the guys who ended up working for me figured out how to watch the dynamics of a chip using a scanning electron microscope, since the charges in the chip modulate the electron beam. However integrating scanning control, imaging, chip control etc. was non-trivial and he wrote a lot of the code in Lisp. Using this tool he found the source of some serious process issues that no one had been able to diagnose.
This is a special case of the general pattern that progress in making the process better and the chips faster typically depends on modeling, analyzing, collecting data, etc. in new ways, and the limits are often how quickly humans can try out and evolve computer mediated tools. Scaling to larger data sets, using less efficient but more easily modified software, running simulations faster, etc. all pay big dividends.
Intel can’t in general substitute more processors in a cluster for faster processors, since writing software that gets good speedups on large numbers of processors is hard, and changing such software is much harder than changing single-processor software. The pool of people who can do this kind of development is also small and can’t easily be increased.
So I don’t really know what difference it makes, but I think Eliezer’s specific claim here is incorrect.
Jed, would you care to estimate the effect on Moore’s Law from Intel being able to compute using only 1998 chips, and having black-box 2018 chips, respectively? Bearing in mind that they would get a chance to adapt their processes, if they could (on both sides). Just a rough guess, to give me some idea of what you think the magnitude of the final effect would be.
Jed, your comment (the second example, specifically) reminds me of the story about how the structure of DNA was discovered. Apparently the ‘Eureka’ moment actually came after the researchers obtained better materials for modeling.
Moore’s Law says nothing about speed in the canonical form. You should probably define exactly what variant you are using.
Re: Intel and simulating microprocessors on further microprocessors.
“Simulating a hundred-million transistor chip design, using a smaller slower chip with a few gigabytes of RAM, or a clustered computer, would certainly be possible; and if stuck with 1998 hardware that’s exactly what Intel would do, and I doubt it would slow their rate of technological progress by very much.”
When you do microprocessor design there’s a subtle distinction between the simulation of the VHDL/Verilog-type information, which is basically boolean algebraic representations that are converted into the final circuits in terms of transistors etc., versus the functional testing, for which I know no better name. This ‘functional testing’ is more like quality testing, where you wire up your 128-bit IO chip to testing equipment and push bits in and get stuff popped out to do formal physical verification. On 128-bit architectures this is 2^128 tests; you’re essentially traversing the ridiculously huge state table. In practice this is infeasible to do, even in simulation (verification of all possible states), so VHDL/Verilog/RTL-type analysis is worth focusing on instead.
Bryan
I’ll try to estimate as requested, but substituting fixed computing power for “riding the curve” (as Intel does now) is a bit of an apples to fruit cocktail comparison, so I’m not sure how useful it is. A more direct comparison would be with always having a computing infrastructure from 10 years in the future or past.
Even with this amendment, the (necessary) changes to design, test, and debugging processes make this hard to answer...
I’ll think out loud a bit.
Here’s the first quick guess I can make that I’m moderately sure of: The length of time to go through a design cycle (including shrinks and transitions to new processes) would scale pretty closely with computing power, keeping the other constraints pretty much constant. (Same designers, same number of bugs acceptable, etc.) So if we assume the power follows Moore’s law (probably too simple as others have pointed out) cycles would run hundreds of times faster with computing power from 10 years in the future.
This more or less fits the reality, in that design cycles have stayed about the same length while chips have gotten hundreds of times more complex, and also much faster, both of which soak up computing power.
Probably more computing power would have also allowed faster process evolution (basically meaning smaller feature sizes) but I was never a process designer so I can’t really generate a firm opinion on that. A lot of physical experimentation is required and much of that wouldn’t go faster. So I’m going to assume very conservatively that the increased or decreased computing power would have no effect on process development.
The number of transistors on a chip is limited by process considerations, so adding computing power doesn’t directly enable more complex chips. Leaving the number of devices the same and just cycling the design of chips with more or less the same architecture hundreds of times faster doesn’t make much economic sense. Maybe instead Intel would create hundreds of times as many chip designs, but that implies a completely different corporate strategy so I won’t pursue that.
In this scenario, experimentation via computing gets hundreds of times “cheaper” than in our world, so it would get used much more heavily. Given these cheap experiments, I’d guess Intel would have adopted much more radical designs.
Examples of more radical approaches would be self-clocked chips, much more internal parallelism (right now only about 1⁄10 of the devices change state on any clock), chips that directly use more of the quantum properties of the material, chips that work with values other than 0 and 1, direct use of probabilistic computing, etc. In other words, designers would have pushed much further out into the micro-architectural design space, to squeeze more function out of the devices. Some of this (e.g. probabilistic or quantum-enhanced computing) could propagate up to the instruction set level.
(This kind of weird design is exactly what we get when evolutionary search is applied directly to a gate array, which roughly approximates the situation Intel would be in.)
Conversely, if Intel had hundreds of times less computing power, they’d have to be extremely conservative. Designs would have to stay further from any possible timing bugs, new designs would appear much more slowly, they’d probably make the transition to multiple cores much sooner because scaling processor designs to large numbers of transistors would be intractable, there’d be less fine-grained internal parallelism, etc.
If we assumed that progress in process design was also more or less proportional to computing power available, then in effect we’d just be changing the exponent on the curve; to a first approximation we could assume no qualitative changes in design. However as I say this is a very big “if”.
Now however we have to contend with an interesting feedback issue. Suppose we start importing computing from ten years in the future in the mid-1980s. If it speeds everything up proportionally, the curve gets a lot steeper, because that future is getting faster faster than ours. Conversely if Intel had to run on ten year old technology the curve would be a lot flatter.
On the other hand if there is skew between different aspects of the development process (as above with chip design vs. process design) we could go somewhere else entirely. For example if Intel develops some way to use quantum effects in 2000 due to faster simulations from 1985 on, and then that gets imported (in a black box) back to 1990, things could get pretty crazy.
I think that’s all for now. Maybe I’ll have more later. Further questions welcome.
“GenericThinker, please stop posing as an authority on things you know very little about (e.g. the halting problem). If you don’t actually work at Intel or another chip fab, I’m not particularly interested in your overestimates of how much you know about the field.”
How precisely do you know I have never worked at Intel? You have admitted you don’t know this issue, so how would you have a clue whether I am right? (Cite your sources to prove me wrong.) In fact I am correct: the ability to simulate in real time is directly related to the amount of computational power available, which greatly affects the level of complexity you can design into your chip. Look at the performance achieved in 1998, which was around 1 TFLOP if I recall correctly. The tera-scale research chip I spoke of achieves the same performance, and with 1998 hardware it would be extremely difficult to simulate the tera-scale design. This is because emulation always requires more power than the actual design. The inability to accurately simulate without having to pay a huge premium would make the design of future chips extremely difficult; it would limit the design space. Take a different example, the SR-71: part of the reason the SR-71 ended up looking as it did was that at the time engineers could only simulate simple shapes for supersonic flight. The same applies in processor design: if you cannot simulate the design without a Blue Gene supercomputer, it is very hard to make improvements that are economical. Further, it is extremely hard to prove your design correct, which is a huge part of processor design. Obviously this is not the only issue, but it is the point you made which is still false, whatever you may believe being totally irrelevant to that point.
Actually intel uses things like FPGAs to simulate future processor designs since FPGAs have programmable logic.
On a final note you are a grade school dropout don’t talk to me about knowledge or pretense of knowledge. The only fraud here is you pretending to be an AI researcher (what a joke). You may feel free to critique me when you have published a technical paper proving mastery of mathematics beyond basic statistics and calculus and when you have patents to your name. Until that point you should be careful making such claims since you haven’t a leg to stand on.
“GenericThinker, please stop posing as an authority on things you know very little about (e.g. the halting problem). If you don’t actually work at Intel or another chip fab, I’m not particularly interested in your overestimates of how much you know about the field.”
As to the point of the halting problem, my point is correct. The question of whether a given AI program halts may not be particularly interesting, but my response was directed at the post above mine, which I took to be implying that since an AGI is not an arbitrary program, the halting problem does not apply. If I misunderstood the person’s post, fine, I retract my comment. If that was what was meant, then I am correct, since all the halting problem does is ask the question: given some program and some input, does the program halt or go on infinitely? That is computability 101; it is also related to Gödel’s incompleteness theorem.
Nice thread.
Seriously, I guess Eliezer really needs this kind of reality check wakeup, before his whole idea of “FOOM” and “recursion” etc… turns into complete cargo cult science.
While I think the basic premise (strong AI friendliness) is quite a concern, many of his recent posts sound like he has read too much science fiction and watched the Terminator movies too many times.
There are some very basic issues with the whole recursion and singleton ideas… GenericThinker is right, ‘halting problem’ is very relevant there, in fact it proves that the whole “recursion foom in 48 hours” is completely bogus.
As for ‘singleton’, if nothing else (and there is a lot), speed of light is limiting factor. Therefore, to meaningfully react to local information, you need independent intelligent local agent. No matter what you do, independent intelligent local agent will always diverge from singleton’s global policy. End of story, forget about singletons. Strong AI will be small, fast, and there will be a lot of units.
So, while the basic premise, the concern about strong AI safety, remains, I think we should consider alternative scenario: AI grows relatively slowly (but follows the pattern of current ongoing foom), there is no singleton.
GenericThinker is simply extremely confused—as the comments about the halting problem make abundantly clear. I would comment on the idea of singletons being ruled out by the speed of light—but I can’t think of anything polite to say.
“but I can’t think of anything polite to say.”
Then say something unpolite. It is quite possible there is something I have missed about the concept of singleton. I am taking the definition from Nick Bostrom’s page.
Imagine a singleton “taking over” Mars. Every time an intelligent decision is to be made, information has to make a round trip to Earth: 6 minutes minimum.
In this case, any less intelligent system, but faster and local, is easily able to prevent such singleton’s expansion.
IMO, the same things happens everywhere, just times are much less than 6 minutes.
To sum it up, light speed limit says two things: “smaller is faster” and “local is more capable of reacting”.
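The "6 minutes minimum" figure can be sanity-checked with a short sketch. The Earth-Mars distances below are approximate values assumed for illustration (closest approach of roughly 54.6 million km, greatest separation of roughly 401 million km), and `round_trip_minutes` is a hypothetical helper name.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # exact by definition of the metre

# Approximate Earth-Mars distances, assumed here for illustration only.
MARS_CLOSEST_KM = 54.6e6
MARS_FARTHEST_KM = 401e6


def round_trip_minutes(distance_km: float) -> float:
    """Light-speed round-trip delay, in minutes, over a one-way distance in km."""
    return 2.0 * distance_km / SPEED_OF_LIGHT_KM_PER_S / 60.0


print(f"closest approach:    {round_trip_minutes(MARS_CLOSEST_KM):.1f} min round trip")
print(f"greatest separation: {round_trip_minutes(MARS_FARTHEST_KM):.1f} min round trip")
```

With these assumed distances the round trip is a little over 6 minutes at best and over 40 minutes at worst, so the quoted minimum is about right.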
A powerful world government that took control of human reproduction could effectively direct human evolution, reducing natural selection between agents to a subsidiary role. 6 minutes is an irrelevant delay in the context of this type of system.
Thanks, Jed. I had no idea they depended so heavily on actual physics simulations of their chips.
One of the key steps in this argument, it seems to me, is that chip engineering of the sort Intel does, relies on the fastest serial speeds available, because parallelizing is programmatically difficult. Right now, unless I’ve missed something, Intel is trying to transition to a multi-core strategy for following Moore’s Law and the serial speeds have flatlined. Would you be willing to predict a slowdown in Moore’s Law for transistors per square inch, or for the number of cores, now or in another 5-10 years, on the basis that Intel will no longer be getting the serial speed increases they need in order to keep up with Moore’s Law?
You are right about the smaller is faster and local being more capable of reacting. But Eliezer’s arguments are predicated on there being a type of AI that can change itself without deviation from a purpose. So an AI that splits itself into two may deviate in capability, but should share the same purpose.
Whether such an AI is possible or would be effective in the world is another matter.
“So an AI that splits itself into two may deviate in capability, but should share the same purpose.”
I cannot see how two initially identical strong AIs sharing the purpose would not start to diverge over time. Strong AI is by definition adaptable (means learning), different set of local conditions (= inputs) must lead to splitting AIs “personality”.
In other words, splitting itself has to end with two “itselfs”, as long as both are “strong”.
“because parallelizing is programmatically difficult”
Minor note: “Parallelization is programmatically difficult” is in fact another example of recursion.
The real reason why programming focused on serial execution was the fact that most hardware was serial. There is not much point learning the mysteries of multithreaded development if the chance that your SW will run on a multicore CPU is close to zero.
Now when multicore CPUs are de facto standard, parallel programming is no longer considered prohibitively difficult, it is just another thing you have to learn. There are new tools, new languages etc..
SW always lags behind HW. Intel had 32-bit CPU since 1986, it took 10 years before 32-bit PC software became mainstream...
Good post, thanks for making it. Besides the issue of whether Intel gets some of the recursive benefits, there is also the question of how FOOMable Intel would be if its engineers ran on its own hardware. Since Intel is embedded in the global economy and chip fabs are monstrously expensive undertakings, speeding up certain design issues would only go so far. I suppose the answer is that Intel will shortly invent molecular nanotechnology, but it’s not really clear to what extent Drexler’s vision or a completely flexible variant is even possible.
Still, your point here was to illustrate mathematically the way “recursion” of the type you are talking about increases growth and you did a good job of that.
In the post and in the comment discussion with me, Eliezer tries to offer a math definition of “recursive”, but in this discussion about Intel he seems to revert to the definition I thought he was using all along, about whether growing X helps Y grow better, which helps X grow better. I don’t see any differential equations in the Intel discussion.
Does it help if I say that “recursion” is not something which is true or false of a given system, but rather, something by which one version of a system differs from another?
The question is not “Is Intel recursive?” but rather, “Which of these two systems is the case? Does intervening on Intel to provide them with much less or much more computing power, tremendously slow or accelerate their progress? Or would it have only small fractional effects?”
In the former case, the research going into Moore’s Law is being kept rigidly on track by the computers output by Moore’s Law, and this would make it plausible that the exponential form of Moore’s Law was due primarily to this effect.
In the latter case, computing power is only loosely coupled to Intel’s research activities, and we have to search for other explanations for Moore’s Law, such as that the market’s sensitivity to computing power is logarithmic and so Intel scales its resources as high as necessary to achieve a certain multiplicative improvement, but no higher than that.
@Pearson: There’s a huge variety of Moore’s Laws, for disk space, for memory bandwidth, etc. etc., and I am simply using “Moore’s Law” to range over the whole exponential bucket.
luzr: Learning (or change) doesn’t mean arbitrariness, adaptivity can and should be as lawful as math. Only few responses to changing context are the right ones, apparent “flexibility” of adaptive behavior is the ability to precisely select right responses to any of the huge number of possible circumstances, not uncontrollable variation that leads to all kinds of unpredictable consequences.
From “The Psychological Foundations of Culture” by Tooby & Cosmides:
Robin, perhaps you could elaborate a little bit… assuming I understand what’s going on (I’m always hopeful), the “recursion” here is the introduction of output being a function of “subjective time” (y) instead of “clock time” (t), and, further, y—it is postulated—is related to t by:
dy/dt = e^y
because the ratio of y to t is directly related to output (which as noted above is said to be an exponential function of y due to Moore’s law-type arguments).
That’s seriously “strange”. It is very different from a non-“recursive” analysis where, say, dy/dt = e^t. I could imagine you objecting to the veracity of this model, or claiming that this type of recursive loop is standard practice. Which of these are you saying, or are you saying something different entirely?
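To make the contrast between the two equations concrete, here is a short worked derivation. The integration constant C and the initial value y_0 = y(0) are notation introduced here for illustration.

```latex
% "Recursive" case: the growth rate depends on y itself.
\frac{dy}{dt} = e^{y}
\;\Longrightarrow\; e^{-y}\,dy = dt
\;\Longrightarrow\; -e^{-y} = t - C
\;\Longrightarrow\; y(t) = -\ln(C - t), \qquad C = e^{-y_0}.

% Non-"recursive" case: the growth rate depends only on clock time.
\frac{dy}{dt} = e^{t}
\;\Longrightarrow\; y(t) = e^{t} + (y_0 - 1).
```

The first solution diverges as t approaches the finite time C, while the second is merely exponential and stays finite at every finite t. That finite-time blow-up is what makes the e^y model qualitatively different.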
Vladimir Nesov:
“Only few responses to changing context are the right ones”
As long as they are “few” instead of “one”—and these “few” still means basically infinite subset of larger infinite set, differences will accumulate over time, leading to different personality.
Note that such a personality might not diverge from the basic goal. But it will inevitably start to ‘disagree’ about choosing one of those “few” good choices because of different learning experience.
This, BTW, is the reason why despite what Tooby & Cosmides says, we have highly diverse ecosystem with very large number of species.
“GenericThinker is simply extremely confused—as the comments about the halting problem make abundantly clear. I would comment on the idea of singletons being ruled out by the speed of light—but I can’t think of anything polite to say.”
Well, I take this as a compliment coming from people who post here. If an ignorant person thinks you’re wrong, chances are you’re on the right track. If you want to correct my idea, feel free, but if this is the best you’ve got, parroting Eliezer’s comment, then I have nothing to fear. Eliezer’s comments about my post have already been proved false once, since my comments on the importance of computational power affecting future chip designs were right on the money. But of course they were, since I have actually designed computer chips before. Big surprise: someone who posts here actually posting based on knowledge. I would encourage you, Tim, to look into that concept and try it out.
Regarding serial vs. parallel:
The effect on progress is indirect and as a result hard to figure out with confidence.
We have gradually learned how to get nearly linear speedups from large numbers of cores. We can now manage linear speedups over dozens of cores for fairly structured computations, and linear speedups over hundreds of cores are possible in many cases. This is well beyond the near future number of cores per chip. For the purposes of this analysis I think we can assume that Intel can get linear speedups from increasing processors per chip, say for the next ten years.
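One common way to see how close to linear a speedup can get is Amdahl's law. It is brought in here purely as an illustration and is not something the analysis above relies on. The parallel fractions below are made-up values, and `amdahl_speedup` is a hypothetical helper name.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: overall speedup when only `parallel_fraction` of the work scales with cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Illustrative (assumed) parallel fractions for a simulation workload.
for parallel_fraction in (1.0, 0.99, 0.95):
    for cores in (16, 100):
        speedup = amdahl_speedup(parallel_fraction, cores)
        print(f"parallel fraction {parallel_fraction:.2f}, {cores:3d} cores -> {speedup:5.1f}x")
```

In this toy model, speedup stays close to linear over dozens of cores only when the serial fraction is very small, which fits the remark that it is the fairly structured computations that scale well.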
But there are other issues.
More complicated / difficult programming models may not slow down a given program, but they make changing programs more difficult.
Over time our ability to create malleable highly parallel programs has improved. In special cases a serial program can be “automatically” parallelized (compilation with hints) but mostly parallelization still requires explicit design. But the abstractions have gotten much easier to use and revise.
(In my earlier analysis I was assuming, I think correctly, that this improvement was a function of human thought without much computational assist. The relevant experiments aren’t computationally expensive. Intel has been building massively parallel systems since the mid-80s but it didn’t produce most major improvements. The parallel programming ideas accreted slowly from a very broad community.)
So I guess I’d say that with the current software technology and trend, Intel can probably maintain most of its computational curve-riding. Certainly simulations with a known software architecture can be parallelized quite effectively, and can be maintained as requirements evolve.
The limitation will be on changes that violate the current pervasive assumptions of the simulation design. I don’t know what those are these days, and if I did I probably couldn’t say. However they reflect properties that are common to all the “processor like” chips Intel designs, over all the processes it can easily imagine.
Changes to software that involve revising pervasive assumptions have always been difficult, of course. Parallelization just increases the difficulty by some significant constant factor. Not really constant, though, it has been slowly decreasing over time as noted above.
So the types of improvement that will slow down are the ones that involve major new ways to simulate chips, or major new design approaches that don’t fit Intel’s current assumptions about chip micro-architecture or processes.
While these could be significant, unfortunately I can’t predict how or when. I can’t even come up with a list of examples where such improvements were made. They are pretty infrequent and hard to categorize.
I hope this helps.
Jed, it does. Thanks for your comments!
If multiple local agents have a common goal system and share information (keeping in mind Aumann Agreement, and that instrumental values are information like any other and won’t tend to get promoted to terminal values in clean architectures), why can’t you consider the set of them as a single decision-making agent on long enough timescales?
The only problem I can see is that one local agent might choose a policy that would work only if another agent didn’t choose a particular policy (which it ends up doing). However, I can’t imagine that this wouldn’t be noticed and factored in in advance.
I’ve been wondering how much of Moore’s law was due to increasing the amount of human resources being devoted to the problem. The semiconductor industry has grown tremendously over the past fifty years, with more and more researchers all over the world being drawn into the problem. Jed, do you have any intuition about how much this has contributed?
Eliezer, I don’t know what is your implicit referent to divide “tremendous” from “fractional” influence of growth of X on growth of Y. Perhaps you can define that clearly in a very simple model, but I don’t see how to generalize that to more realistic models.
Derek, I’m not sure your proposed definition makes sense outside of a one-dimensional model.
“design cycles have stayed about the same length while chips have gotten hundreds of times more complex, and also much faster, both of which soak up computing power.”
So...if you use chip x to simulate its successor chip y, and chip y to simulate its successor, chip z, the complexity and speed progressions both scale at exactly the right ratio to keep simulation times roughly constant? Interesting stuff.
Sounds as though the introduction of black-box 2015 chips would lead to a small bump and level off quite quickly, short of a few huge insights, which Jed seems to suggest are quite rare. Eliezer, is this another veiled suggestion that hardware is not what we need to be working on if we’re looking to FOOM?
“Changes to software that involve revising pervasive assumptions have always been difficult, of course.”
Welcome to Overcoming Bias.
Nick Tarleton:
“If multiple local agents have a common goal system”
I guess, you can largely apply that to current human civilization as well...
I was thinking about the “multimind singleton” a little bit more and came to the conclusion the fundamental question is perhaps:
Is strong AI supposed to have the free will?
luzr: The AI can have a provable and predictable goal system and still have free will. Pretty much the same way humans have free will.
Lightwave:
But the goal system of humans, as individuals, as social groups and as a civilization, tends to change over time, and my understanding is that this is the result of free will and learning experience. It is in fact the basis of adaptability.
I expect the same must be true for real strong AI. And, in that case, singleton simply does not make too much sense anymore, IMHO (when smaller is faster and local is more capable of reacting..)
Eliezer’s hard takeoff scenario for “AI go FOOM” is if the AI takes off in a few hours or weeks. Let’s say that the AI has to increase in intelligence by a factor of 10 for it to count as “FOOM”. If there is no increase in resources, then this means that intelligence has to double anywhere from once an hour to once every few days just through recursion or cascades. If intelligence doubles once a day, then this corresponds to an annual growth factor of 2^365, or roughly 10 to the 110th power. This is quite a large number. It seems more likely that “AI goes FOOM” will be the result of resource overhang than recursion or cascades.
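The "doubling once a day" arithmetic above can be checked directly: 365 doublings in a year is a growth factor of 2^365, which is roughly 10^110. A minimal sketch follows. The doubling times are the illustrative ones from the comment, and `log10_annual_growth` is a hypothetical helper name.

```python
import math


def log10_annual_growth(doubling_time_days: float) -> float:
    """Base-10 logarithm of the one-year growth factor for a given doubling time."""
    doublings_per_year = 365.0 / doubling_time_days
    return doublings_per_year * math.log10(2.0)


# A factor-of-10 increase needs log2(10), a bit over three, doublings.
print(f"doublings needed for 10x: {math.log2(10):.2f}")

for label, days in (("once an hour", 1.0 / 24.0), ("once a day", 1.0), ("every 3 days", 3.0)):
    exponent = log10_annual_growth(days)
    print(f"doubling {label}: about 10^{exponent:.0f} per year")
```

Working in logarithms avoids overflowing a floating-point number for the hourly case, where the annual growth factor is far too large for a double to hold.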
Note that a nuclear chain reaction is not an example of recursion. Once an atom is split, it can’t be split again. A nuclear chain reaction is more like a forest fire when the tinder is very dry. It is probably better explained as a resource overhang than recursion.
Jed, serial speed limiting Intel makes sense, and is about the only theory I’ve heard that does, but now that we move to parallel machines, it seems to me that this theory predicts either that Moore’s law falls apart, or that parallel software makes it possible to throw lots of money at the problem and it speeds up.
You don’t have to choose one or the other, but it seems to me that you have to raise your error bars. There’s an implausibly small window for the quality of parallel software to rise just fast enough to make Moore’s law continue, if this is the key bottleneck.
Dawkins understood the self-improving nature of modern computer technology in 1991 - see:
“The Genesis of Purpose”—http://www.youtube.com/watch?v=qm-0Z0ceezQ
He called it “self-feeding co-evolution”.
His description of the concept starts 31 minutes in.
His description of how computers help to design the next generation of computers starts 36 minutes in.
He also uses the term “take off”.
If you have much of an understanding of this sort of thing, a lot of the material on this site about rapid transitions and the associated risks can be seen as being a confused muddle—based on a misunderstanding of the history of the field. Dawkins is correct. We have been seeing self-improvement for millennia now.
I’ve been explaining this point here for a while now. Nobody seems to have a coherent critique. I remain rather puzzled about why people persist in taking this kind of material seriously.
Dawkins understood the self-improving nature of modern computer technology in 1991 - see his lecture:
The Genesis of Purpose.
He called it “self-feeding co-evolution”. His description of the concept starts 31 minutes in.
His description of how computers help to design the next generation of computers starts 36 minutes in. He also uses the term “take off”.
If you have much of an understanding of this sort of thing, a lot of the material on this site about rapid transitions and the associated risks appears to be a confused muddle—based on misunderstandings of the history of the field (my more charitable interpretation). Dawkins is correct about the issue. We have actually been seeing self-improvement for millennia now.
I’ve been explaining this point here for a while now. IMO, nobody seems to have a coherent critique. Some people persist in taking this kind of material seriously—but the reason seems to be that they haven’t thought things through properly.