I want to try to break down the chain of events in this region of the graph. Just to start the ball rolling:
So, one question is the degree to which additional computer hardware has to be built in order to support additional levels of recursive self-improvement.
If the rate of takeoff is constrained by a need for the AI to participate in the manufacture of new hardware with assistance from people, then (correct me if I am missing something) we have a slow or moderate takeoff.
If there is already a “hardware overhang” when key algorithms are created, then perhaps a great deal of recursive self-improvement can occur rapidly within existing computer systems.
Manufacturing computer components is quite involved. A hardware/software system which can independently manufacture similar components to the ones it runs on would have to already have the abilities of dozens of different human specialists. It would already be a superintelligence.
If there is no hardware overhang initially, however, there will be strong incentives for people, along with whatever software tools they have, to optimize the system so that it runs faster and on less hardware.
If development follows the pattern of previous AI systems, chances are they will succeed. Additional efficiencies can always be wrung out of prototype software systems.
Therefore, if there is no hardware overhang initially, one will probably materialize fairly shortly through software optimization, with human engineers in the loop.
In the past, such processes have delivered x1000 increases.
So, when the AI is turned on, there could be a hardware overhang of 1, 10, or 100 right within the computer it is on.
My belief is that the development team, if there is just one, has a good chance of succeeding at preventing this initial AI from finding its way to the internet (since I gather this is controversial, I should work to substantiate that later), and that attaching more processing power to it will be under their control.
If they do decide to add hardware to the first AI, whether by increasing the size of the server farm, renting cloud capacity, or releasing it onto the internet, I think I can make a case that the team will be able to justify the investment needed to obtain a hardware speedup of x1000 fairly quickly if the cost of the first set of hardware is less than $1 Billion, or if the team has a chance of expropriating the resources.
The team can also decide to try to optimize how well the AI runs on existing hardware, gaining x1000. That kind of optimization process might take maybe two years today.
So as a point estimate I am thinking that the developers have the option of increasing hardware overhang by x1,000,000 in two years or less. I can work on a better estimate over time.
That x1,000,000 improvement can be used either for speed AI or to run additional processes.
So, when the AI is turned on, there could be a hardware overhang of 1, 10, or 100 right within the computer it is on.
I didn’t follow where this came from.
Also, when you say ‘hardware overhang’, do you mean the speedup available by buying more hardware at some fixed rate? Or could you elaborate—it seems a bit different from the usage I’m familiar with.
Here is what I mean by “hardware overhang.” It’s different from what you discussed.
Let’s suppose that YouTube just barely runs in a satisfactory way on a computer with an 80486 processor. If we move up to a processor with 10X the speed, or we move to a computer with ten 80486 processors, for this YouTube application we now have a “hardware overhang” of nine. We can run the YouTube application ten times and it still performs OK in each of these ten runs.
So, when we turn on an AI system on a computer, let’s say a neuromorphic NLP system, we might have enough processing power to run several copies of it right on that computer.
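Here is a minimal sketch of this notion of overhang, assuming for the moment that "satisfactory" just means meeting a fixed per-copy resource requirement (my own toy code, with the 80486/YouTube numbers from above plugged in):

```python
# Toy formalization of "hardware overhang" as used above; purely illustrative.
def hardware_overhang(available_capacity: float, required_per_copy: float) -> int:
    """Number of additional satisfactory copies that fit on the hardware."""
    copies = int(available_capacity // required_per_copy)
    return max(copies - 1, 0)

# Ten 80486s (or one processor with 10X the speed) running an app that needs one 80486:
print(hardware_overhang(available_capacity=10.0, required_per_copy=1.0))  # -> 9
```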
Yes, a firmer definition of “satisfactory” is necessary for this concept to be used in a study.
Yes, this basic approach assumes that the AI processes are acting fully independently and in parallel, rather than interacting. We do not have to stick with either of those assumptions later.
Anyway, what I am saying here is the following:
Let’s say that in 2030 a neuromorphic AI system is running on standard cloud hardware in a satisfactory way according to a specific set of benchmarks, and that the hardware cost is $100 Million.
If ten copies of the AI can run on that hardware, and still meet the defined benchmarks, then there is a hardware overhang of nine on that computer.
If, for example, a large government could marshal at least $100 Billion at that time to invest in renting or quickly building more of the existing kind of hardware on which to run this AI, then the hardware overhang gets another x1000.
What I am further saying is that at the moment this AI is created, it may be coded in an inefficient way that is subject to software optimization by human engineers, like the famous IBM AI systems have been. I estimate that software optimization frequently gives a x1000 improvement.
That is the (albeit rough) chain of reasoning that leads me to think that a x1,000,000 hardware overhang will develop very quickly for a powerful AI system, even if the AI does not get in the manufacturing business itself quite yet.
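As a minimal worked version of that chain, using only the rough point estimates above (none of these numbers are measurements):

```python
# All factors are the rough point estimates from the text, not data.
initial_copies = 10        # ten copies meet the benchmarks -> overhang of 9 on day one
hardware_scaleup = 1_000   # ~$100 Million of hardware scaled up to ~$100 Billion
software_speedup = 1_000   # estimated gain from human-driven software optimization

overhang_multiplier = hardware_scaleup * software_speedup
print(overhang_multiplier)                    # 1000000
print(initial_copies * overhang_multiplier)   # parallel copies, if it all goes to copies
```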
I am trying to provide a dollop of analysis for understanding take-off speed, and I am saying that AI systems can gain x1,000,000 in power shortly after they are invented, even if they DO NOT recursively self-improve.
At the same time this additional x1,000,000 or so hardware overhang is developing (there is a good chance that a significant hardware overhang existed before the AI was turned on in the first place), the system is in the process of being interfaced and integrated by the development team with an array of other databases and abilities.
Some of these databases and abilities were available to the system from the start. The development team has to be extremely cautious about what they interface with which copies of the AI; these decisions are probably more important to the rate of takeoff than creating the system in the first place.
Language Acquisition
Because of the centrality of language to human cognition, the relationship between language acquisition and takeoff speed is worth analyzing as a separate question from takeoff speed for an abstract notion of general intelligence.
Human-level language acquisition capability seems to be a necessary condition for developing human-level AGI, but I do not believe it is a necessary condition for developing an intelligence capable of manufacturing, or even of commerce, or of hiring people and giving them a set of instructions. (For this reason, among others, thinking about surpassing human level does not seem to be the right question to ask if we are debating policy.)
Here are three scenarios for advanced AI language acquisition:
1) The system, like a child, is initially capable of language acquisition, but in a somewhat different way. (Note that for children, language acquisition and object recognition skills develop at about the same time. For this reason, I believe that these skills are intertwined, although they have not been that intertwined in the development of AI systems so far.)
2) The system begins with parts of a single language somewhat hardwired.
3) The system performs other functions than language acquisition, and any language capability has to be interfaced in a second phase of development.
If 1) comes about, and the system has language acquisition capability initially, then it will be able to acquire all human languages it is introduced to very quickly. However, the system may still have conceptual deficits it is unable to overcome on its own. In the movies, the favorite deficit is emotion understanding, but there could be others; for instance, a system that has acquired language may still not be able to do design. I happen to think that emotion understanding may prove more tractable than the movies suggest. Because so much of human language is centered around feelings, a significant level of emotion understanding (which differs from emotion recognition) is a requirement for some important portions of language acquisition. Some amount of emotion recognition, and even emotion forecasting, is required for successful interactions with people.
In case 2), if the system is required to translate from its first language, it will also be capable of communicating with people in other languages within a very short time, because word and phrase lookup tables can be placed right in working memory. However, it may have lower comprehension, and its phrasing might sound awkward.
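As a toy illustration of the lookup-table idea (the phrase entries and the longest-match-first rule are invented for illustration, not anyone's proposed design): fast word-for-word output, but no real comprehension.

```python
# A phrase table held in memory gives instant substitution, not understanding;
# unknown words simply pass through untouched.
phrase_table = {
    "how are you": "comment allez-vous",
    "thank you": "merci",
    "hello": "bonjour",
}

def lookup_translate(text: str) -> str:
    out = text.lower()
    # Replace longer phrases before shorter ones so multi-word entries win.
    for phrase in sorted(phrase_table, key=len, reverse=True):
        out = out.replace(phrase, phrase_table[phrase])
    return out

print(lookup_translate("Hello, how are you?"))  # -> "bonjour, comment allez-vous?"
```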
In either case 1) or 2), roughly as soon as the system develops some facility at language, it will be capable of superhumanly communicating with millions of people at a time, and possibly with everyone. Why? Because computers have been capable of personalized communication with millions of people for many years already.
In case 3), the system is designed for other purposes but can be interfaced in a more hard-wired fashion with whatever less-than-complete forms of linguistic processing are available at the time. These less-than-complete abilities are already considerable today, and they will become even more considerable under any scenario other than a general disinclination to advance them or government regulation.
A powerful subset of abilities drawn from a list more like this one (planning, design, transportation, chemistry, physics, engineering, commerce, sensing, object recognition, object manipulation, and knowledge base utilization) might be sufficient to perform computer electronics manufacturing.
Little or no intelligence is required for a system to manufacture using living things.
Why would much of this optimization for running on existing hardware etc. not have been done prior to reaching human level? Are we assuming there was a sharp jump, such that the previous system was much worse? Also, where do you get these figures from?
The x1000 gained by adding more hardware comes only if the developers have a government or a big company as their partner and they buy more hardware or rent it on the cloud. A botnet is another way to gain this x1000.
Now to my claim of possible x1000 software speed-up for AI systems: The amount to be gained from software optimization in any specific project is indeed a random variable, and conditions when an AGI system is invented make a difference.
Where has such speed-up come from in problems I have been involved with or heard about?
When developers are first creating a piece of software, typically they are rewarded for focusing first on just getting some result that works. The first version of the code may come out slower because they take the easy way out and just use whatever code is available in some library, for instance. Once they have proved the concept, they optimize for performance.
Developers can adjust the level of resources they invest in trying to improve how fast the software runs and how quickly they gain these speed-ups. If they invest little or nothing in optimization, the speed will not increase.
If the code was written in a high-level language, then it is relying on the compiler to perform some optimization. Today’s compilers do not utilize any knowledge about the data sets they are going to operate on, and also optimizing for a system with a variety of different computers is difficult.
True enough, the technology of optimizing compilers almost certainly will improve between now and when this system is created. Maybe by that time when we are programming in a high-level language, the interpreter, compiler or other code transformer will be able to automagically correct some of the sub-optimal code we write.
When I was working in database design, de-normalization (a way of combining tables) would change join operations from taking an impossibly long time (I would just stop the query) to completing in a few seconds.
When I was working on a problem involving matrices and was able to identify a sparsity condition and use a specialized algorithm, I got this kind of speed-up as well.
After creating code to process a single file, I would write a macro that runs the code on 1,000 files.
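To make the sparse-matrix example above concrete, here is a minimal sketch, assuming numpy and scipy are available; the matrix size and density are arbitrary, not from any real project.

```python
import time
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

n = 3000
# A sparse, well-conditioned system: ~0.1% of entries non-zero, heavy diagonal.
A = sparse.random(n, n, density=0.001, format="csr", random_state=0) + n * sparse.eye(n, format="csr")
b = np.ones(n)

t0 = time.perf_counter()
x_dense = np.linalg.solve(A.toarray(), b)   # general-purpose dense solve, O(n^3)
t_dense = time.perf_counter() - t0

t0 = time.perf_counter()
x_sparse = spsolve(A.tocsc(), b)            # specialized solver that exploits sparsity
t_sparse = time.perf_counter() - t0

assert np.allclose(x_dense, x_sparse)
print(f"dense: {t_dense:.3f}s, sparse: {t_sparse:.3f}s")
```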
This kind of thing just seems to happen a lot. I guess I make no guarantees.
During some phases of an AI system’s early improvement, sometimes we will have difficulty discerning whether the system is recursively self-improving or whether it is being improved by the developers.
When a child learns to count pennies, they get faster at it after they have done it a few times (self-improvement). Then, after they learn to multiply, someone comes along and shows them that it is even faster to create groups of five and count the number of groups, an optimization that comes from an external source.
Since this is a prediction, there is no shame in following your lead: here are some possible reasons why improvement without recursive self-improvement might not gain x1000 in software speed-up:
Suppose the last step forward in the development of an AI system is to interface and integrate six different software systems that have each been improving steadily over many years, using pre-digested data sources that have already been heavily pre-processed. Maybe most of human knowledge has been captured in some steadily improving fast-access data structures years before AGI actually comes about. To create AGI, the developers write some linking code and add a few new things. In that case, the x1000 software speed-up may not be possible.
We also have little idea how software optimization might improve algorithm speed thirty years from now, for example, on an array of 1,000,000 interconnected quantum computers and field programmable analog photonic processors with nanotech lenses that change shape, or something. In these environments, I guess the bottleneck could be setting up the run rather than actually making some enormous calculation.
One way to slow things down, if we did not know what to do next with an AGI, would be, immediately after the system began to work, to halt or limit investment in software optimization, halt buying more hardware, and halt hardware optimization. Instead, I prefer to plan ahead.
If the initial system does not find a hardware overhang, it seems unclear to me that a 1000x less expensive system necessarily will. For any system which doesn’t have a hardware overhang, there is another system, 1000x less efficient, that also doesn’t.
If there is already a “hardware overhang” when key algorithms are created, then perhaps a great deal of recursive self-improvement can occur rapidly within existing computer systems.
Do you mean that if a hardware overhang is large enough, the AI could scale up quickly to the crossover, and so engage in substantial recursive self-improvement? If the hardware overhang is not that large, I’m not sure how it would help with recursive self-improvement.
Algorithm development starts with an abstract idea. Then this concept has to be transformed into computational form for testing. The initial proof-of-concept algorithm is as close as possible to the abstract idea. This first key algorithm is the reference; all future improved versions of the algorithm have to be checked against this reference. A high hardware overhang for this newly created algorithm is very likely.
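A minimal sketch of that reference-checking workflow might look like the following; the two implementations are stand-ins chosen for illustration only.

```python
import random

def reference_version(xs):
    # Proof-of-concept: as close to the abstract idea as possible, no optimization.
    total = 0
    for x in xs:
        total += x * x
    return total

def optimized_version(xs):
    # Faster rewrite; its behaviour must be checked against the reference.
    return sum(x * x for x in xs)

random.seed(0)
for _ in range(1000):
    data = [random.randint(-10**6, 10**6) for _ in range(random.randint(0, 100))]
    assert optimized_version(data) == reference_version(data)
print("optimized version matches the reference on all random test cases")
```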
For complex hardware extensions, it takes years until the new resources (e.g. MMX, SSE, AVX, multi-core CPUs, GPUs) are fully exploited by software. Compilers have to be optimized first, and assembler optimization of critical sections gives further optimization potential. PS4 lead architect Mark Cerny said regarding the previous PS3 Cell processor: “People have now spent eight years learning the depths of that architecture, how to make games look beautiful.”
Assume two key algorithms inspired by analysis of biological brains (a toy sketch follows below):
1) simulation of a neuron
2) a pattern-analysis structure consisting of simulated neurons
If an AI reaches superhuman capabilities in software engineering and testing, it could improve its own code base. All acquired knowledge remains intact. If necessary, the AI develops a migration tool to transfer stored information into the improved system. If configurable hardware is used, this can be optimized as well. The hardware overhang will be even higher and FOOM more likely.
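To make the two key algorithms above a bit more concrete, here is a toy sketch (a plain sigmoid neuron and one layer of such neurons; no claim that this matches the actual brain-inspired design being imagined):

```python
import math
import random

def neuron(inputs, weights, bias):
    # Key algorithm 1: a single simulated neuron (weighted sum + sigmoid).
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def pattern_layer(inputs, weight_matrix, biases):
    # Key algorithm 2: a pattern-analysis structure built from simulated neurons.
    return [neuron(inputs, w, b) for w, b in zip(weight_matrix, biases)]

random.seed(0)
x = [random.random() for _ in range(4)]                          # an input pattern
W = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
b = [0.0] * 3
print(pattern_layer(x, W, b))                                    # three neuron activations
```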
What projects can you think of that are closest to ‘crossover’?
How much do ordinary projects contribute to their own inputs?