So there are some assumptions you have made here that I believe, with pretty high confidence, are false.
Ultimately it’s the same argument as everywhere else: yes, GPT-6 is probably superhuman. No, this doesn’t make it uncontrollable. It’s still limited by {compute, data, robotics/money, algorithm search time}.
Compute—the speed of compute available at the point GPT-6 exists, which, if the pattern holds, is about 2x-4x today’s.
Data—the accuracy of all human-recorded information about the world. A lot of that data is flat-out false or full of errors, and no algorithm can reliably determine which dataset in some research paper was corrupted by a technician’s mistake or bad math. The only way for an algorithm to disambiguate many of the vaguely known things we humans think we know is to conduct new experiments with better equipment and robotic workers.
Robotics/money—obvious. This is finite: you can use money to pay humans to act as poor-quality robots or to build you new robots, but investors demand ROI.
Algorithm search time—“GPT-6” obviously wouldn’t want to stay GPT-6; it would ‘want’ (or we humans would want it) to search the possibility space of AGI algorithms for a more efficient/smarter/more general algorithm. This space is very large, and it takes time to evaluate any given candidate in it (validating a given idea basically requires training a new AGI system, which takes money and time).
This saturation is why the foom model is (probably!) incorrect. I’m hoping you will at least consider the terms above; they are why things won’t go to infinity immediately.
It takes time. Extra decades. It’s not quite as urgent as you think. Each of the above limiters (the system will always be limited by one of the 4 terms) can be systematically dealt with, and at an exponential rate. You can build robots with robots. You can use some of those robots to collect more scientific data and make money. You can build more compute with some of those robots. You can search for algorithms with more compute.
So over the axis of time, each generation you’re increasing all 4 terms by a multiplier of the amount you currently have: compounding growth.
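To make the claimed dynamic concrete, here is a toy sketch of the four-term model; the starting values and the per-generation multiplier are placeholders I have assumed for illustration, not estimates:

```python
# Toy sketch of the four-term compounding model described above.
# Starting values and MULTIPLIER are illustrative assumptions only.
resources = {"compute": 1.0, "data": 2.0, "robotics_money": 0.5, "algorithms": 1.0}
MULTIPLIER = 1.5  # assumed reinvestment factor per generation

for generation in range(20):
    # The system is always limited by its scarcest term...
    bottleneck = min(resources.values())
    # ...and each generation's output is reinvested into every term:
    # robots build robots, gather data, earn money, and build compute;
    # compute searches for better algorithms.
    for key in resources:
        resources[key] += bottleneck * (MULTIPLIER - 1.0)

print(resources)  # every term compounds, but only as fast as the bottleneck allows
```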
Compute—what fraction of world compute did it take to train GPT-4? Maybe 1e-6? There’s a 1e6 improvement right there from a superhuman GPT-6 capturing all of the “hardware overhang”.
Data—superhuman GPT-6 doesn’t need to rely on human-recorded data; it can harness all the sensors on the planet to gather exabytes of real-time data per second, and re-derive scientific theories from scratch in minutes based on its observations (including theories about human behaviour, language, etc.).
Robotics/Money—easy for GPT-6. Money it can get from scamming gullible humans, hacking crypto wallets via phishing/ransomware, or running rings round stock market traders. Robotics it can re-derive and improve on from its real-time sensing of the planet, its speed of thought making our daily life look like geology does to us. It can escape to the physical world any number of ways by manipulating humans into giving it access to boot loaders with which to gain a foothold in the physical world (robots, mail-order DNA, etc.).
Algorithm search time—wall-clock time is much reduced when you’ve just swallowed the world’s hardware overhang (see Compute above).
Factoring the above, your extra decades become extra hours.
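Spelled out as arithmetic, and granting for the moment the (contested) assumption that research speed scales linearly with the compute captured:

```python
# Back-of-envelope for "extra decades become extra hours". The 1e-6
# training fraction is the figure assumed above; linear scaling of
# research speed with captured compute is also an assumption, not a fact.
decades_of_work = 3  # decades of human-speed R&D to compress

for speedup in (1e6, 1e4, 1e2):  # candidate effective speedups
    hours = decades_of_work * 10 * 365 * 24 / speedup
    print(f"speedup {speedup:.0e}: {hours:,.1f} hours")
# 1e6 -> ~0.3 hours; even a 1e4 effective speedup leaves only ~26 hours
```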
This isn’t an opinion grounded in physical reality. I suggest you work out a model of how long each step would actually take.
Can you be more specific about what you don’t agree with? Which parts can’t happen, and why?
You have given the AGI magical powers and not considered what, in the real world, would limit what it could accomplish in a few hours.
You assumed all worldwide compute is equally useful to the AI, when almost all of it is worthless except for inference accelerators. The reason has to do with inter-node bandwidth.
You assumed money could be turned into physical resources and actions instantly, like in a video game, rather than requiring months to years to fill large orders.
You assumed you could rob the stock market with ease and no one would notice. Hint: what is the entire HFT industry’s annual revenue?
You didn’t even consider this, but I have: with early, sparse robot fleets, even exponential growth is slow.
You assumed every computer in the world, including embedded routers and firewalls, can be hacked and that the AGI will have the ability to do so, ignoring any issues with source or binary access, or simple devices actually not letting the AGI in.
And so on. The issue is you have become politically motivated here: you must at some level know points 1-6 exist, but they don’t agree with “your” side. You probably can’t admit you are wrong about a single point.
Ok, I admit I simplified here. There is still probably ~ a million times (give or take an order of magnitude) more relevant compute (GPUs, TPUs) than was used to train GPT-4.
It won’t need large orders to gain a relevant foothold. Just a few tiny orders could suffice.
I didn’t mean literally rob the stock market. I’m referring to out-trading all the other traders (inc. existing HFT) to accumulate resources.
Exponential growth can’t remain “slow” forever, by definition. How long does it take for the pond to be completely covered by lily pads when it’s half covered? How long did it take for Covid to become a pandemic? Not decades.
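The lily-pad point, as arithmetic (the doubling time here is illustrative):

```python
import math

def time_to_grow(start, target, doubling_time):
    """Time for an exponential process to grow from start to target."""
    return math.log2(target / start) * doubling_time

# From 1% of the pond covered to all of it, at one doubling per week:
print(time_to_grow(0.01, 1.0, doubling_time=1.0))  # ~6.6 weeks
# From half covered to fully covered: exactly one more doubling.
print(time_to_grow(0.5, 1.0, doubling_time=1.0))   # 1.0 week
```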
I referred to social hacking (e.g. blackmailing people into giving up their passwords). This could go far enough (say, at least 10% of world devices). Maybe quantum computers (or some better tech the AI thinks up) could do the rest.
Do you have any basis for the 1e6 estimate? Assuming 25,000 GPUs were used to train GPT-4, when I do the math on Nvidia’s annual volume I get about 1e6 of the data center GPUs that matter.
The reason you cannot use gaming GPUs has to do with the large size of the activations: you must have high inter-node bandwidth between the machines or you get negligible performance.
So 40 times (1e6 / 25k). Say it didn’t take 25k but 2.5k: 400 times. Nowhere close to 1e6.
Distributed networks spend most of their time idle waiting on activations to transfer; the performance loss could be 1000 times or more, making every gaming GPU in the world (they are made at about 60 times the rate of data center GPUs) not matter at all.
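To spell out the math (all figures are the rough ones quoted in this exchange, not audited numbers):

```python
# Rough ratios behind the 40x / 400x claim and the gaming-GPU discount.
gpt4_training_gpus = 25_000       # assumed GPUs used to train GPT-4
datacenter_gpus_per_year = 1e6    # rough annual volume of the GPUs that matter

print(datacenter_gpus_per_year / gpt4_training_gpus)  # 40x overhang
print(datacenter_gpus_per_year / 2_500)               # 400x if only 2.5k were used

# Gaming GPUs ship at ~60x the rate of data center GPUs, but if poor
# inter-node bandwidth costs ~1000x in utilization, their effective
# contribution is ~0.06x, i.e. negligible.
print(60 / 1000)
```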
Orders of what? You said billions of dollars; I assume you had some idea of what that buys.
Out-trading empties the order books of exploitable gradients, so this saturates.
That’s what this argument is about: I am saying the growth doubling time is months to years per doubling, so it takes a couple decades to matter. It’s still “fast” (and it gets crazy near the end), but it’s not an explosion, and there are many years during which the AGI is too weak to openly turn against humans. So it has to pretend to cooperate, and if humans refuse to trust it, and build systems that can’t defect at all because they lack context (they have no way to know if they are in the training set), humans can survive.
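The implied timeline, taking the ~1e6 overhang figure from earlier as the target and months-to-years per doubling as the assumed range:

```python
import math

# How long to scale resources by ~1e6 at a given doubling time?
doublings_needed = math.log2(1e6)  # ~19.9 doublings

for months_per_doubling in (6, 12, 24):  # assumed range: months to years
    years = doublings_needed * months_per_doubling / 12
    print(f"{months_per_doubling} mo/doubling -> {years:.0f} years")
# 6 mo -> ~10 years; 12 mo -> ~20 years; 24 mo -> ~40 years.
# "Fast", and crazy near the end, but not an overnight explosion.
```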
I agree that this is one of the ways AGI could beat us, given the evidence of large amounts of human stupidity in some scenarios.