This letter is probably significantly net negative due to its impact on capabilities researchers who don’t like it. I don’t understand why the authors thought it was a good idea. Perhaps they don’t realize that there’s no possible enforcement of it that could prevent a GPT-4-level model that runs on individual GPUs from being trained? To the people who can do that, it’s really obvious, and I don’t think they’re going to stop; but I could imagine them rushing harder if they think The Law is coming after them.
It’s especially egregious that the letter accepts names without personal verification. That will probably amplify the negative response.
It may not be possible to prevent GPT-4-sized models, but it probably is possible to prevent GPT-5-sized models, if the large companies sign on and don’t want it to be public knowledge that they did it. Right?
Not for long. Sure, maybe it’s a few months.
I mean, the letter is asking for six months, so it seems reasonable.
perhaps. but don’t kid yourself—that’s the entire remaining runway!
It’s outright surrendering to entropy and death.
“Beyond a reasonable doubt” the AI will be safe? “Don’t do anything for 6 months to try to reach full human intelligence”?
Guess what happens in 6 months: they will say they aren’t convinced the AI will be safe and ask for 6 more months, and so on, until another nation mass-produces automated weapons (something they have also asked to ban) and conquers the planet.
Automated weapons are precisely the kind of thing where, because they can attack in overwhelming coordinated swarms, the only defense is... yep, your own automated weapons. Human reactions are too slow.
The only way to know whether a safety strategy will work is to build the thing.
“Don’t do anything for 6 months” is a ridiculous exaggeration. The proposal is to stop training for 6 months. You can do research on smaller models without training the large one.
I agree it is debatable whether the “beyond a reasonable doubt” standard is appropriate, but it seems entirely sane to pause for 6 months and use that time to, for example, discuss which standard is appropriate.
Other arguments you made seem to say “we shouldn’t cooperate if the other side defects”, and I agree, that’s game theory 101, but that’s not an argument against cooperating? If you are saying anything more, please elaborate.
I am saying the stakes weigh enormously against cooperating, given the chance that the other parties defect. It is very similar to the logic that led to the nuclear arsenal buildups of the Cold War.
As terrible as those weapons were, it would have been even more terrible for the other side to secretly defect and then launch a surprise attack with them, an action we can be virtually certain they would have committed. Note this is bilateral: it was equally true from the East and West sides of the Cold War.
Automated weapons can be replied to with nukes if they’re at the same scale, and the US has demonstrated drone amplification of fighter pilots, so I’m actually slightly less worried about that—as much as I hate their inefficiency and want them to get the fuck out of Yemen, I’m also not worried about the US losing air superiority. I’m pretty sure the main weapons risk from AI is superpathogens designed to kill all humans. Sure, humans wouldn’t use them, but how to build them has been imaginable for a while; it would only take an AI that thought it could live without us.
I think your model of safety doesn’t match mine much at all. What’s your timeline until AI that is stronger than every individual human at every competitive game?
No, automated weapons cannot be countered with nukes. I specifically meant the scenario of:
1. General-purpose robotics-control models. These appear extremely feasible because the generality hypothesis turns out to be correct (meaning it’s actually easier to solve all robotics tasks at once than to solve individual ones to human level; the GPT-3 source is no more complex than EfficientZero).
2. Self-replication, which is an obvious property of 1.
3. The mining and manufacturing equivalent of having 10 billion or 100 billion workers.
4. Enough automated weapons to create an impervious defense against nuclear attack by parties with current or near-future human-built technology. 1,000 ICBMs are scary when you have 10 ABMs and do not have defenses at each target or thousands of backup radars. They are an annoyance when you have overwhelming numbers of defensive weapons and can actually afford to build enough bunkers for every living citizen.
I don’t think being stronger at every game necessarily makes AI uncontrollable. I think the open agency model allows for competitive AGI and ASI that will potentially be more effective than the global RL stateful-agent model. (More effective because, as humans, we care about task performance and reliability, and a stateless system will be many times more reliable.)
interesting...
Yeah. The planet is too small. Geopolitical stalemates are only possible when someone doesn’t have a big enough weapon.
The endgame will converge to one winner. Winning is not guaranteed but you can always choose to lose.
Why do you think it only applies to the US? It applies to the whole world. It says “all AI labs” and “governments”. I hope the top signatories are reaching out to labs in China and other countries. And the UN, for that matter. There’s no reason why they wouldn’t also agree. We need a global moratorium on AGI.
You seriously believe that we can make China and Russia and all other countries not do this?
Based on our excellent track record of making them stop the other deeply unethical and existentially dangerous things they absolutely do?
We can’t stop Russia from waging an atrocious war in Europe and threatening bloody nukes. And like, we really tried. We pulled out all the economic stops, and are as close as you can get to being in open warfare against them without actually declaring war. Essentially the US, the EU, the UK, and others, united. And it is not enough.
And they are so much weaker than China. The US and EU attempts to get China to follow minimal standards of safety and ethics on anything have been depressing. China is literally running concentration camps, spying on us with balloons, apps and gadgets, refusing to recognise an independent country and threatening future invasion, backing a crazy dictator who is developing nukes while starving his populace, regularly engaging in biological research that is absolutely unethical, destroying the planetary climate, and recently unleashed a global pandemic, with the only uncertainty being whether this was due to them encroaching on wildlands, having risky and unethical wet-market practices, and then trying to cover up the resulting zoonosis, or due to them running illegal gain-of-function research with such poor safety standards in their lab that they lost the damn thing. I hate to say it, because the people of China deserve so much better, and have such an incredible history, and really do not deserve the backlash from their government’s actions hitting them, but I would not trust their current rulers one inch.
My country, Germany, has a long track record of trying to cooperate with Russia for mutual benefit, because we really and honestly believed that this was the way to go and would work for everyone’s safety, that every country’s leadership is, at its core, reasonable, and that if you listen and make compromises and act honestly and in good faith, international conflicts can all be fixed. We have succeeded in establishing peaceful and friendly relations with a large number of countries that were former enemies, who forgave us for our atrocities, despite us having started a world-wide conflict and genocide, which would look to be beyond what can be forgiven. We played a core role in the European Union project, trying to put peaceful relations through dialogue and trade into practice and guarantee long-term peace, and I am very proud of that. I am impressed at how it became possible to re-establish trust, how France was an ancestral enemy, already long before the world wars, and is now one of our closest partners.

I would love to live in a world where Russia would be a trustworthy partner for this project. But the data that Putin was not willing to cooperate on that and was just exploiting anything we offered became overwhelming. There is a good case to be made for acting ethically and hopefully even when there is little hope and ethics in your opponent. But there comes a point where the data suggests that this is really not working and you are just being taken advantage of, and we went way, way past that point. Anyone trusting anything Putin says at this point is unfortunately an idiot. Things would need to change drastically for trust to be re-established, and there is no indication or likelihood of that happening for now.
Both China and Russia have openly acknowledged that they see obtaining AGI first as crucial to world dominion. The chance of them slowing down for ethical reasons or due to international pressure under their current leadership is bloody zero.
I’m not even sure the US could pull this off internally. I wouldn’t be surprised if it just ended with Silicon Valley going seafaring with a submarine datacenter. There is so, so much power tied to AI. For governments, for corporations, for individuals. So many urgent problems it could solve, so much money to be made, and so much relative power to gain over others, at a time when the world order is changing and many people see the role of their country in it as an existential ideological fight. To get everyone to give up AI, for a risk whose likelihood and time of onset we cannot quantify or prove?
We can but hope they will see sense (as will the US government—and it’s worth considering that in hindsight, maybe they were actually the baddies when it came to nuclear escalation). There is an iceberg on the horizon. It’s not the time to be fighting over revenue from deckchair rentals, or who gets to specify their arrangement. There’s geopolitical recklessness, and there’s suicide. Putin and Xi aren’t suicidal.
But if they say they “see sense” yet start a secret lab that trains the biggest models their compute can support, do we want to be in that position? We won’t even know what the capabilities are, or whether and when this becomes a danger, if we “pause” research until we are sure “beyond a reasonable doubt” that going bigger is safe.
The only reason we know how much plutonium or U-235 even matters, how pure it needs to be, and how to detect from a distance the activities needed to make a nuke is that we built a bunch of them and did a bunch of research.
This is saying: close the lab down before we learn anything. We are barely seeing the sparks of AGI; it barely works at all.
Ultimately, it doesn’t matter which monkey gets the poison banana. We’re all dead either way. This is much worse than nukes, in that we really can’t risk even one (intelligence) explosion.
Note this depends on assumptions about the marginal utility of intelligence, and on explosions being possible at all. An alternative model: in the next few years, someone will build recursively improving AI. The machine will quickly improve until it hits a limit on one of: compute, data, physical resources, or the difficulty of finding an improved algorithm in the remaining search space.
What if, at that limit, the machine is NOT as capable as you are assuming? Say it’s superintelligent, but its ability to manipulate a specific person is not perfect when it doesn’t have enough data on the target, or its ability to build a nanoforge still requires it to have a million robots, or maybe two million.
We don’t know the exact point where saturation is reached, but it could be not far above human intelligence, making explosions impossible.
There’s significant reason to believe that even if actual intelligence isn’t that far above human, the level of strategic advantage could be catastrophically intense, able to destroy all humans in a way your imagined weapons can’t defend against. That’s the key threat.
You’re right that we can’t slow down to fix it, though. Do you actually have any ideas about how to defend against all kinds of weapons, even superbioweapons intended to kill all non-silicon life?
Yes. Get your own non-agentic AGIs or you are helpless, because the atmosphere is going to become unusable to unprotected humans; asymmetric attacks are too easy.
You need vast numbers of bunkers, and immense amounts of new equipment, including life support and medical equipment, that is beyond human ability to build.
Basically, if you don’t have self-replicating robots—controlled by session-based, non-agentic, open-agency, stateless AI systems—you will quickly lose to parties that have them. It’s probably the endgame tech for control of the planet, because the tiniest advantage compounds until one party has an overwhelming lead.
This, but x1000 compared to what you are thinking. I don’t think we have any realistic chance of approximate parity between the first and second movers. The speed at which the first mover will be thinking makes this so. Say GPT-6 is smarter at everything, even by a little bit, compared to everything else on the planet (humans, other AIs). It has copied itself 1000 times, and each copy is thinking 10,000,000 times faster than a human. We will essentially be like rocks to it, operating on geological time periods. It can work out how to disassemble our environment (including an unfathomable number of contingencies against counter-strike) over subjective decades or centuries of human-equivalent thinking time before your sentinel AI protectors even pick up its activity.
So there are some assumptions you have made here that I believe, with pretty high confidence, are false.
Ultimately it’s the same argument as everywhere else: yes, GPT-6 is probably superhuman. No, this doesn’t make it uncontrollable. It’s still limited by {compute, data, robotics/money, algorithm search time}.
Compute—the compute available at the point GPT-6 exists, which, if the pattern holds, is about 2x-4x today’s capacity.
Data—the accuracy of all human-recorded information about the world. A lot of that data is flatly false or full of errors, and it is not possible for any algorithm to reliably determine which dataset in some research paper was affected by a technician making an error or doing bad math. The only way for an algorithm to disambiguate many of the vaguely known things we humans think we know is to conduct new experiments with better equipment and robotic workers.
Robotics/money—obvious. This is finite: you can use money to pay humans to act as poor-quality robots, or to build you new robots, and investors demand ROI.
Algorithm search time—“GPT-6” obviously wouldn’t want to stay GPT-6; it would ‘want’ (or we humans would want it) to search the possibility space of AGI algorithms for a more efficient, smarter, more general algorithm. This space is very large, and it takes time to evaluate any given candidate in it. (You basically have to train a new AGI system, which takes money and time, to validate a given idea.)
This saturation is why the foom model is (probably!) incorrect. I’m hoping you will at least consider the terms above; they are why things won’t go to infinity immediately.
It takes time. Extra decades. It’s not quite as urgent as you think. Each of the above limiters (the system will always be limited by one of the 4 terms) can be systematically dealt with, and at an exponential rate. You can build robots with robots. You can use some of those robots to collect more scientific data and make money. You can build more compute with some of those robots. You can search for algorithms with more compute.
So over time, each generation you’re increasing all 4 terms by a multiplier of the amount you currently have, i.e. compounding growth.
Compute—what fraction of world compute did it take to train GPT-4? Maybe 1e-6? There’s a 1e6 improvement right there from a superhuman GPT-6 capturing all of the “hardware overhang”.
Data—superhuman GPT-6 doesn’t need to rely on human-recorded data; it can harness all the sensors on the planet to gather exabytes of real-time data per second, and re-derive scientific theories from scratch in minutes based on its observations (including theories about human behaviour, language, etc.).
Robotics/Money—easy for GPT-6. Money it can get from scamming gullible humans, hacking crypto wallets via phishing/ransomware, or running rings round stock-market traders. Robotics it can re-derive and improve on from its real-time sensing of the planet, with its speed of thought making our daily life look like geology does to us. It can escape to the physical world any number of ways, by manipulating humans into giving it access to boot loaders to gain a foothold in the physical world (robots, mail-order DNA, etc.).
Algorithm search time—wall-clock time is much reduced when you’ve just swallowed the world’s hardware overhang (see Compute above).
Factoring in the above, your extra decades become extra hours.
This isn’t an opinion grounded in physical reality. I suggest you work out a model of how fast each step would actually take.
Can you be more specific about what you don’t agree with? Which parts can’t happen, and why?
You have given the AGI magical powers and not considered what, in the real world, would limit what it could accomplish in a few hours.
You assumed all worldwide compute is equal for the purposes of the AI, rather than almost all of it being worthless except for inference accelerators. The reason has to do with inter-node bandwidth.
You assumed money could be turned into physical resources and actions instantly, like in a video game, rather than requiring months to years to fill large orders.
You assumed you could rob the stock market with ease and no one would notice. Hint: what is the entire HFT industry’s annual revenue?
You didn’t even consider this, but I have: with early sparse robot fleets, even exponential growth is slow.
You assumed every computer in the world, including embedded routers and firewalls, can be hacked and that the AGI will have the ability to do so, ignoring any issues with source or binary access, or simple devices actually not letting the AGI in.
And so on. The issue is you have become politically motivated here; you must at some level know the objections above exist, but they don’t agree with “your” side. You probably can’t admit you are wrong about a single point.
Ok, I admit I simplified here. There is still probably ~ a million times (give or take an order of magnitude) more relevant compute (GPUs, TPUs) than was used to train GPT-4.
It won’t need large orders to gain a relevant foothold. Just a few tiny orders could suffice.
I didn’t mean literally rob the stock market. I’m referring to out-trading all the other traders (inc. existing HFT) to accumulate resources.
Exponential growth can’t remain “slow” forever, by definition. How long does it take for the pond to be completely covered by lily pads when it’s half covered? How long did it take for Covid to become a pandemic? Not decades.
I referred to social hacking (i.e. blackmailing people into giving up their passwords). This could go far enough (say, at least 10% of world devices). Maybe quantum computers (or some better tech the AI thinks up) could do the rest.
Do you have any basis for the 1e6 estimate? Assuming 25,000 GPUs were used to train GPT-4, when I do the math on Nvidia’s annual volume I get about 1e6 of the data-center GPUs that matter.
The reason you cannot use gaming GPUs has to do with the large size of the activations: you must have high inter-node bandwidth between the machines or you get negligible performance.
So 40 times. Say it didn’t take 25k but took 2.5k: then 400 times. Nowhere close to 1e6.
Distributed networks spend most of their time idle, waiting on activations to transfer; it could be a 1000x performance loss or more, making every gaming GPU in the world—they are made at about 60 times the rate of data-center GPUs—not matter at all.
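For what it’s worth, here is that back-of-envelope as a quick script. It is only a sketch: the 25k training GPUs, the ~1e6 installed data-center GPUs, the ~60x gaming-GPU production ratio, and the ~1000x distributed-inference penalty are the rough numbers from this thread, not measured figures.

```python
# Back-of-envelope for the "hardware overhang" disagreement above.
# Every input is a rough figure quoted in this thread, not real data.

gpus_used_for_gpt4 = 25_000           # assumed GPT-4 training cluster size
datacenter_gpus_in_world = 1_000_000  # assumed installed base of relevant accelerators
gaming_gpu_production_ratio = 60      # claim above: gaming GPUs made ~60x faster
distributed_penalty = 1_000           # claim above: consumer GPUs over the internet
                                      # lose ~1000x waiting on activation transfers

overhang = datacenter_gpus_in_world / gpus_used_for_gpt4
print(f"data-center overhang: ~{overhang:.0f}x")  # ~40x, not 1e6x

# Every gaming GPU on Earth, discounted by the bandwidth penalty, adds little:
effective_gaming = datacenter_gpus_in_world * gaming_gpu_production_ratio / distributed_penalty
print(f"all gaming GPUs, bandwidth-adjusted: ~{effective_gaming / gpus_used_for_gpt4:.1f}x "
      "the original training cluster")  # ~2.4x
```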
Orders of what? You said billions of dollars; I assume you had some idea of what that buys.
Out-trading empties the order books of exploitable gradients, so this saturates.
That’s what this argument is about: I am saying the growth doubling time is months to years per doubling. So it takes a couple of decades to matter. It’s still “fast”—and it gets crazy near the end—but it’s not an explosion, and there are many years where the AGI is too weak to openly turn against humans. So it has to pretend to cooperate, and if humans refuse to trust it and build systems that can’t defect at all because they lack context (they have no way to know if they are in the training set), humans can survive.
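To make the doubling-time claim concrete, here is the arithmetic as a sketch. The start and end points (a thousand-robot fleet growing to the 10-billion-worker equivalent mentioned earlier) are illustrative assumptions, not estimates.

```python
import math

# Illustrative only: how long compounding takes at the doubling times argued above.
start_fleet = 1_000             # assumed initial robot fleet (placeholder)
target_fleet = 10_000_000_000   # the "10 billion workers" equivalent from earlier

doublings = math.log2(target_fleet / start_fleet)  # ~23 doublings needed

for months_per_doubling in (6, 12, 24):
    years = doublings * months_per_doubling / 12
    print(f"{months_per_doubling:>2} months per doubling -> ~{years:.0f} years")
# Roughly 12, 23, and 47 years: "fast", but decades rather than an overnight explosion.
```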
I agree that this is one of the ways AGI could beat us, given the evidence of large amounts of human stupidity in some scenarios.
Yeah, that seems more or less correct long-term to me. By long term I mean, like, definitely by the end of 2025. Probably a lot sooner. Probably not for at least 6 months. Curious if you disagree with those bounds. If you’re interested in helping build non-first-strike defensive AIs that will protect me and people I care about, I’d be willing to help you do the same. In general, that’s my perspective on safety: try to design yourself so you’re bad at first strike and really good at parry, and if necessary also at retaliate. I’d prioritize parry, if possible. There are algorithms that make me think parry is a long-term extremely effective move, but you need to be able to parry everything, which is pretty dang hard; shielding materials take energy to build.
I’m pretty sure it’s possible for everyone to get a defensive AGI. Everyone gets a fair defense window, and the world stays “normal”-ish, but now with sci-fi forcefield immune systems protecting everyone from everyone else.
Also, please don’t capture anyone’s bodies from themselves. It’s pretty cheap to keep all humans alive, actually, and you can invite everyone to reproduce their souls through AI and/or live indefinitely. This is going to be crazy to get through, but let’s build Star Trek; whether or not it looks exactly the same economically, we can have almost everything else good about it (...besides warp) if things go well enough through the AGI war.
I think it will take a little longer; the most elite companies cut back on robotics to go all-in on the same LLM meme tech (which is really stupid: yes, LLMs are the best thing found, but you probably won’t win a race if you start a little behind and everyone else is also competing).
I think 2030+.
Because how could you trust such a promise? It’s exactly like nukes. The risk, if you don’t have any of your own or any protection, is that they incinerate 50+ million of your people, blow up all your major cities, and declare war after. That is almost certainly what would have happened during the Cold War had either side pledged not to build nukes and spies confirmed they were honoring the pledge.
Except here it’s as if the risk of igniting the atmosphere with the Trinity test were judged to be ~10%. It’s not “you slow down, and let us win”, it’s “we all slow down, or we all die”. This is not a Prisoner’s Dilemma:
[Payoff matrix image omitted]
This is misinformation. There is a chance of a positive outcome in all boxes, except the upper left, because that one carries the negatives of entropy, aging, and dictators killing us eventually, with a p of 1.0.
Even the certain doomers admit there is a chance the AGI systems are controllable, and there are straightforward ways to build controllable AGIs that people ignore in their campaign for alignment research money. They just say “well, people will make them globally agentic” if you point this out. It’s like blocking all nuclear power construction because someone COULD build a reactor that endlessly tickles the tail of prompt criticality.
See Eric Drexler’s proposals on this very site. Those systems are controllable.
Look, I agree re “negative of entropy, aging, dictators killing us eventually”, and a chance of a positive outcome, but right now I think the balance is approximately like the above payoff matrix over the next 5-10 years, without a global moratorium (i.e. the positive outcome is very unlikely unless we take a decade or two to pause and think/work on alignment). I’d love to live in something akin to Iain M. Banks’ Culture, but we need to get through this acute risk period first to stand any chance of that.
Do you think Drexler’s CAIS is straightforwardly controllable? Why? What’s to stop it being amalgamated into more powerful, less controllable systems? “People” don’t need to make them globally agentic. That can happen automatically via Basic AI Drives and Mesaoptimisation once thresholds in optimisation power are reached.
I’m worried that actually, Alignment might well turn out to be impossible. Maybe a moratorium will allow for such impossibility proofs to be established. What then?
“People” don’t need to make them globally agentic. That can happen automatically via Basic AI Drives and Mesaoptimisation once thresholds in optimisation power are reached.
Care to explain? The idea of open agency is that we subdivide everything into short-term, defined tasks that many AIs can do, and it is possible to compare notes.
The AI systems are explicitly designed so that it is difficult for them to know whether they are acting in the world or receiving canned training data. (This is explicitly true for GPT-4, for example: it is perfectly stateless, and you can move the token input vector between nodes, fix the RNG seed, and get the same answer each time.)
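As a toy illustration of the “stateless plus a fixed seed” property (this uses a trivial PyTorch model standing in for an LLM; GPT-4’s actual serving stack is not public, so treat it purely as a sketch of the claim above):

```python
import torch

# Toy model of a "stateless session": no hidden state is carried between calls,
# and sampling uses an explicit, fixed RNG seed, so the same input tokens
# produce the same output every time, on any node.

torch.manual_seed(0)
model = torch.nn.Linear(8, 8)    # stand-in for an LLM forward pass
tokens = torch.randn(1, 8)       # stand-in for the input token vector

def run_session(seed: int) -> torch.Tensor:
    gen = torch.Generator().manual_seed(seed)   # per-session sampling seed
    logits = model(tokens)                      # pure function of the input
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1, generator=gen)

# Two independent "sessions" with the same input and seed agree exactly,
# which is what makes this kind of system reproducible and auditable.
assert torch.equal(run_session(seed=1234), run_session(seed=1234))
```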
This statelessness makes them highly reliable in the real world, while anything else is less reliable, so...
The idea is that instead of helplessly waiting to die from other people’s misaligned AGI, you beat them to it, build one you can control, and use it to take the offensive when you have to. I suspect this may be the actual course of action surviving human worlds take. Your proposal is possibly certain death, because ONLY people who care at all about ethics would consider delaying AGI, making the unethical ones the ones who get it first, for certain.
Kind of like how spaying and neutering friendly pets removes those positive traits from the gene pool.
Selection pressure will cause models to become agentic as they increase in power—those doing agentic things (following universal instrumental goals like accumulating more resources and self-improvement) will outperform those that don’t. Mesaoptimisation (explainer video) is kind of like cheating: models that create inner optimisers targeting something easier to get than what we meant will be selected (by getting higher rewards) over models that don’t (because we won’t be aware of the inner misalignment). Evolution is a case in point—we are products of it, yet misaligned to its goals (we want sex, and high-calorie foods, and money, rather than caring explicitly about inclusive genetic fitness). Without alignment being 100% watertight, powerful AIs will have completely alien goals.