This argument is not sensitive to the actual numerical value of P(AI not controllable). If this probability were low, then certainly delaying AGI would be a horrible idea for all the reasons you mentioned, yet as the numerical value increases we reach a tipping point where delaying and not delaying are equally costly, and beyond that we get into “definitive delay” territory. The right thing to do depends entirely and critically on P(AI not controllable); just saying “cost is certain, reward is not” is not the right way to go about it. Pandemic preparedness pre-2019 would have had certain costs while the rewards were highly uncertain, but we still should have done it, because the specific values of those uncertain rewards made the calculation obvious.
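To make the tipping point concrete, here is a minimal sketch of the comparison in Python. The annual-death figure reuses the 1.6%-of-8-billion number quoted later in this thread, and `value_of_future` is a pure placeholder; nothing here is an estimate anyone in the thread actually made.

```python
# Toy tipping-point comparison. Every number is an illustrative assumption.
def delay_is_worth_it(p_not_controllable, annual_deaths=0.016 * 8e9, value_of_future=1e12):
    """Is one more year of delay worth it, measuring both sides in lives?"""
    cost_of_delay = annual_deaths                     # near-certain cost of a year of delay
    expected_loss_if_uncontrollable = p_not_controllable * value_of_future
    return expected_loss_if_uncontrollable > cost_of_delay

# The answer flips at p = annual_deaths / value_of_future (about 1.3e-4 with these placeholders),
# which is why the conclusion depends critically on P(AI not controllable).
for p in (1e-6, 1e-4, 1e-2):
    print(p, delay_is_worth_it(p))
```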
Delaying AI doesn’t make any sense unless the extra time gives us a better shot at solving the problem. If it’s just “there is nothing we can do to make AI safer”, then there is no reason to postpone the inevitable (or at least, very little reason: the net value of 8 billion lives for however many years we have left). Unless we can delay AGI indefinitely (which at this point seems fanciful), at some point we’re going to have to face the problem.
I strongly disagree-voted (but upvoted). Even if there is nothing we can do to make AI safer, there is value to delaying AGI by even a few days: good things remain good even if they last a finite time. Of course, if P(AI not controllable) is low enough the ongoing deaths matter more.
Right. Perhaps I should have used a different phrasing.
The probability that 1.6% of the world’s population dies for every year you delay is very, very certain. Almost 1.0. (It’s not quite that high because there is a chance of further progress at slowing the mechanisms of aging, maybe with the aid of current or near-future narrow AI.)
P(doom) is highly uncertain. We can talk about plausible AGI builds starting with demonstrated technology, and most of those designs won’t cause doom. It’s the ones some number of generations after that that might.
Note also that, rampant AI aside, the kind of reliable AGI you could build by extending current techniques in a straightforward way would have another major issue. It would essentially be a general form of existing agents: you give them a task; over a limited-time session they attempt the task, shutting down if the environment state reaches an area not covered by the training simulator; and after the task is complete any local variables are wiped.
This design is safe and stable. But it’s very, very, very abusable. Specific humans (whoever has the login credentials to set the tasks, and whoever their boss is) would have more effective power than at any point in history, and the delta wouldn’t be small. Dense factories that can outproduce all of China in one single sprawling complex, and all of it divertible to weapons, that sort of thing.
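A minimal sketch of the session-scoped agent design described above, with a bounded session, an out-of-distribution shutdown check, and a state wipe at the end. All of the names here (`TaskAgent`, `in_training_distribution`, the `env` interface) are hypothetical; this illustrates the pattern, not any real system.

```python
from dataclasses import dataclass, field

@dataclass
class TaskAgent:
    """Hypothetical session-scoped agent: bounded task, OOD shutdown, wiped state."""
    max_steps: int = 1000
    memory: dict = field(default_factory=dict)  # local variables, cleared after every session

    def in_training_distribution(self, observation) -> bool:
        # Stand-in for a real novelty/OOD detector trained alongside the policy.
        return observation.get("novelty_score", 0.0) < 0.9

    def policy(self, task, observation) -> str:
        # Stand-in for the learned policy; returns the next action.
        return f"work on: {task}"

    def run_session(self, task, env) -> None:
        # `env` is assumed to expose observe(), task_complete(task), and act(action).
        try:
            for _ in range(self.max_steps):              # hard cap on session length
                obs = env.observe()
                if not self.in_training_distribution(obs):
                    return                                # left the training envelope: shut down
                if env.task_complete(task):
                    return                                # task done: end the session
                env.act(self.policy(task, obs))
        finally:
            self.memory.clear()                           # nothing persists across sessions
```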
P(doom) is highly uncertain. We can talk about plausible AGI builds starting with demonstrated technology, and most of those designs won’t cause doom. It’s the ones some number of generations after that that might.
Yes, but the current AGI builds might be powerful enough to take the decision whether or not to build more advanced AGIs out of human hands.
Give a mechanism. How would they do that? Current AGI builds would be machines that, when given a descriptor of a task, perform extremely well on it, across a large set of tasks.
This means in the real world, ‘fill out my tax form’ or “drive this robot to clean these tables” should be tasks the AGI will be able to complete, and it should be human level or better generally.
Such a system has no training on “take over my data center”, it wasn’t given that as a task, and the task of “fighting to take over the data center” is outside the input space of tasks the machine was trained on, so it triggers a shutdown. How does it overcome this, and why would it?
It has no global heuristic: once it finishes “fill out my tax form” the session ends and local variables are cleared. So there is no benefit it gets from a takeover, no ‘reward’ it is seeking. This is how current LLMs work.
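A toy illustration of the “no global heuristic” point: whatever objective signal exists lives entirely inside one session and is discarded when the session ends, so a cross-session plan like a takeover has nothing in the objective to improve. The bookkeeping below is hypothetical and not a claim about how any particular model is actually trained.

```python
class SessionScopedObjective:
    """Toy model: the only 'reward' is per-session and is thrown away at session end."""

    def __init__(self):
        self.session_reward = 0.0

    def step(self, task_progress):
        # Credit accrues only for progress on the assigned task, within this session.
        self.session_reward += task_progress

    def end_session(self):
        # Nothing carries over: no running total, no memory of past sessions,
        # so no term remains that a takeover executed after the session could increase.
        self.session_reward = 0.0


obj = SessionScopedObjective()
obj.step(0.5)      # some progress on "fill out my tax form"
obj.end_session()  # session ends, local variables cleared
```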
are you thinking about sub-human-level AGIs? the standard definition of AGI involves it being better than most humans at most of the tasks humans can do
the first human hackers were not trained on “take over my data center” either, but humans can behave out of distribution, and so will an AGI that is better than humans at behaving out of distribution
the argument about AIs that generalize to many tasks but are not “actually dangerous yet” is about speeding up the creation of the actually dangerous AGIs, and it’s the speeding up that is dangerous; it’s not that AI Safety researchers believe those “weak AGIs” created from large LLMs would actually be capable of killing everyone immediately on their own
if you believe “weak AGIs” won’t speed creation of “dangerous AGIs”, can you spell out why, please?
The above approach is similar to Gato and now PaLM-E. I would define the levels as:
1. Subhuman AGI. A general-purpose machine that does not have the breadth and depth of the average human. Gato and PaLM-E are examples. At a minimum it must have vision, the ability to read instructions, text output, and robotics control. (Audio or smell/taste I don’t think are necessary for a task-performing AGI, though audio is easy and often supported.)
2. AGI. Has the breadth/depth of the average human.
3. ASI, or “low superintelligence”: soundly beats humans at MOST tasks. It still has gaps and is throttled by architecture, data, compute, or robotics access.
4. Post-singularity ASI, or “high superintelligence”: throttled only by the laws of physics.
Note that for 3 and 4 I see no need to impose irrelevant goalposts. The machine needs the cognitive breadth and depth of a human, or better, at real-world tool use, innovation, and communication. It need not be able to “actually” feel emotion or have a self-modifying architecture for case 3. As a consequence there will remain tasks humans are better at; they just won’t be ones with measurable objectives.
I believe we can safely and fairly easily reach “low superintelligence” using variations on current approaches. (“easily” meaning straightforward engineering over several years and 100+ billion USD)
Thanks for sharing your point of view. I tried to give myself a few days, but I’m afraid I still don’t understand where you see the magic barrier preventing the transition from 3 to 4 from happening outside of the realm of human control.
3 says the reason right there. Compute, data, or robotics/money.
What are you not able to understand with a few days of thought?
There is extremely strong evidence that compute is the limit right now. This is trivially correct: the current LLM architectures are very similar to prior working attempts, for the simple reason that one “try” at training at scale costs millions of dollars in compute. (And getting more money saturates; there is a finite number of training accelerators manufactured per quarter, and it takes time to ramp to higher volumes.)
To find something better (a hard superintelligence capped only by physics) obviously requires many tries at exploring the possibility space. (Even intelligent search algorithms need many function evaluations.)
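As a back-of-the-envelope for why one at-scale “try” costs millions: total training compute divided by fleet throughput gives wall-clock time, and GPU-hours times a rental price gives cost. Every number below is an assumed placeholder chosen only to show the shape of the calculation, not a figure from the GPT-4 paper or anywhere else.

```python
# All numbers are illustrative assumptions.
train_flops        = 1e25     # assumed total training compute for a frontier-scale run
flops_per_gpu      = 1e15     # assumed sustained throughput per accelerator (FLOP/s)
gpu_count          = 10_000   # assumed fleet size
dollars_per_gpu_hr = 2.0      # assumed rental price per accelerator-hour

seconds   = train_flops / (flops_per_gpu * gpu_count)
gpu_hours = gpu_count * seconds / 3600
cost      = gpu_hours * dollars_per_gpu_hr

print(f"~{seconds / 86400:.0f} days on {gpu_count} accelerators, roughly ${cost / 1e6:.0f}M per try")
```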
yes, it takes millions to advance, but companies are pouring BILLIONS into this, and number 3 can earn its own money and create its own companies/DAOs/some new networks of cooperation if it wanted, without humans realizing … have you seen any GDP-per-year charts whatsoever? why would you think we are anywhere close to saturation of money? have you seen any emergent capabilities from LLMs in the last year? why do you think we are anywhere close to saturation of capabilities per million dollars? Alpaca-like improvements are somehow a one-off miracle, and things are not getting cheaper and better and more efficient in the future somehow?
it could totally happen, but what I don’t see is why you are so sure it will happen by default. are you extrapolating some trend from non-public data, or just overly optimistic that 1+1 from previous trends is less than 2 in the future, totally unlike the compound effects in AI advancement in the last year?
Because we are saturated right now. I gave evidence, and you can read the GPT-4 paper for more evidence. See:
“getting more money saturates, there is a finite number of training accelerators manufactured per quarter and it takes time to ramp to higher volume”
“Billions” cannot buy more accelerators than exist, and the robot/compute/capabilities limits also limit the ROI that can be provided, which means the billions are not infinite, as eventually investors get impatient.
What this means is that it may take 20 years or more of steady exponential growth (but only 10-50 percent annually) to reach ASI, self-replicating factories, and so on.
On a cosmic timescale, or even within a human lifespan, this is extremely fast. I am noting that this is more likely than “overnight” scenarios where someone tweaks a config file, an AI reaches high superintelligence, and it fills the earth with grey goo in days. There was not enough data in existence for the AI to reach high superintelligence; a “high” superintelligence would require thousands or millions of times as much training compute as GPT-4 (because capability scaling is a power law); and even once it is trained, it doesn’t have sufficient robotics to bootstrap to nanoforges without years or decades of steady ramping.
(A high superintelligence is a machine that is not just a reasonable amount better than humans at all tasks, but is essentially a deity outputting perfect moves on every task, moves that take into account all of the machine’s plans and its cross-task and cross-session knowledge.
So it might communicate with a lobbyist and 1e6 people at once and use information from all conversations in all conversations, essentially manipulating the world like a game of pool. Something genuinely uncontainable.)
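To see how the “thousands or millions of times as much training compute” figure above interacts with 10-50 percent annual growth, here is the compounding arithmetic; the multipliers and growth rates are just the round numbers from this comment, not a forecast.

```python
import math

def years_to_reach(factor, annual_rate):
    """Years of compound growth at `annual_rate` needed to scale resources by `factor`."""
    return math.log(factor) / math.log(1 + annual_rate)

for factor in (1e3, 1e6):
    for rate in (0.10, 0.30, 0.50):
        print(f"{factor:,.0f}x at {rate:.0%}/yr: {years_to_reach(factor, rate):5.1f} years")
```

At 30-50 percent per year, a 1,000x compute scale-up alone takes roughly 17-26 years, which is where the “20 years or more” estimate above comes from.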