I suspect that thinking about AI x-risk would benefit even more from dropping the term “pivotal act” than from using it as defined.
1. It introduces an artificial and confusing discontinuity in the space of actions.
2. It nudges people to come up with heroic actions. Heroic changes are mostly not the way you improve the safety of complex systems.
My impression is it’s mostly a wrong-way reduction. (per Chapman: “when you have a problem that is nebulous—complicated, messy, and ambiguous. A wrong-way reduction claims to replace that with a simple, tidy, clear-cut problem. What’s wrong is that the new problem is harder than your original one …”)
Thinking about how to reduce AI-related x-risk fits exactly: it’s complicated, nebulous and ambiguous. The “pivotal act” frame supplies a tidy definition. It’s appealing: instead of dealing with the ambiguity and staring into the whirling ocean of complexity, you can brainstorm specific stories (“nanobots which will destroy GPUs”), and you have something clear and crisp.
The problem is in my estimate the “reduced problem” is actually harder. For example, here is a challenge: name pivotal acts (or maybe “pivotal events”) that happened in the past, let’s say, 3 billion years, if any such acts happened. (If you want to actually do this and write it down, I would suggest blacking out your reply.)
I think the argument for thinking in terms of pivotal acts is quite strong. AGI destroys the world by default; a variety of deliberate actions can stop this, but there isn’t a smooth and continuous process that rolls us from 2022 to an awesome future, with no major actions or events deliberately occurring to set us on that trajectory.
By contrast, the counter-arguments here seem weak to me, and overly insensitive to the specifics of the actual situation. “Complex situations don’t get resolved via phase transitions” and “heroism never makes a big difference in real life” are extremely general objections; if there was a particular situation humanity found itself in that was the exception to this rule, the heuristic would provide no guidance that this is an exception. You just die if you over-rely on the heuristic.
“There hasn’t been a past pivotal act” likewise seems like a very weak argument to me. AGI is a world-historical novelty, with its own causal dynamics. There’s some weak similarity to the advent of human intelligence, but mostly, it’s just a new event. There is no natural force pushing AGI toward being low-impact, just because non-AGI processes were low-impact. There is no natural force pushing for AGI to have better outcomes if no one ever does anything about it (no “heroes”), just because heroes are rare historically. The causal dynamics of AGI, and of the world in relation to AGI, are a product of the specifics of the situation (e.g., facts about CS and about how engineers work on ML), not a prophecy or echo of causally dissimilar past events.
At the level of abstraction “complex event”, sure, complicated stuff is often continuous in various ways. But switching topics to “complex event” means fuzzing out all the details about the actual event we’re talking about. It’s throwing away nearly all information we have, and hoping that this one bit (“complex: y or n?”) carries the day. I think fuzzing out details can be a neat exercise for doing Original Seeing at the problem, but I wouldn’t put my weight down on that style of reasoning.
The problem is in my estimate the “reduced problem” is actually harder.
I don’t think it’s harder; i.e., I don’t think a significant fraction of our hope rests on long-term processes that trend in good directions but include no important positive “events” or phase changes. (And, more strongly, I think humanity will in fact die if we don’t leverage some novel tech to prevent the proliferation of AGI systems.)
I think the hardness is just easier to see and easier to emotionally appreciate, exactly because you’re getting concrete about sequences of events in the world.
Rushing to premature concreteness is indeed an error. But I think EA thus far has mostly made the opposite error, refusing to go concrete and thereby avoiding the pressure and constraint of having to actually plan, face tradeoffs, entertain unpleasant realities, etc. If you stay in vagueness indefinitely, things may feel more optimistic, but I flatly doubt that this vague feeling, with little associated scenario analysis or chains of reasoning, is grounded in reality.
[it was easier to draw some things vs. write them]
AGI destroys the world by default; a variety of deliberate actions can stop this, but there isn’t a smooth and continuous process that rolls us from 2022 to an awesome future, with no major actions or events deliberately occurring to set us on that trajectory.
This seems to conflate multiple claims. Consider the whole trajectory.
“AGI destroys the world by default” seems clear; I interpret it as “if you straightforwardly extrapolate the past trajectory, we end in catastrophe”.
It’s less clear to me what the rest means.
Option a) “trajectories like in the picture below do not exist”
(note the turn is smooth)
This seems a very strong claim to me, and highly implausible. Still, if I understand correctly, this is what you put most weight on.
Option b) “trajectories like in a) exist, but it won’t be our trajectory, without significant deliberate efforts”
This seems plausible, although the word “deliberate” introduces some ambiguity.
One way to think about this is in terms of “steering forces” and incentive gradients. In my view it is more likely than not that with increasing power of the systems, parts of “alignment” will become more of a convergent goal for developers (e.g. because aligned systems get better performance, or alignment tools and theory help you with designing more competitive systems). I’m not sure if you would count that as “deliberate” (my guess: no). (Just to be sure: this isn’t to claim that this pull is sufficient for safety.)
In my view the steering forces can become sufficiently strong without any legible “major event”. In particular, without any event legible as important when it is happening. (As far as I understand, you would strongly disagree.)
In contrast, a pivotal act would look more like this:
I don’t think this is a necessary or even common feature of winning trajectories.
“Complex situations don’t get resolved via phase transitions” and “heroism never makes a big difference in real life” are extremely general objections.
Sorry, but this reads like a strawman of my position. “Heroic changes are mostly not the way you improve safety of complex systems” is a very different claim from “heroism never makes a big difference in real life”.
To convey the intuition, consider the case of a nuclear power plant. How do you make something like that safe? Basically, not by one strong intervention on one link in a causal graph, but by intervening at a large fraction of the causal graph, and by adding layered defense, preventing failures from propagating.
Heroic acts obviously can make a big difference. In the case of the nuclear power plant, some scenarios could be saved by a team of heroic firefighters who will provide emergency cooling. Or, clearly, a Chernobyl disaster would have been prevented if a SWAT team landed in the control room, shot everyone, and stopped the plant in a safe way.
My claim isn’t that this never works. The only claim is that the majority of bits of safety originates from different types of intervention. (And I do think this is also true for AI safety.)
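The “bits of safety” bookkeeping can be made concrete with a toy calculation. This is only a sketch under strong assumptions that are not part of the comment above: the layers are assumed to fail independently, the failure probabilities are invented for illustration, and the function name `bits_of_safety` is hypothetical.

```python
import math


def bits_of_safety(failure_probs):
    """Total bits of safety from a list of per-layer failure probabilities.

    A layer that lets a failure through with probability p contributes
    -log2(p) bits. Assuming the layers fail independently (an assumption,
    not a fact about any real system), the bits simply add up.
    """
    return sum(-math.log2(p) for p in failure_probs)


# Made-up numbers: ten modest layers, each letting 25% of failures through.
layered = [0.25] * 10
# Made-up number: one strong intervention letting 0.1% of failures through.
heroic = [0.001]

print(f"layered defense:   {bits_of_safety(layered):.1f} bits")
print(f"single heroic act: {bits_of_safety(heroic):.1f} bits")
```

The numbers are chosen only to show the arithmetic. Which kind of intervention dominates in a real system depends entirely on the actual probabilities, and correlated failures between layers would break the additivity.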
There is no natural force…
As is probably clear, I like the forces framing. Note that it feels quite different from the “pivotal acts” framing.
I don’t care that much whether the forces are natural or not, but whether they exist. Actually I do think one of the more useful things to do about AI safety is:
- think about directions in which you want movement
- think about “types” of forces which may pull in that direction (where “type” could be e.g. profit incentives from market, cultural incentives, or instrumental technological usefulness)
- think about what sort of a system is able to exert such force (where the type could be e.g. individual engineer, a culture-based superagent, or even useful math theory)
- this 3d space gives you a lot of combinations. Compare, choose and execute
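The space of combinations described above can be enumerated mechanically. A minimal sketch: the force types and systems are the examples given in the text, while the entries in `directions` are hypothetical placeholders, since the comment does not name any specific directions.

```python
from itertools import product

# Hypothetical placeholders: the comment does not list specific directions.
directions = ["direction A", "direction B"]
# Force types and systems below are the examples given in the text.
force_types = [
    "market profit incentives",
    "cultural incentives",
    "instrumental technological usefulness",
]
systems = ["individual engineer", "culture-based superagent", "useful math theory"]

# Every (direction, force type, system) triple is one candidate to compare.
candidates = list(product(directions, force_types, systems))

for direction, force, system in candidates[:3]:
    print(f"{system} exerting {force} toward {direction}")
print(f"{len(candidates)} combinations in total")
```

Enumerating the grid is the easy part; the “compare, choose and execute” step still needs judgment about each triple, which this sketch does not attempt.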
At the level of abstraction “complex event”, sure, complicated stuff is often continuous in various ways. …
This isn’t what I mean. I don’t advocate for people to throw out all the details. I mostly advocate for people to project the very high-dimensional real world situation into low-dimensional representations which are continuous, as opposed to categorical.
Moreover, you (and Eliezer, and others) have a strong tendency to discretize the projections in an iterative way. Let’s say you start with “pivotal acts”. In the next step, you discretize the “power of system” dimension: “strong systems” are capable of pivotal acts, “weak systems” are not. In the next step, you use this to discretize a bunch of other dimensions, e.g. weak interpretability tools help with weak systems, but not with strong systems. And so on. The endpoint is just a few actually continuous dimensions, and a longer list of discrete labels.
To be clear: I’m very much in favour of someone trying this. (I expect this to fail, at least for now.)
But I’m also very much in favour of many people trying to not do this, and focusing more on trying different projections, or looking for the steepest local gradient-descent updates from the point where we are now.
But I think EA thus far has mostly made the opposite error, refusing to go concrete and thereby avoiding the pressure and constraint of having to actually plan, face tradeoffs, entertain unpleasant realities, etc. (...)
Sorry, but I’m confused about how the EA label landed here, and I’m a bit worried it has some properties of a red herring. I don’t know if the “you” is directed at me, “EA” (whatever it is), or readers of our conversation.
I think the diagram could be better drawn with at least one axis with a scale like “potential AI cognitive capability”.
At the bottom, in the big white zone, everything is safe and nothing is amazing.
Further up the page, some big faint green “applications of AI” patches appear in which things start to be nicer in some ways. There are also some big faint red patches, many of which overlap the green, where misapplication of AI makes things worse in some ways.
As you go up the page, both the red and green regions intensify, and some of the deeper green regions dead-end into black, representing paths that can no longer be diverted from extinction or other uncorrectable bad futures. Some big patches of black start to appear straight in front of white or pale green, representing humanity holding off from implementing AGI until they thought alignment was solved, but it went wrong before any benefits could appear.
By the time you reach the top of the page, it is almost all black. There are a few tiny spots of intense green, connected only by thin, zig-zag threads that are mostly white to lower parts of the page. Even at the top of the page, we don’t know which of those brilliant green points might actually lead to dead-ends into black further up.
That’s roughly how I see the alignment landscape: that steering to those brilliant green specks will mostly require avoiding implementing AGI.
Notably, the case for certain doom as proposed by Rob Bensinger et al relies on 3 assumptions that need to be tested. A: The Singularity/AI Foom scenarios are likely. I have my problems with this assumption, but I will accept it to show why that doesn’t lead to certain doom.
The next assumption is B: That AI will all have the same goals and that these goals all lead to destroying humanity. Here I see factions forming, and naively I expect a bell curve of many different opinions on this question. I don’t expect coordination of all AIs to destroy humanity not because of fundamental incapability, but because I don’t expect unification of opinions here.
And finally, this rests on assumption C: that humanity and its descendants are narrowly defined. The nice thing about AI Foom scenarios is that, while I don’t expect instant technology to come online, they also make transhumanism far easier than otherwise, quickly closing the gap. That doesn’t mean it’s all sunshine and rainbows, but we are spared certain doom by this.
That AI will all have the same goals and that these goals all lead to destroying humanity.
Nope. I think AI can have any goal; by default its goal will be ‘random’; and most random goals destroy humanity. See Bostrom’s “The Superintelligent Will” for a description of my view on this.
I don’t expect coordination of all AIs to destroy humanity
I don’t know what you mean by “coordination of all AIs” here, or why you think it’s relevant.
but because I don’t expect unification of opinions here
Unification of whose opinions, about what? Are you saying you don’t expect all possible AIs to have the same “opinions”? I think more precise language would be better here; “opinion” is a very vague word.
that humanity and its descendants are narrowly defined.
Again, nope! I’m a transhumanist who wants to usher in an awesome posthuman future. I would consider it a massive existential catastrophe to lock in humanity’s current, flawed understanding of The Good.
I actually don’t necessarily disagree with this. (I’m generally pretty confused about how to think about pivotal acts, and AI strategy generally)
But, insofar as one doesn’t think they should use the pivotal-act frame, I think the solution is to just not use it, rather than water-down the word.
(I think an important thing that the pivotal act frame is getting at is that somehow you actually need to exit the Acute Risk Period. There are a lot of vague plans that sound sorta helpful but don’t actually add up to “we have left the acute risk period”, and many of those vague plans won’t work even if you stack them all up together. I think it is plausible you don’t need the all-or-nothing implication of the Pivotal frame, but there is something important about plans that could possibly work, or be part of a constellation of plans that could possibly-work-together.)
To be clear:
- I don’t disagree with the original post, just wanted to suggest not using the term as an option
- I do agree there is value in asking, in my paraphrase, “what’s the implied safe end here”
- I mostly don’t agree with the assumption behind the term that many small changes generally don’t add up
True, I’ve got to be more specific in my wording when I talk about stuff. And I’ll read that link you gave me.