Since Divia said, and Eliezer retweeted, that good things might happen if people give their honest, detailed reactions:
My honest, non-detailed reaction is AAAAAAH. In more detail -
Yup, this seems right.
This is technobabble to me, since I don’t actually understand nanomachines, but it makes me rather more optimistic that my death would be painless than my most likely theory does (that a superhuman AI takes over first and finds better uses for our atoms later).
(If we had unlimited retries—if every time an AGI destroyed all the galaxies we got to go back in time four years and try again—we would in a hundred years figure out which bright ideas actually worked.) My brain immediately starts looking for ways to set up some kind of fast testing loop for this in a closed, limited world, without letting the AI know ours exists… which is already answered below, under 10. Yup, doomed.
And then we all died.
Yup.
I imagine it would be theoretically, though not practically, possible to fire off a spaceship accelerating fast enough (that is, with enough lead time) to outrun the AI and escape an Earth about to be eaten by it: a pivotal act well short of melting all CPUs that would save at least part of humanity. But given that the AI could probably take over the ship just by flashing lights at it, I don’t expect this to actually work.
I think the closest thing I get to a “pivotal weak act” would be persuading everyone to halt all AI research, using a GPT-5 that is superhumanly persuasive at writing arguments but doesn’t yet have a model of the world-as-real-and-affecting-it that it could use to realize it could achieve its goals by taking over the world. But I don’t actually expect this would work: that would be a very narrow belt of competence, and I’m skeptical it could be achieved.
Not qualified to comment.
Seems right.
Yeah, we’re doomed.
Doomed.
Seems right to me. If the AI never tries a plan because it correctly knows it won’t work, that tells you nothing about whether it will try the plan once it would work.
“It’s not that we can’t roll one twenty, it’s that we’ll roll a one eventually.” I don’t think humanity has successfully overcome this genre of problem, and we encounter it a lot. (In practice, our solutions are fail-safe systems, requiring multiple humans to concur before anything happens, and removing these problems from people’s environments, none of which really works in this context.)
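To put rough numbers on the dice metaphor (a purely illustrative sketch: the 1-in-20 per-attempt failure chance below is my own assumed figure for the analogy, not anything from the post), even a modest per-attempt chance of rolling a one compounds quickly over repeated attempts:

```python
# Illustrative only: cumulative chance of at least one catastrophic "roll of a one"
# over n independent attempts, assuming a hypothetical 1-in-20 failure chance each time.
def p_at_least_one_failure(p_per_attempt: float, attempts: int) -> float:
    return 1 - (1 - p_per_attempt) ** attempts

for n in (1, 10, 50, 100):
    print(f"{n} attempts: {p_at_least_one_failure(1/20, n):.1%}")
# Prints roughly: 5.0%, 40.1%, 92.3%, 99.4%
```

The point being that it is not enough to be able to succeed once; you have to keep not failing.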
Doomed.
Yup, doomed.
I’d also add: we have lots of experience with bosses trying to make their underlings serve the boss rather than themselves, and none of those attempts really work. That’s more very weak evidence in the same direction.
Doomed.
Doomed.
Not qualified to discuss this.
We are really very doomed, aren’t we.
This seems very logical and probably correct, both about the high-level points Eliezer makes and the history of human alignment with other humans.
Seems valid.
Not qualified to comment.
You know, I’d take something that was imperfectly aligned with my Real Actual Values as long as it gave me enough Space Heroin, if the alternative was death. I’d rather the thing aligned with my Real Actual Values, but if we can’t manage that, Space Heroin seems better than nothing. (Also, yup, doomed.)
Not qualified to comment.
This seems valid but I don’t know enough about current AI to comment.
Good point!
Yup.
Yup.
Yup.
Yup.
Good point.
Doomed.
We do seem doomed, yup.
Doomed.
Indeed, humans already work this way!
This is a good point about social dynamics but does not immediately make me go ‘we’re all doomed’, I think because social dynamics seem potentially contingent.
You’re the expert and I’m not; I don’t know the field well enough to comment.
No comment.
No comment; this seems plausible but I don’t know enough to say.
No comment.
No comment.
No comment.
This is another reply in the same vein. I’m quite new to this, so don’t feel obliged to read through; I just told myself I would publish this.
I agree (90-99% agreement) with almost all of the points Eliezer made. The rest are points I probably didn’t understand well enough, or where there’s no need for a comment, e.g.:
1. − 8. - agree
9. - Not sure if I understand it right: if the AGI has been successfully designed not to kill everyone, then why do we need oversight? And if it is capable of killing everyone and the design fails, what would our oversight do? I don’t think this is like the nuclear cores; it feels more like a bomb you are pretty sure won’t go off at random, but if it does, your oversight won’t stop it.
10. − 14. - agree
15. - I feel like I need to think about it more to honestly agree.
16. − 18. - agree
19. - to my knowledge, yes
20. − 23. - agree
24. - initially I put “80% agree” on the first part of the argument here (that
The complexity of what needs to be aligned or meta-aligned for our Real Actual Values is far out of reach for our FIRST TRY at AGI
), but then, discussing it with my reading group, I revisited it a few times and began to agree even more as I grasped the complexity of something like CEV.
25. − 29. - agree
30. - agree, although I wasn’t sure about
an AI whose action sequence you can fully understand all the effects of, before it executes, is much weaker than humans in that domain
I think the key part of this claim is “all the effects of”, and I wasn’t sure whether we really have to understand all of them. But of course we have to be sure that one of the effects is not human extinction, so yes; and for “solving alignment”, also yes.
31. − 34. - agree
35. - no comment, I have to come back to this once I grasp LDT better
36. - agree
37. - no comment, seems like a rant 😅
38. - agree
39. - ok, I guess
40. - agree; I’m glad some people want to experiment with the financing of research.
41. - agree, although I also agree with some of the top comments on this, e.g. evhub’s
42. - agree
43. - agree, at least this is what it feels like
Regarding 9: I believe the point is that you are successful enough that your AGI doesn’t kill you instantly, but it can still kill you in the process of being used. It’s in the context of a pivotal act, so it assumes you will operate the AGI to do something significant and potentially dangerous.