Here is my partial honest reaction, just two points I’m somewhat dissatisfied with (not meant to be exhaustive):
2. “A cognitive system with sufficiently high cognitive powers, given any medium-bandwidth channel of causal influence, will not find it difficult to bootstrap to overpowering capabilities independent of human infrastructure.” I would like there to be an argument for this claim that doesn’t rely on nanotech, and solidly relies on actually existing amounts of compute. E.g. if the argument relies on running intractable detailed simulations of proteins, then it doesn’t count. (I’m not disagreeing with the nanotech example by the way, or saying that it relies on unrealistic amounts of compute, I’d just like to have an argument for this that is very solid and minimally reliant on speculative technology, and actually shows that it is).
6. “We need to align the performance of some large task, a ‘pivotal act’ that prevents other people from building an unaligned AGI that destroys the world.”. You name “burn all GPU’s” as an “overestimate for the rough power level of what you’d have to do”, but it seems to me that it would be too weak of a pivotal act? Assuming there isn’t some extreme change in generally held views, people would consider this an extreme act of terrorism, and shut you down, put you in jail, and then rebuild the GPU’s and go on with what they were planning to do. Moreover, now there is probably an extreme taboo on anything AI safety related. (I’m assuming here that law enforcement finds out that you were the one who did this). Maybe the idea is to burn all GPU’s indefinitely and forever (i.e. leave nanobots that continually check for GPU’s and burn them when they are created), but even this seems either insufficient or undesirable long term depending on what is counted as a GPU. Possibly I’m not getting what you mean, but it just seems completely too weak as an act.
From an Eliezer comment:
Interventions on the order of burning all GPUs in clusters larger than 4 and preventing any new clusters from being made, including the reaction of existing political entities to that event and the many interest groups who would try to shut you down and build new GPU factories or clusters hidden from the means you’d used to burn them, would in fact really actually save the world for an extended period of time and imply a drastically different gameboard offering new hopes and options. [...]
If Iceland did this, it would plausibly need some way to (1) not have its AGI project bombed in response, and (2) be able to continue destroying GPUs in the future if new ones are built, until humanity figures out ‘what it wants to do next’. This more or less eliminates the time pressure to rush figuring out what to do next, which seems pretty crucial for good long-term outcomes. It’s a much harder problem than just ‘cause all GPUs to stop working for a year as a one-time event’, and I assume Eliezer’s focusing on nanotech in part because it’s a very general technology that can be used for tasks like those as well.
But assuming that law enforcement figures out that you did this, then puts you in jail, you wouldn’t be able to control the further use of such nanotech, i.e. there would just be a bunch of systems indefinitely destroying GPU’s, or maybe you set a timer or some conditions on it or something. I certainly see no reason why Iceland or anyone in Iceland could get away with this unless those systems rely on completely unchecked nanosystems to which the US military has no response. Maybe all of this is what Eliezer means by “melt the GPU’s”, but I thought he did just mean “melt the GPU’s as a single act” (not weird that I thought this, given the phrasing “the pivotal act to melt all the GPU’s”). If this is what is meant, then it would be a strong enough pivotal act, and would require an extreme level of capability, I agree.
Just wanna remind the reader that Eliezer isn’t actually proposing to do this, and I am not seriously discussing it as an option and nor was Eliezer (nor would I support it unless done legally), just thinking through a thought experiment.
But assuming that law enforcement figures out that you did this, then puts you in jail, you wouldn’t be able to control the further use of such nanotech
This would violate Eliezer’s condition “including the reaction of existing political entities to that event”. If Iceland melts all the GPUs but then the servers its AGI is running on get bombed, or its AGI researchers get kidnapped or arrested, then I assume that the attempted pivotal act failed and we’re back to square one.
(I assume this because (a) I don’t expect most worlds to be able to get their act together before GPUs proliferate again and someone destroys the world with AGI; and (b) I assume there’s little chance of Iceland recovering from losing its AGI or its AGI team.)
Ok I admit I read over it. I must say though that this makes the whole thing more involved than it sounded at first, since it would maybe require essentially escalating a conflict with all major military powers and still coming out on top? One possible outcome of this would be that the entire global intellectual public opinion turns against you, meaning you also possibly lose access to a lot of additional humans working with you on further alignment research? I’m not sure if I’m imagining it correctly, but it seems like this plan would either require so many elements that I’m not sure if it isn’t just equivalent to solving the entire alignment problem, or otherwise it isn’t actually enough.
it seems like this plan would either require so many elements that I’m not sure if it isn’t just equivalent to solving the entire alignment problem
This seems way too extreme to me; I expect the full alignment problem to take subjective centuries to solve. CEV seems way harder to me than, e.g., ‘build nanotech that helps you build machinery to relocate your team and your AGI to the Moon, then melt all the GPUs on Earth’.
Leaving the Earth is probably overkill for defensive purposes, given the wide range of defensive options nanotech would open up (and the increasing capabilities gap as more time passes and more tasks become alignable). But it provides another proof of concept that this is a much, much simpler engineering feat than aligning CEV and solving the whole of human values.
Separately, I do in fact think it’s plausible that the entire world would roll over (at least for ten years or so) in response to an overwhelming display of force of that kind, surprising and counter-intuitive as that sounds.
I would feel much better about a plan that doesn’t require this assumption; but there are historical precedents for world powers being surprisingly passive and wary-of-direct-conflict in cases like this.
Yeah, I probably overstated. Nevertheless:
“CEV seems way harder to me than …”
Yes, I agree it seems way harder, and I’m assuming we won’t need to do it and that we could instead “run CEV” by just actually continuing human society and having humans figure out what they want, etc. It currently seems to me that the end game is to get to an AI security service (in analogy to state security services) that protects the world from misaligned AI, and then let humanity figure out what it wants (CEV). The default is just to do CEV directly by actual human brains, but we could instead use AI, but once you’re making that choice you’ve already won. I.e. the victory condition is having a permanent defense against misaligned AI using some AI-nanotech security service; how you do CEV after that is a luxury problem. My point about your further clarification of the “melt all the GPU’s” option is that it seemed to me (upon first thinking about it), that once you are able to do that, you can basically instead just make this permanent security service. (This is what I meant by “the whole alignment problem”, but I shouldn’t have put it that way). I’m not confident though, because it might be that such a security service is in fact much harder due to having to constantly monitor software for misaligned AI.
Summary: My original interpretation of “melt the GPUs” was that it buys us a bit of extra time, but now I’m thinking it might be so involved and hard that if you can do that safely, you almost immediately can just create AI security services to permanently defend against misaligned AI (which seems to me to be the victory condition). (But not confident, I haven’t thought about it much).
Part of my intuition is, in order to create such a system safely, you have to (in practice, not literally logically necessary) be able to monitor an AI system for misalignment (in order to make sure your GPU melter doesn’t kill everyone), and do fully general scientific research. EDIT: maybe this doesn’t need you to do worst-case monitoring of misalignment though, so maybe that is what makes a GPU melter easier than fully general AI security services....
you can basically instead just make this permanent security service
Who is “you”? What sequence of events are you imagining resulting in a permanent security service (= a global surveillance and peacekeeping force?) that prevents AGI from destroying the world, without an AGI-enabled pivotal act occurring?
“you” obviously is whoever would be building the AI system that ended up burning all the GPU’s (and ensuring no future GPU’s are created). I don’t know such a sequence of events, just as I don’t know the sequence of events for building the “burn all GPU’s” system, except at the level of granularity of “Step 1. build a superintelligent AI system that can perform basically any easily human-specifiable task without destroying the world. Step 2. make that system burn all GPU’s indefinitely/build security services that prevent misaligned AI from destroying the world”.
I basically meant to say that I don’t know that “burn all the GPU’s” isn’t already as difficult as building the security services, because they both require step 1, which is basically all of the problem (with the caveat that I’m not sure, and made an edit stating a reason why it might be far from true). I don’t see how you execute the “burn all GPU’s” strategy without solving almost the entire problem.
Step 1. build a superintelligent AI system that can perform basically any easily human-specifiable task without destroying the world.
I’d guess this is orders of magnitude harder than, e.g., ‘build an AGI that can melt all the GPUs, build you a rocket to go to the Moon, and build you a Moon base with 10+ years of supplies’.
Both sound hard, but ‘any easily human-specifiable task’ is asking for a really mature alignment science in your very first AGI systems—both in terms of ‘knowing how to align such a wide variety of tasks’ (e.g., you aren’t depending on ‘the system isn’t modeling humans’ as a safety assumption), and in terms of ‘being able to actually do the required alignment work on fairly short timescales’.
If we succeed in deploying aligned AGI systems, I expect the first such systems to be very precariously aligned—just barely able to safely perform a very minimal, limited set of tasks.
I expect humanity, if it survives at all, to survive by the skin of our teeth. Adding any extra difficulty to the task (e.g., an extra six months of work) could easily turn a realistic success scenario into a failure scenario, IMO. So I actually expect it to matter quite a lot exactly how much extra research and engineering work and testing we require; we may not be able to afford to waste a month.
I’m surprised if I haven’t made this clear yet, but the thing that (from my perspective) seems different between my view and yours is not that Step 1 seems easier to me than it seems to you, but that the “melt the GPUs” strategy (and possibly other pivotal acts one might come up with) seems way harder to me than it seems to you. You don’t have to convince me of “‘any easily human-specifiable task’ is asking for a really mature alignment”, because in my model this is basically equivalent to fully solving the hard problem of AI alignment.
Some reasons:
I don’t see how you can do “melt the GPUs” without having an AI that models humans. What if a government decides to send a black ops team to kill this new terrorist organization (your alignment research team), or send a bunch of ICBMs at your research lab, or do any of a handful of other violent things? Surely the AI needs to understand humans to a significant degree? Maybe you think we can intentionally restrict the AI’s model of humans to precisely those abstractions that this alignment team considers safe and that cover all the human-generated threat models such as “a black ops team comes to kill your alignment team” (e.g. the abstraction of a human as a soldier with a gun).
What if global public opinion among scientists turns against you and all ideas about “AI alignment” are from now on considered to be megalomaniacal crackpottery? Maybe part of your alignment team even has this reaction after the event, so now you’re working with a small handful of people on alignment and the world is against you, and you’ve semi-permanently destroyed any opportunity for outside researchers to collaborate effectively on alignment research. Probably your team will fail to solve alignment by themselves. It seems to me this effect alone could be enough to make the whole plan predictably backfire. You must have thought of this effect before, so maybe you consider it to be unlikely enough to take the risk, or maybe you think it doesn’t matter somehow? To me it seems almost inevitable, and could only be prevented with basically a level of secrecy and propaganda that would require your AI to model humans anyway.
These two things alone make me think that this plan doesn’t work in practice in the real world, unless you basically solve Step 1 already. Although I must say the point which I just speculated you might have, that we could somehow control the AI’s model of humans to be restricted to particular abstractions, gives me some pause and maybe I end up being wrong via something like that. This doesn’t affect the second bullet point though.
Reminder to the reader: This whole discussion is about a thought experiment that neither party actually seriously proposed as a realistic option. I want to mention this because lines might be taken out of context to give the impression that we are actually discussing whether to do this, which we aren’t.
You don’t have to convince me of “‘any easily human-specifiable task’ is asking for a really mature alignment”, because in my model this is basically equivalent to fully solving the hard problem of AI alignment.
This seems very implausible to me. One task looks something like “figure out how to get an AGI to think about physics within a certain small volume of space, output a few specific complicated machines in that space, and not think about or steer the rest of the world”.
The other task looks something like “solve all of human psychology and moral philosophy, figure out how to get an AGI to do arbitrarily specific tasks across arbitrary cognitive domains with unlimited capabilities and free rein over the universe, and optimize the entire future light cone with zero opportunity to abort partway through if you screw anything up”.
The first task can be astoundingly difficult and still be far easier than that.
I don’t see how you can do “melt the GPUs” without having an AI that models humans.
If you’re on the Moon, on Mars, deep in the Earth’s crust, etc., or if you’ve used AGI to build fast-running human whole-brain emulations, then you can go without AGI-assisted modeling like that for a very long time (and potentially indefinitely). None of the pivotal acts that seem promising to me involve any modeling of humans, beyond the level of modeling needed to learn a specific simple physics task like ‘build more advanced computing hardware’ or ‘build an artificial ribosome’.
What if global public opinion among scientists turns against you
If humanity has solved the weak alignment problem, escaped imminent destruction via AGI proliferation, and ended the acute existential risk period, then we can safely take our time arguing about what to do next, hashing out whether the pivotal act that prevented the death of humanity violated propriety, etc. If humanity wants to take twenty years to hash out that argument, or for that matter a hundred years, then go wild!
I feel optimistic about the long-term capacity of human civilization to figure things out, grow into maturity, and eventually make sane choices about the future, if we don’t destroy ourselves. I’m much more concerned with the “let’s not destroy ourselves” problem than with the finer points of PR and messaging when it comes to discussing afterwards whatever it was someone did to prevent our imminent deaths. Humanity will have time to sort that out, if someone does successfully save us all.
a small organization going rogue
One small messaging point, though: not destroying the world isn’t “going rogue”. Destroying the world is “going rogue”. If you’re advancing AGI, the non-rogue option, the prosocial thing to do, is the thing that prevents the world from dying, not the thing that increases the probability of everyone dying.
Or, if we’re going to call ‘killing everyone’ “not going rogue”, and ‘preventing the non-rogues from killing everyone’ “going rogue”, then let’s at least be clear on the fact that going rogue is the obviously prosocial thing to do, and not going rogue (“building AGI with no remotely reasonable plan to effect pivotal outcomes”) is omnicidal and not a good idea.
I think I communicated unclearly and it’s my fault, sorry for that: I shouldn’t have used the phrase “any easily specifiable task” for what I meant, because I didn’t mean it to include “optimize the entire human lightcone w.r.t. human values”. In fact, I was being vague and probably there isn’t really a sensible notion that I was trying to point to. However, to clarify what I really was trying to say: What I mean by “hard problem of alignment” is: “develop an AI system that keeps humanity permanently safe from misaligned AI (and maybe other x-risks), and otherwise leaves humanity to figure out what it wants and do what it wants without restricting it in much of any way except some relatively small volume of behaviour around ‘things that cause existential catastrophe’” (maybe this ends up being to develop a second-version AI that then gets free rein to optimize the universe w.r.t. human values, but I’m a bit skeptical). I agree that “solve all of human psychology and moral …” is significantly harder than that (as a technical problem). (Maybe I’d call this the “even harder problem”.)
Ehh, maybe I am changing my mind and also agree that even what I’m calling the hard problem is significantly more difficult than the pivotal act you’re describing, if you can really do it without modelling humans, by going to Mars and doing WBE. But then still the whole thing would have to rely on the WBE, and I find it implausible to do it without WBE (currently, but you’ve been updating me about lack of need of human modelling so maybe I’ll update here too). Basically the pivotal act is very badly described as merely “melt the GPUs”, and is much more crazy than what I thought it was meant to refer to.
Regarding “rogue”: I thought it meant “independent from established authority”, but I just looked it up and it seems to mean “cheating/dishonest/mischievous”, so I take back that statement about rogueness.
Here is my partial honest reaction, just two points I’m somewhat dissatisfied with (not meant to be exhaustive):
2. “A cognitive system with sufficiently high cognitive powers, given any medium-bandwidth channel of causal influence, will not find it difficult to bootstrap to overpowering capabilities independent of human infrastructure.” I would like there to be an argument for this claim that doesn’t rely on nanotech, and solidly relies on actually existing amounts of compute. E.g. if the argument relies on running intractable detailed simulations of proteins, then it doesn’t count. (I’m not disagreeing with the nanotech example by the way, or saying that it relies on unrealistic amounts of compute, I’d just like to have an argument for this that is very solid and minimally reliant on speculative technology, and actually shows that it is).
6. “We need to align the performance of some large task, a ‘pivotal act’ that prevents other people from building an unaligned AGI that destroys the world.”. You name “burn all GPU’s” as an “overestimate for the rough power level of what you’d have to do”, but it seems to me that it would be too weak of a pivotal act? Assuming there isn’t some extreme change in generally held views, people would consider this an extreme act of terrorism, and shut you down, put you in jail, and then rebuild the GPU’s and go on with what they were planning to do. Moreover, now there is probably an extreme taboo on anything AI safety related. (I’m assuming here that law enforcement finds out that you were the one who did this). Maybe the idea is to burn all GPU’s indefinitely and forever (i.e. leave nanobots that continually check for GPU’s and burn them when they are created), but even this seems either insufficient or undesirable long term depending on what is counted as a GPU. Possibly I’m not getting what you mean, but it just seems completely too weak as an act.
From an Eliezer comment:
If Iceland did this, it would plausibly need some way to (1) not have its AGI project bombed in response, and (2) be able to continue destroying GPUs in the future if new ones are built, until humanity figures out ‘what it wants to do next’. This more or less eliminates the time pressure to rush figuring out what to do next, which seems pretty crucial for good long-term outcomes. It’s a much harder problem than just ‘cause all GPUs to stop working for a year as a one-time event’, and I assume Eliezer’s focusing on nanotech it part because it’s a very general technology that can be used for tasks like those as well.
But assuming that law enforcement figures out that you did this, then puts you in jail, you wouldn’t be able to control the further use of such nanotech, i.e. there would just be a bunch of systems indefinitely destroying GPU’s, or maybe you set a timer or some conditions on it or something. I certainly see no reason why Iceland or anyone in iceland could get away with this unless those systems rely on completely unchecked nanosystems to which the US military has no response. Maybe all of this is what Eliezer means by “melt the GPU’s”, but I thought he did just mean “melt the GPU’s as a single act” (not weird that I thought this, given the phrasing “the pivotal act to melt all the GPU’s”). If this is what is meant, then it would be a strong enough pivotal act, and would be an extreme level of capability I agree.
Just wanna remind the reader that Eliezer isn’t actually proposing to do this, and I am not seriously discussing it as an option and nor was Eliezer (nor would I support it unless done legally), just thinking through a thought experiment.
This would violate Eliezer’s condition “including the reaction of existing political entities to that event”. If Iceland melts all the GPUs but then the servers its AGI is running on get bombed, or its AGI researchers get kidnapped or arrested, then I assume that the attempted pivotal act failed and we’re back to square one.
(I assume this because (a) I don’t expect most worlds to be able to get their act together before GPUs proliferate again and someone destroys the world with AGI; and (b) I assume there’s little chance of Iceland recovering from losing its AGI or its AGI team.)
Ok I admit I read over it. I must say though that this makes the whole thing more involved than it sounded at fist, since it would maybe require essentially escalating a conflict with all major military powers and still coming out on top? One possible outcome of this would be that the entire global intellectual public opinion turns against you, meaning you also possibly lose access to a lot of additional humans working with you on further alignment research? I’m not sure if I’m imagining it correctly, but it seems like this plan would either require so many elements that I’m not sure if it isn’t just equivalent to solving the entire alignment problem, or otherwise it isn’t actually enough.
This seems way too extreme to me; I expect the full alignment problem to take subjective centuries to solve. CEV seems way harder to me than, e.g., ‘build nanotech that helps you build machinery to relocate your team and your AGI to the Moon, then melt all the GPUs on Earth’.
Leaving the Earth is probably overkill for defensive purposes, given the wide range of defensive options nanotech would open up (and the increasing capabilities gap as more time passes and more tasks become alignable). But it provides another proof of concept that this is a much, much simpler engineering feat than aligning CEV and solving the whole of human values.
Separately, I do in fact think it’s plausible that the entire world would roll over (at least for ten years or so) in response to an overwhelming display of force of that kind, surprising and counter-intuitive as that sounds.
I would feel much better about a plan that doesn’t require this assumption; but there are historical precedents for world powers being surprisingly passive and wary-of-direct-conflict in cases like this.
yeah, I probably overstated. Nevertheless:
“CEV seems way harder to me than …”
yes, I agree it seems way harder, and I’m assuming we won’t need to do it and that we could instead “run CEV” by just actually continuing human society and having humans figure out what they want, etc. It currently seems to me that the end game is to get to an AI security service (in analogy to state security services) that protects the world from misaligned AI, and then let humanity figure out what it wants (CEV). The default is just to do CEV directly by actual human brains, but we could instead use AI, but once you’re making that choice you’ve already won. i.e. the victory condition is having a permanent defense against misaligned AI using some AI-nanotech security service, how you do CEV after that is a luxury problem. My point about your further clarification of the “melt all the GPU’s option is that it seemed to me (upon first thinking about it), that once you are able to do that, you can basically instead just make this permanent security service. (This is what I meant by “the whole alignment problem”, but I shouldn’t have put it that way). I’m not confident though, because it might be that such a security service is in fact much harder due to having to constantly monitor software for misaligned AI.
Summary: My original interpretation of “melt the GPUs” was that it buys us a bit of extra time, but now I’m thinking it might be so involved and hard that if you can do that safely, you almost immediately can just create AI security services to permanently defend against misaligned AI (which seems to me to be the victory condition). (But not confident, I haven’t thought about it much).
Part of my intuition is, in order to create such a system safely, you have to (in practice, not literally logically necessary) be able to monitor an AI system for misalignment (in order to make sure your GPU melter doesn’t kill everyone), and do fully general scientific research. EDIT: maybe this doesn’t need you to do worst-case monitoring of misalignment though, so maybe that is what makes a GPU melter easier than fully general AI security services....
Who is “you”? What sequence of events are you imagining resulting in a permanent security service (= a global surveillance and peacekeeping force?) that prevents AGI from destroying the world, without an AGI-enabled pivotal act occurring?
“you” obviously is whoever would be building the AI system that ended up burning all the GPU’s (and ensuring no future GPU’s are created). I don’t know such sequence of events just as I don’t know the sequence of events for building the “burn all GPU’s” system, except at the level of granularity of “Step 1. build a superintelligent AI system that can perform basically any easily human-specifiable task without destroying the world. Step 2. make that system burn all GPU’s indefintely/build security services that prevent misaligned AI from destroying the world”.
I basically meant to say that I don’t know that “burn all the GPU’s” isn’t already as difficult as building the security services, because they both require step 1, which is basically all of the problem (with the caveat that I’m not sure, and made an edit stating a reason why it might be far from true). I basically don’t see how you execute the “burn all gpu’s” strategy without basically solving almost the entire problem.
I’d guess this is orders of magnitude harder than, e.g., ‘build an AGI that can melt all the GPUs, build you a rocket to go to the Moon, and build you a Moon base with 10+ years of supplies’.
Both sound hard, but ‘any easily human-specifiable task’ is asking for a really mature alignment science in your very first AGI systems—both in terms of ‘knowing how to align such a wide variety of tasks’ (e.g., you aren’t depending on ‘the system isn’t modeling humans’ as a safety assumption), and in terms of ‘being able to actually do the required alignment work on fairly short timescales’.
If we succeed in deploying aligned AGI systems, I expect the first such systems to be very precariously aligned—just barely able to safely perform a very minimal, limited set of tasks.
I expect humanity, if it survives at all, to survive by the skin of our teeth. Adding any extra difficulty to the task (e.g., an extra six months of work) could easily turn a realistic success scenario into a failure scenario, IMO. So I actually expect it to matter quite a lot exactly how much extra research and engineering work and testing we require; we may not be able to afford to waste a month.
I’m surprised if I haven’t made this clear yet, but the thing that (from my perspective) seems different between my and your view is not that Step 1 seems easier to me than it seems to you, but that the “melt the GPUs” strategy (and possibly other pivotal acts one might come up with) seems way harder to me than it seems to you. You don’t have to convince me of “‘any easily human-specifiable task’ is asking for a really mature alignment”, because in my model this is basically equivalent to fully solving the hard problem of AI alignment.
Some reasons:
I don’t see how you can do “melt the GPUs” without having an AI that models humans. What if a government decides to send a black ops team to kill this new terrorist organization (your alignment research team), or send a bunch of icbms at your research lab, or do any of a handful of other violent things? Surely the AI needs to understand humans to a significant degree? Maybe you think we can intentionally restrict the AI’s model of humans to be only about precisely those abstractions that this alignment team considers safe and covers all the human-generated threat models such as “a black ops team comes to kill your alignment team” (e.g. the abstraction of a human as a soldier with a gun).
What if global public opinion among scientists turns against you and all ideas about “AI alignment” are from now on considered to be megalomaniacal crackpottery? Maybe part of your alignment team even has this reaction after the event, so now you’re working with a small handful of people on alignment and the world is against you, and you’ve semi-premanently destroyed any opportunity that outside researchers can effectively collaborate on alignment research. Probably your team will fail to solve alignment by themselves. It seems to me this effect alone could be enough to make the whole plan predictably backfire. You must have thought of this effect before, so maybe you consider it to be unlikely enough to take the risk, or maybe you think it doesn’t matter somehow? To me it seems almost inevitable, and could only be prevented with basically a level of secrecy and propaganda that would require your AI to model humans anyway.
These two things alone make me think that this plan doesn’t work in practice in the real world, unless you basically solve Step 1 already. Although I must say the point which I just speculated you might have, that we could somehow control the AI’s model of humans to be restricted to particular abstractions, gives me some pause and maybe I end up being wrong via something like that. This doesn’t affect the second bullet point though.
Reminder to the reader: This whole discussion is about a thought experiment that neither party actually seriously proposed as a realistic option. I want to mention this because lines might be taken out of context to give the impression that we are actually discussing whether to do this, which we aren’t.
This seems very implausible to me. One task looks something like “figure out how to get an AGI to think about physics within a certain small volume of space, output a few specific complicated machines in that space, and not think about or steer the rest of the world”.
The other task looks something like “solve all of human psychology and moral philosophy, figure out how to get an AGI to do arbitrarily specific tasks across arbitrary cognitive domains with unlimited capabilities and free rein over the universe, and optimize the entire future light cone with zero opportunity to abort partway through if you screw anything up”.
The first task can be astoundingly difficult and still be far easier than that.
If you’re on the Moon, on Mars, deep in the Earth’s crust, etc., or if you’ve used AGI to build fast-running human whole-brain emulations, then you can go without AGI-assisted modeling like that for a very long time (and potentially indefinitely). None of the pivotal acts that seem promising to me involve any modeling of humans, beyond the level of modeling needed to learn a specific simple physics task like ‘build more advanced computing hardware’ or ‘build an artificial ribosome’.
If humanity has solved the weak alignment problem, escaped imminent destruction via AGI proliferation, and ended the acute existential risk period, then we can safely take our time arguing about what to do next, hashing out whether the pivotal act that prevented the death of humanity violated propriety, etc. If humanity wants to take twenty years to hash out that argument, or for that matter a hundred years, then go wild!
I feel optimistic about the long-term capacity of human civilization to figure things out, grow into maturity, and eventually make sane choices about the future, if we don’t destroy ourselves. I’m much more concerned with the “let’s not destroy ourselves” problem than with the finer points of PR and messaging when it comes to discussing afterwards whatever it was someone did to prevent our imminent deaths. Humanity will have time to sort that out, if someone does successfully save us all.
One small messaging point, though: not destroying the world isn’t “going rogue”. Destroying the world is “going rogue”. If you’re advancing AGI, the non-rogue option, the prosocial thing to do, is the thing that prevents the world from dying, not the thing that increases the probability of everyone dying.
Or, if we’re going to call ‘killing everyone’ “not going rogue”, and ‘preventing the non-rogues from killing everyone’ “going rogue”, then let’s at least be clear on the fact that going rogue is the obviously prosocial thing to do, and not going rogue (“building AGI with no remotely reasonable plan to effect pivotal outcomes”) is omnicidal and not a good idea.
I think I communicated unclearly and it’s my fault, sorry for that: I shouldn’t have used the phrase “any easily specifiable task” for what I meant, because I didn’t mean it to include “optimize the entire human lightcone w.r.t. human values”. In fact, I was being vague and probably there isn’t really a sensible notion that I was trying to point to. However, to clarify what I really was trying to say: What I mean by “hard problem of alignment” is: “develop an AI system that keeps humanity permanently safe from misaligned AI (and maybe other x-risks), and otherwise leaves humanity to figure out what it wants and do what it wants without restricting it in much of any way except some relatively small volume of behaviour around ‘things that cause existential catastrophe’” (maybe this ends up being to develop a second-version AI that then gets free rein to optimize the universe w.r.t. human values, but I’m a bit skeptical). I agree that “solve all of human psychology and moral …” is significantly harder than that (as a technical problem). Maybe I’d call this the “even harder problem”.
Ehh, maybe I am changing my mind and also agree that even what I’m calling the hard problem is significantly more difficult than the pivotal act you’re describing, if you can really do it without modelling humans, by going to Mars and doing WBE. But then the whole thing would still have to rely on the WBE, and I find it implausible to do it without it (currently, but you’ve been updating me about the lack of need for human modelling, so maybe I’ll update here too). Basically the pivotal act is very badly described as merely “melt the GPUs”, and is much crazier than what I thought it was meant to refer to.
Regarding “rogue”: I thought it meant “independent from established authority”, but I just looked up the meaning and it seems to mean “cheating/dishonest/mischievous”, so I take back that statement about rogueness.
I’ll respond to the “public opinion” thing later.