It’s hard for me to argue with multiple people simultaneously. When I argue with someone I tend to adopt most of their assumptions in order to focus on what I think is the core disagreement, so to argue with someone else I have to “swap in” a different set of assumptions and related arguments. The OP was aimed mostly at Eliezer, so it assumed that intelligence explosion is relatively easy. (Would you agree that if intelligence explosion was easy, then it would be hard to achieve a good outcome in the way that you imagine, by incrementally solving “the AI control problem”?)
If we instead assume that intelligence explosion isn’t so easy, then I think the main problem we face is value drift and Malthusian outcomes caused by competitive evolution (made worse by brain emulations and AGIs that can be easily copied), which can only be prevented by building a singleton. (A secondary consideration involves other existential risks related to technological progress, such as physics/nanotech/biotech disasters.) I don’t think humanity as a whole is sufficiently strategic to solve this problem before it’s too late (meaning a lot of value drift has already occurred or building a singleton becomes impossible due to space colonization). I think the fact that you are much more optimistic about this accounts for much of our disagreement on overall strategy, and I wonder if you can explain why. I don’t mean to put the burden of proof on you, but perhaps you have some ready explanation at hand?
I don’t think that a fast intelligence explosion implies that you have to solve the kind of hard philosophical problems that you are alluding to. You seem to grant that we can’t point to any particular hard philosophical problem we’ll have to solve, but you think that nevertheless every approach to the problem will require solving such problems. Is it easy to state why you expect this? Is it because the approaches we can imagine in detail today involve solving hard problems?
Regarding the hardness of defining “remain in control,” it is not the case that you need to be able to define X formally in order to accomplish X. Again, perhaps such approaches require solving hard philosophical problems, but I don’t see why you would be confident (either about this particular approach or more broadly). Regarding my claim that we need to figure this out anyway, I mean that we need to implicitly accept some process of reflection and self-modification as we go on reflecting and self-modifying.
I don’t see why a singleton is necessary to avert value drift in any case; they seem mostly orthogonal. Is there a simple argument here? See e.g. Carl’s post on this and mine. I agree there is a problem to be solved, but it seems to involve faithfully transmitting hard-to-codify values (again, perhaps implicitly).
I’ll just respond to part of your comment since I’m busy today. I’ll respond to the rest later or when we meet.
I don’t see why a singleton is necessary to avert value drift in any case; they seem mostly orthogonal. Is there a simple argument here?
Not sure if this argument is original to me; I may have read it from Nick Bostrom or someone else. When I said “value drift” I meant value drift of humanity in aggregate, not necessarily value drift of any individual. Different people’s values differ in how hard they are to transmit, or some simply believe their values are easy to transmit, for example those who think they should turn the universe into hedonium, or should maximize “complexity”. Competitive evolution will favor (in the sense of maximizing the descendants/creations of) such people, since they can take advantage of new AGI or other progress more quickly than those who think their values are harder to transmit.
I think there’s an additional argument that says people who have shorter planning horizons will take advantage of new AGI progress more quickly because they don’t particularly mind not transmitting their values into the far future, but just care about short term benefits like gaining academic fame.
Yes, if it is impossible to remain in control of AIs then you will have value drift, and yes a singleton can help with this in the same way they can help with any technological risk, namely by blocking adoption of the offending technology. So I concede they aren’t completely orthogonal, in the sense that any risk of progress can be better addressed by a singleton + slow progress. (This argument is structurally identical to the argument for danger from biology progress, physics progress, or even early developments in conventional explosives.) But this is a very far cry from “can only be prevented by building a singleton.”
To restate how the situation seems to me: you say “the problems are so hard that any attempt to solve them is obviously doomed,” and I am asking for some indication that this is the case beyond intuition and a small number of not-very-representative examples, which seem unlikely to support a very confident conclusion. Eliezer makes a similar claim, with you two disagreeing about how likely Eliezer is to solve the problems but not about how likely the problems are to get solved by people who aren’t Eliezer. I don’t understand either of your arguments very well; it seems like both of you are correct to disagree with the mainstream by identifying a problem and noticing that it may be an unusually challenging one, but I don’t see why either of you is so confident.
To isolate a concrete disagreement: if there were an intervention that sped up the onset of serious AI safety work twice as much as it sped up the arrival of AI, I would tentatively consider that a positive (and if it sped up the onset of serious AI safety work ten times as much as it sped up the arrival of AI it would seem like a clear win; I previously argued that 1.1x as much would also be a big win, but Carl convinced me to increase the cutoff with a very short discussion). You seem to be saying that you would consider it a loss at any ratio, because speeding up the arrival of AI is so much worse than speeding up the onset of serious thought about AI safety is good, since on your view that safety work is so confidently doomed.
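To make the ratio concrete, here is a minimal arithmetic sketch; the baseline timelines and speed-ups are my own illustrative assumptions, not numbers either of us has claimed. The only point is that an intervention which moves serious safety work earlier by more years than it moves AI arrival earlier lengthens the window of serious safety work, even while shortening the total calendar time:

```python
# Toy arithmetic for the speed-up-ratio trade-off; all numbers are illustrative assumptions.

baseline_safety_start = 10.0  # assumed: years until serious AI safety work begins
baseline_ai_arrival = 25.0    # assumed: years until transformative AI arrives

def lead_time(safety_speedup_years, ai_speedup_years):
    """Years of serious safety work completed before AI arrives,
    given how many years an intervention moves each milestone earlier."""
    safety_start = baseline_safety_start - safety_speedup_years
    ai_arrival = baseline_ai_arrival - ai_speedup_years
    return ai_arrival - safety_start

print(lead_time(0, 0))  # 15.0 years of lead time with no intervention
print(lead_time(2, 1))  # 16.0 -> a 2:1 speed-up buys an extra year of safety work
print(lead_time(1, 1))  # 15.0 -> a 1:1 speed-up leaves the lead time unchanged
```

This leaves out other considerations, such as how useful safety work is when done far in advance of AI, which is presumably part of why the right cutoff ratio is debatable.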
Yes, if it is impossible to remain in control of AIs then you will have value drift
Wait, that’s not my argument. I was saying that while people like you are trying to develop technologies that let you “remain in control”, others who have shorter planning horizons, or who think they have simple, easy-to-transmit values, will already be deploying new AGI capabilities, so you’ll fall behind with every new development. This is what I’m suggesting only a singleton can prevent.
You could try to minimize this kind of value drift by speeding up “AI control” progress, but it’s really hard for me to see how you can speed it up enough not to lose a competitive race with those who do not see a need to solve this problem, or think they can solve a much easier problem. The way I model AGI development in a slow-FOOM scenario is that AGI capability will come in spurts along with changing architectures, and it’s hard to do AI safety work “ahead of time” because of dependencies on AI architecture. So each time there is a big AGI capability development, you’ll be forced to spend time developing new AI safety tech for that capability/architecture, while others will not wait to deploy it. Even a small delay can lead to a large loss, since AIs can be easily copied and more capable but uncontrolled AIs would quickly take over the economic niches occupied by existing humans and controlled AIs. Even assuming secure rights for what you already own on Earth, your share of the future universe will become smaller and smaller as most of the world’s new wealth goes to uncontrolled AIs or AIs with simple values.
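To illustrate how that compounding works, here is a minimal numerical sketch of the race dynamic described above; the growth factor per capability spurt and the fraction of each period lost to safety work are made-up parameters, not estimates from this discussion:

```python
# Toy model of the race dynamic: capability arrives in spurts, one faction delays
# each deployment to develop safety tech, the other deploys immediately.
# All parameters are illustrative assumptions.

growth_per_spurt = 4.0  # assumed: factor by which deploying a new capability multiplies wealth
safety_delay = 0.25     # assumed: fraction of each spurt's growth period lost to safety work
num_spurts = 5

careful_wealth = 1.0    # faction that waits for safety tech before deploying
reckless_wealth = 1.0   # faction that deploys each new capability immediately

for _ in range(num_spurts):
    reckless_wealth *= growth_per_spurt
    # The careful faction only benefits from the new capability for part of each period.
    careful_wealth *= growth_per_spurt ** (1 - safety_delay)

share = careful_wealth / (careful_wealth + reckless_wealth)
print(f"careful faction's share of total wealth after {num_spurts} spurts: {share:.3f}")
# With these numbers the careful faction's share falls from 0.5 to roughly 0.15,
# and it keeps shrinking geometrically with every further spurt.
```

Even with secure property rights, the careful faction ends up with a vanishing share of new wealth, which is the kind of loss I have in mind.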
Where do you see me going wrong here? If you think I’m just too confident in this model, what alternative scenario can you suggest, where people like you and I (or our values) get to keep a large share of the future universe just by speeding up the onset of serious AI safety work?
Did you talk about this at the recent workshop? If you’re willing to share publicly, I’d be curious about the outcome of this discussion.
A singleton (even if it is a world government) is argued to be a good thing for humanity by Bostrom here and here.