Good points and I agree with pretty much all of them, but for the sake of argument I’ll try to write the strongest response I can:
It seems to me that your view of value is a little bit mystical. Our minds can only estimate the value of situations that are close to normal. There’s no unique way to extend a messy function from [0,1] to [-100,100]. I know you want to use philosophy to extend the domain, but I don’t trust our philosophical abilities to do that, because whatever mechanism created them could only test them on normal situations. We already see different people’s philosophies disagreeing much more on abnormal situations than on normal ones. If I got an email from an uplifted version of me saying he’d found an abnormal situation that’s really valuable, I wouldn’t trust it much, because that judgment would be too sensitive to arbitrary choices made during uplifting (even choices made by me).
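To make the [0,1] point concrete, here’s a toy sketch (the particular functions and polynomial degrees are arbitrary choices for illustration, not anything from the post): two models that fit the same noisy data almost identically on [0,1] can disagree enormously once you evaluate them far outside that range.

```python
# Toy illustration: fit the same "messy" data on [0, 1] with two different
# extension rules, then compare them at x = 100. The choice of sin(3x),
# the noise level, and the polynomial degrees are all arbitrary.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(3 * x) + 0.01 * rng.standard_normal(50)  # noisy value estimates on "normal" situations

# Two fits that agree closely on the normal range [0, 1]:
low = np.polynomial.Polynomial.fit(x, y, deg=3)
high = np.polynomial.Polynomial.fit(x, y, deg=9)

print("max disagreement on [0, 1]:", np.max(np.abs(low(x) - high(x))))  # small
print("disagreement at x = 100:", abs(low(100.0) - high(100.0)))        # enormous
```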
That’s why it makes sense to try to come up with a normal situation that’s as good as we can imagine, without looking at abnormal situations too much. (We can push the boundaries of normal and allow some mind modification, but not too much because that invites risk.) That was a big part of the motivation for my post.
If the idea of unmodified human brains living in a coarse-grained VR utopia doesn’t appeal to you, I guess a more general version would be to describe some other kind of nice universe and use an arbitrary strong AI to run that universe on top of ours, as described in the post. Solving population ethics etc. can probably wait until we’ve escaped immediate disaster. Astronomical waste is a problem, but not an extreme one, because we can use up all the computation in the host universe if we want. So the problem comes down to describing a nice universe, which is similar to FAI but easier, because it doesn’t require translating preferences from one domain to another (like with the blue-minimizing robot).
“Solving population ethics etc. can probably wait until we’ve escaped immediate disaster.”
So there will be some way for people living inside the VR to change the AI’s values later; it won’t just be a fixed utility function encoding whatever philosophical views the people building the AI have? If that’s the case (and you’ve managed to avoid bugs and potential issues like value drift and the AI manipulating people’s philosophical reasoning), then I’d be happy with that. But I don’t see why it’s easier than FAI. Sure, you don’t need to figure out how to translate preferences from one domain to another in order to implement it, but then you don’t need to do that to implement CEV either. You can let CEV try to figure that out, and if CEV can’t, it can do the same thing you’re suggesting here: have the FAI implement a VR universe on top of the physical one.
Your idea actually seems harder than CEV in at least one respect, because you have to solve how human-like consciousness relates to underlying physics for arbitrary laws of physics (otherwise, what happens if your AI discovers that the laws of physics are not what we think they are?), which doesn’t seem necessary to implement CEV.
The idea that CEV is simpler (because you can “let it figure things out”) is new to me! I always felt CEV was very complex and required tons of philosophical progress, much more than solving the problem of consciousness. If you think it requires less, can you sketch the argument?
I think you may have misunderstood my comment. I’m not saying CEV is simpler overall; I’m saying it’s not clear to me why your idea is simpler, if you’re including the “feature” of allowing people inside the VR to change the AI’s values. That seems to introduce problems analogous to the ones CEV has. Basically, you have to design your VR universe to guarantee that the people who live inside it will avoid value drift and eventually reach correct conclusions about what their values are. That’s also where the main difficulty in CEV lies, at least in my view. What philosophical progress do you think CEV requires that your idea avoids?
The way I imagined it, people inside the VR wouldn’t be able to change the AI’s values. Population ethics seems like a problem that people can solve by themselves, negotiating with each other under the VR’s rules, without help from AI.
CEV requires extracting all human preferences, extrapolating them, determining coherence, and finding a general way to map them to physics. (We’d need to either do that ourselves or teach the AI how to do it; the difference doesn’t matter to the argument.) The approach in my post skips most of these tasks by letting humans describe a nice normal world directly, and requires mapping only one thing (consciousness) to physics. Though I agree with you that the loss of potential utility is huge, the idea is intended as a kind of lower bound.