I’m inclined towards the view that we shouldn’t even try to capture the full complexity of human value. Instead, we should just build a simple utility function that captures some value that we consider important, and sacrifices everything else. If humans end up unhappy with this, the AI is allowed to modify us so that we become happy with it.
Yes, being turned into orgasmium is in a sense much worse than having an AI satisfying all the fun theory criteria. But surely it’s still much better than just getting wiped out, and it should be considerably easier to program than something like CEV.
“Instead, we should just build a simple utility function that captures some value that we consider important, and sacrifices everything else.”
I actually wrote a post on this idea. But I consider it to be a contingency plan for “moral philosophy turns out to be easy” (i.e., we solve ‘morality’ ourselves without having to run CEV and can determine with some precision how much worse turning the universe into orgasmium is, compared to the best possible outcome, and how much better it is compared to just getting wiped out). I don’t think it’s a good backup plan for “seed AI turns out to be easy”, because for one thing you’ll probably have trouble finding enough AI researchers/programmers willing to deliberately kill everyone for the sake of turning the universe into orgasmium, unless it’s really clear that’s the right thing to do.
Maybe you already have the answer, Wei Dai. If we posit a putative friendly AI as one which, e.g., kills no one as a base rule AND is screened by competent AI researchers for any maximizing functions, then any remaining “nice to haves” can just be put to a vote.
I think it sounds worse. If an AI more friendly than that turns out to be impossible, I’d probably go for the negative utilitarian route and give the AI a goal of minimizing anything that might have any kind of subjective experience, including itself once it’s done.
“But surely it’s still much better than just getting wiped out”
I think that is the key here. If “just getting wiped out” is the definition of unfriendly, then “not getting wiped out” should be the MINIMUM goal for a putative “friendly” AI.
i.e. “kill no humans”.
It starts to get complex after that. For example: Is it OK to kill all humans, but freeze their dead bodies at the point of death and then resurrect one or more of them later? Is it OK to kill all humans by destructively scanning them and then running them as software inside simulations? What about killing all humans but keeping a facility of frozen embryos to be born at a later date?
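Since the thread contrasts “maximize one simple value and sacrifice everything else” with “kill no humans as the minimum goal”, a toy sketch of that distinction may be useful. This is purely illustrative and not from the original discussion: Outcome, pleasure, and humans_killed are hypothetical stand-ins, and no real proposal reduces to anything this simple.

```python
# Illustrative only: contrasts an unconstrained "simple utility function"
# with one that treats "kill no humans" as a hard constraint rather than
# a term to be traded off. All names here are hypothetical.

from dataclasses import dataclass


@dataclass
class Outcome:
    pleasure: float       # the single value the simple utility function captures
    humans_killed: int    # everything else gets "sacrificed" unless constrained


def simple_utility(o: Outcome) -> float:
    # Maximize one value; ignore everything else.
    return o.pleasure


def constrained_utility(o: Outcome) -> float:
    # The "minimum goal" variant: any outcome that kills someone is ruled
    # out entirely, no matter how much of the target value it produces.
    if o.humans_killed > 0:
        return float("-inf")
    return o.pleasure


candidates = [
    Outcome(pleasure=1e9, humans_killed=7_000_000_000),  # orgasmium via extinction
    Outcome(pleasure=1e6, humans_killed=0),              # less pleasure, nobody dies
]

print(max(candidates, key=simple_utility))       # picks the extinction outcome
print(max(candidates, key=constrained_utility))  # picks the outcome that kills no one
```

The point of the sketch is only that a hard constraint changes which outcome wins; it says nothing about how such a constraint could actually be specified or enforced, which is where the complexity discussed above comes back in.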