I think that to pull this off well, you would need to match pretty closely to reality.
Genome-based AI, where you start with the human genome and simulate it growing into a person, sounds easier.
But once you replace evolution with SGD, replace DNA and proteins with something easier to simulate, replace learned memories with downloaded ones, and replace the ancestral environment with some video game, the approximation is so crude that you are basically training a neural net to do things that seem nice and hoping for the best.
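To make the crudeness concrete, here is a minimal sketch (plain Python with numpy; every name in it is hypothetical) of what that approximation bottoms out to: a tiny network nudged by gradient steps toward whatever a hand-written "seems nice" proxy score rewards, with nothing forcing that proxy to capture the values you actually care about.

```python
import numpy as np

# Toy "world states": random feature vectors standing in for situations the agent sees.
rng = np.random.default_rng(0)
states = rng.normal(size=(256, 8))

# Hand-written proxy for "seems nice": a fixed linear scoring rule over actions.
# This is the crude stand-in for human values; nothing forces it to match the real thing.
nice_direction = rng.normal(size=8)

def seems_nice(actions):
    return actions @ nice_direction

# Tiny linear "policy": maps each state to an action vector.
W = rng.normal(size=(8, 8)) * 0.1
lr = 0.01

# Plain gradient ascent on the proxy score, the stand-in for evolution.
for step in range(501):
    actions = states @ W
    score = seems_nice(actions).mean()
    # Gradient of the mean proxy score with respect to W.
    grad = np.outer(states.mean(axis=0), nice_direction)
    W += lr * grad
    if step % 100 == 0:
        print(f"step {step}: mean 'seems nice' score = {score:.3f}")

# The number goes up, but all it measures is the proxy we wrote down,
# not whatever "nice" was supposed to mean.
```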
If you could rerun evolution starting from chimps, you may well get creatures with fairly similar values. If you rerun evolution and then post-select on it producing various pieces of human text, you get very similar values.
If you start from the first RNA, getting near human values is hard.
Then consider that human values can vary by culture a fair bit.
Consider the question of whether or not simulations of human minds are morally important.
Answer yes and you get an endless virtual utopia; to the person who answered no, that same outcome looks like humanity wiped out and the universe filled with worthless computers.
Answer no and you get a smaller and less fun real utopia, plus people simulating whatever they feel like. Quite possibly the vast majority of human minds then live unpleasant lives as characters in violence-filled video games.
Now consider that you will probably find both positions on LessWrong. This isn't a cultural difference between us and the ancient Mongols. This is a cultural difference between people who are very culturally similar.
Now you can say that one side is right. Or you can optimize some combination and get a world that both sides like.
On a sufficiently basic level, most humans value tasty food (though some people will refuse it for all sorts of reasons).
Far from the day-to-day world, human values are unconstrained by survival pressures. (Evolution so far has not selected for any particular view on whether a simulation of you is morally you.)
There may be a single truth that all humans are converging towards. But maybe not.
If you just simulate the whole world and put in an "exit simulation" button that only an ASI could press, then these aliens have no better shot at alignment than we do.
If you instead zoom in on the world, picking out the alien equivalent of MIRI and giving them extra help over the careless aliens building UFAI, then you need to locate the alien MIRI when the aliens speak an alien language. And they might still screw up anyway.