A hellworld is ultimately a world that is contrary to our values. However, our values are underdefined and changeable, so to have any chance of saying what these values are, we need to either extract key invariant values, synthesise our contradictory values into some complete whole, or use some extrapolation procedure (e.g. CEV). In any case, there must be some procedure for establishing our values (or else the very concept of a “hellworld” makes no sense).
It feels worth distinguishing between two cases of “hellworld”:
1. A world which is not aligned with the values of that world’s inhabitants themselves. One could argue that in order to merit the designation “hellworld”, the world has to be out of alignment with the values of its inhabitants in such a way as to cause suffering. Assuming that we can come up with a reasonable definition of suffering, then detecting these kinds of worlds seems relatively straightforward: we can check whether they contain immense amounts of suffering.
2. A world whose inhabitants do not suffer, but which we might consider hellish according to our values. For example, something like a Brave New World scenario, where people generally consider themselves happy but where that happiness comes at the cost of suppressing individuality and promoting superficial pleasures.
It’s for detecting an instance of the second case that we need to understand our values better. But it’s not clear to me that such a world should qualify as a “hellworld”, which to me sounds like a world with negative value. While I don’t find the notion of being an inhabitant of a Brave New World particularly appealing, a world where most people are happy but only in a superficial way sounds more like “overall low positive value” than “negative value” to me. Assuming that you’ve internalized its values and norms, existing in a BNW doesn’t seem like a fate worse than death; it just sounds like a future that could have gone better.
Of course, there is an argument that even if a BNW would be okay for its inhabitants once we got there, getting there might cause a lot of suffering: for instance, if there were lots of people who were forced against their will to adapt to the system. And since many of us might find the BNW to be a fate worse than death, then conditional on our surviving to live in it, it is a hellworld (at least for us). But again, this doesn’t seem to require a thorough understanding of our values to detect: it just requires detecting that, if we survive to live in the BNW, we will experience a lot of suffering from being in a world which is contrary to our values.
> Assuming that we can come up with a reasonable definition of suffering
Checking whether there is a large amount of suffering in a deliberately obfuscated world seems hard, and perhaps impossible if a superintelligence has done the obfuscating.
True, I’m not disputing that. I’m only saying that it seems like an easier problem than first solving human values and then checking whether those values are satisfied.