I strongly believe that (1) well-being is objective, (2) well-being is quantifiable, and (3) Open Individualism is true (i.e., the concept of personal identity isn’t well-defined, and you’re subjectively no less continuous with the future self of any other person than with your own future self).
If (1-3) are all true, then utilitronium is the optimal outcome for everyone, even for someone who is entirely selfish. Furthermore, I expect an AGI to figure this out, and to the extent that it’s aligned, it should communicate this if asked. (I don’t think an AGI will decide to do the right thing just because it figures this out, so this is entirely compatible with everyone dying if alignment isn’t solved.)
In the scenario where people get to talk to the AGI freely and it’s aligned, I see two concrete mechanisms for getting people on board: (a) people simply ask the AGI what is morally correct and it tells them, and (b) they get a small taste of what utilitronium would feel like, which would make it less scary. (A crucial piece is that they can rationally expect to experience this themselves in the utilitronium future.)
In the scenario where people don’t get to talk to the AGI, who knows. It’s certainly possible that we end up in a singleton scenario with a few people in charge of the AGI, and they decide to censor questions about ethics because they find the answers scary.
The only org I know of that works on this and shares my philosophical views is QRI. Their goal is to (a) come up with a mathematical space (probably a topological one, maybe a Hilbert space) that precisely describes a person’s subjective experience, (b) find a way to put someone in a scanner and construct that space from the measurements, and (c) find a property of that space that corresponds to the person’s well-being in that moment. Their flagship theory is that this property is symmetry. Their model is stronger than (1-3), but if it’s correct, you could get hard evidence for it before AGI, since it would make strong testable predictions about people’s well-being (and they think it could also point to easy interventions, though I don’t understand how that works). Whether it’s feasible to do this before AGI is a different question. I’d bet against it, but I’d still give it better odds than any specific alignment proposal. (And I happen to know that Mike agrees that the future is dominated by concerns about AI and thinks this is the best thing to work on.)
So I think their research is the best bet for getting more people on board with utilitronium, since it can provide evidence for (1) and (2). (It also has the nice property that it won’t work if (1) or (2) is false, so there’s little risk of outrage.) Beyond that, the best option I see is writing posts arguing for moral realism and/or Open Individualism.
Quantifying suffering before AGI would also plausibly help with alignment, since it would at least let you formally specify a broad space of outcomes you don’t want, though it certainly doesn’t solve alignment (e.g., because of inner optimizers).
I’m interested in your view on this, as well as what we could do to push the future in this direction.