I think:
1. Finding principles for AI “behavioural engineering” that reduce people’s desire to engage in risky races (e.g. because they find the principles acceptable) seems highly valuable.
2. To the extent permitted by 1, pursuing something CEV-like (“we’re happier with the outcome in hindsight than we would’ve been with other outcomes”) seems desirable too.
3. I sort of see the former as encouraging diversity (because different groups want different things, and are most likely to agree to “everyone gets what they want”), while the latter may in fact suggest convergence (because there may be fairly universal answers to “what makes people happy with the benefit of hindsight?”).
4. You stress the importance of having robust feedback procedures, but having overall goals like these can help us judge which procedures are actually doing what we want.
Not sure what you mean by this comment.
Doesn’t this hindsight-based definition of CEV that you offer here preclude using it as an objective to coordinate around? You can’t coordinate around something that you’ll only know in hindsight. And however you try to estimate it, if you ask people to coordinate around your estimated CEV, then some groups will feel that it doesn’t represent their view of how to make the world a better place, or that it doesn’t prioritize X over Y in the way they would, etc.
I mean, even if you’re mostly pursuing a particular set of final values (which is not what you’re advocating here), there are probably strong reasons to make coordination a high priority (which is close to what you’re advocating here).
Well, I did say “to the extent permitted by 1” (there’s probably some conflict here), but I wasn’t suggesting CEV as something that makes coordination easy. I’m saying it’s a good principle for judging final outcomes between two different paths that have similar levels of coordination. Of course we’d have to estimate the “happiness in hindsight”, but that looks tractable to me.