Here’s my summary: reward uncertainty through some extension of a CIRL-like setup; accounting for human irrationality using our scientific knowledge; doing aggregate preference utilitarianism over all of the humans on the planet; discounting people by how well their beliefs map to reality; and perhaps downweighting motivations such as envy (to mitigate the problem of everyone wanting positional goods).
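For concreteness, here is a minimal, purely illustrative Python sketch of the "weighted aggregate preference utilitarianism" part of that summary, assuming per-person utilities have already been inferred somehow. The epistemic-accuracy and envy weights, and every name and number below, are hypothetical placeholders rather than anything from the original proposal.

```python
from dataclasses import dataclass

@dataclass
class Person:
    """Toy stand-in for one human whose preferences have been inferred."""
    utility: dict              # hypothetical inferred utility over outcomes
    epistemic_accuracy: float  # in [0, 1]: how well their beliefs map to reality
    envy_share: float          # in [0, 1]: fraction of their preference driven by envy


def aggregate_utility(people, outcome, envy_discount=0.5):
    """Weighted aggregate preference utilitarianism over all people.

    Each person's inferred utility for `outcome` is scaled by their
    epistemic accuracy and penalized for the envy-driven component of
    their preferences. The weighting scheme is a placeholder.
    """
    total = 0.0
    for p in people:
        weight = p.epistemic_accuracy * (1.0 - envy_discount * p.envy_share)
        total += weight * p.utility.get(outcome, 0.0)
    return total


# Example: two people judging outcome "A"
people = [
    Person(utility={"A": 1.0}, epistemic_accuracy=0.9, envy_share=0.1),
    Person(utility={"A": 0.2}, epistemic_accuracy=0.5, envy_share=0.8),
]
print(aggregate_utility(people, "A"))
```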
Perhaps a dumb question, but is “reward” being used as a noun or verb here? Are we rewarding uncertainty, or is “reward uncertainty” a goal we’re trying to achieve?
As a noun: “reward uncertainty” refers to uncertainty about how valuable various states of the world are (i.e., uncertainty about the reward function). It usually also implies some way of updating beliefs about that reward based on something like observed human actions, under the assumption that humans, to some degree and in some way, know which states of the world are more valuable and act accordingly.
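For intuition, here is a minimal toy sketch (not any particular paper's algorithm) of that kind of belief updating: a Bayesian posterior over a few candidate reward functions, updated from observed human actions under a Boltzmann-rational ("noisily knows what's valuable") human model. All hypotheses, parameters, and observations below are made up for illustration.

```python
import numpy as np

# Hypothetical toy setup: three candidate reward functions over three actions.
# reward_hypotheses[i][a] = reward that hypothesis i assigns to action a.
reward_hypotheses = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
prior = np.ones(3) / 3.0  # start uncertain about which reward is the true one


def boltzmann_likelihood(rewards, action, beta=2.0):
    """P(human takes `action` | this reward hypothesis), assuming the human
    is noisily rational: more likely to pick higher-reward actions, but not
    perfectly so (one common way of modeling human irrationality)."""
    exp_r = np.exp(beta * rewards)
    return exp_r[action] / exp_r.sum()


def update(posterior, observed_action):
    """Bayesian update of the reward posterior after one observed human action."""
    likelihoods = np.array([
        boltzmann_likelihood(r, observed_action) for r in reward_hypotheses
    ])
    unnormalized = posterior * likelihoods
    return unnormalized / unnormalized.sum()


posterior = prior
for a in [0, 0, 2]:   # the human is observed taking actions 0, 0, 2
    posterior = update(posterior, a)
print(posterior)      # probability mass shifts toward hypotheses that explain the actions
```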
See also Reward Uncertainty.