(I don’t follow it all; for instance, I don’t recall why it’s important that the former view assumes that utility is computable.)
Partly because the “reductive utility” view is made a bit more extreme than it absolutely had to be. Partly because I think it’s extremely natural, in the “LessWrong circa 2014 view”, to say sentences like “I don’t even know what it would mean for humans to have uncomputable utility functions—unless you think the brain is uncomputable”. (I think there is, or at least was, a big overlap between the LW crowd and the set of people who like to assume things are computable.) Partly because the post was directly inspired by another alignment researcher saying words similar to those, around 2019.
Without the computability assumption, the core of the “reductive utility” view is that it treats utility functions as actual functions from complete world-states to real numbers. These functions wouldn’t have to be computable, but since they’re a basic part of the ontology of agency, it’s natural to suppose they are: in just the same way it’s natural to suppose that an agent’s beliefs are computable, and similarly to how it seems natural to suppose that physical laws are computable.
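For concreteness, here is a minimal Python sketch of the kind of uncomputability at stake. The “button is ever pressed” utility and the encoding of worlds as infinite bit-streams are my illustrative assumptions here, not anything fixed by the discussion above:

```python
from typing import Callable

# A "world" in the reductive picture: an infinitely large object, modeled
# here (hypothetically) as a function from time steps to bits, where bit 1
# means "the button was pressed at that step".
World = Callable[[int], int]

def utility_up_to(world: World, horizon: int) -> float:
    """Computable approximation to U(world) = 1 if the button is ever
    pressed, else 0.

    Any finite computation inspects only finitely many time steps, so it
    can confirm U = 1 (by finding a press) but can never confirm U = 0.
    That is the sense in which U, as a total function on worlds, fails to
    be computable even though every finite approximation like this one is.
    """
    return 1.0 if any(world(t) == 1 for t in range(horizon)) else 0.0

# A world where the button is pressed only at step 1,000,000:
late_press: World = lambda t: 1 if t == 1_000_000 else 0
print(utility_up_to(late_press, horizon=100))        # 0.0 (provisional, wrong)
print(utility_up_to(late_press, horizon=2_000_000))  # 1.0
```

Note the one-sidedness: seeing more of the world can push the approximation up to 1, but no finite view can ever settle it at 0.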
Ah, I guess you could say that I shoved the computability assumption into the reductive view because I secretly wanted to make three different points:
We can define beliefs directly on events, rather than needing “worlds”, and this view seems more general and flexible (and closer to actual reasoning).
We can define utility directly on events, rather than “worlds”, too, and there seem to be similar advantages here.
In particular, uncomputable utility functions seem pretty strange if you think utility is a function on worlds; but if you think it’s defined as a coherent expectation on events, then it’s more natural to suppose that the underlying function on worlds (that would justify the event expectations) isn’t computable. (See the sketch just after this list.)
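To make the events-first picture concrete, here is a toy Python sketch; the three-proposition algebra, the uniform credence, and the bit-sum values are all made-up assumptions for illustration. The point is the interface: credence and utility attach directly to events, and the coherence check at the end is the kind of constraint that stands in for a function on completed worlds:

```python
from itertools import product

# Toy event algebra over three binary propositions. "Situations" here are
# finite descriptions, not completed worlds; in this finite toy, a table of
# point values is simply the easiest way to generate a coherent assignment.
situations = list(product([0, 1], repeat=3))

P = {s: 1 / len(situations) for s in situations}  # uniform credence (assumed)
u = {s: float(sum(s)) for s in situations}        # made-up values

def prob(event) -> float:
    """P(E): credence assigned directly to an event (a predicate on situations)."""
    return sum(P[s] for s in situations if event(s))

def utility(event) -> float:
    """U(E): a coherent expectation attached to the event itself (assumes P(E) > 0)."""
    return sum(P[s] * u[s] for s in situations if event(s)) / prob(event)

# Coherence: U(E) must decompose across any partition of E. Here we check
# the sure event against the partition by the first proposition.
sure = lambda s: True
A = lambda s: s[0] == 1
not_A = lambda s: not A(s)
assert abs(utility(sure) - (prob(A) * utility(A) + prob(not_A) * utility(not_A))) < 1e-9
```

Nothing in the `prob`/`utility` interface ever demands a completed world; an underlying world-level function, computable or not, is only one possible way to justify such an assignment.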
Rather than make these three points separately, I set up a false dichotomy for illustration.
Also worth highlighting that, like my post Radical Probabilism, this post is mostly communicating insights that Richard Jeffrey seems to have had several decades ago.