An Orthodox Case Against Utility Functions was a shocking piece to me. Abram spends the first half of the post laying out a view that he suspects people hold but that he thinks is clearly wrong: a perspective that approaches things “from the starting-point of the universe”. I felt dread reading it, because it was a view I held at the time, and one I used as a key background perspective when I discussed Bayesian reasoning. The rest of the post lays out an alternative perspective that “starts from the standpoint of the agent”. Instead of my beliefs being about the universe, my beliefs are about my experiences and thoughts.
I generally nod along to a lot of the ‘scientific’ discussion in the 21st century about how the universe works and how reasonable the whole thing is. But I don’t feel I knew in advance to expect the world around me to operate on simple mathematical principles and be so reasonable. I could’ve woken up in the Harry Potter universe of magic wands and spells. I know I didn’t, but if I did, I think I would be able to act in it? I wouldn’t constantly be falling over myself because I don’t understand how 1 + 1 = 2 anymore? There’s some place I’m starting from that builds up to an understanding of the universe, and doesn’t sneak it in as an ‘assumption’.
And this is what the new perspective does, which Abram lays out in technical detail. (I don’t follow it all, for instance I don’t recall why it’s important that the former view assumes that utility is computable.) In conclusion, this piece is a key step from the existing philosophy of agents to the philosophy of embedded agents, or at least it was for me, and it changed my background perspective on rationality. It’s the only post in the early vote that I gave +9.
(I don’t follow it all, for instance I don’t recall why it’s important that the former view assumes that utility is computable.)
Partly because the “reductive utility” view is made a bit more extreme than it absolutely had to be. Partly because I think it’s extremely natural, in the “LessWrong circa 2014 view”, to say sentences like “I don’t even know what it would mean for humans to have uncomputable utility functions—unless you think the brain is uncomputable”. (I think there is, or at least was, a big overlap between the LW crowd and the set of people who like to assume things are computable.) Partly because the post was directly inspired by another alignment researcher saying words similar to those, around 2019.
Without this assumption, the core of the “reductive utility” view would be that it treats utility functions as actual functions from actual world-states to real numbers. These functions wouldn’t have to be computable, but since they’re a basic part of the ontology of agency, it’s natural to suppose they are—in exactly the same way it’s natural to suppose that an agent’s beliefs should be computable, and in a similar way to how it seems natural to suppose that physical laws should be computable.
Ah, I guess you could say that I shoved the computability assumption into the reductive view because I secretly wanted to make three different points:
1. We can define beliefs directly on events, rather than needing “worlds”, and this view seems more general and flexible (and closer to actual reasoning).
2. We can define utility directly on events, rather than “worlds”, too, and there seem to be similar advantages here.
3. In particular, uncomputable utility functions seem pretty strange if you think utility is a function on worlds; but if you think it’s defined as a coherent expectation on events, then it’s more natural to suppose that the underlying function on worlds (that would justify the event expectations) isn’t computable.
Rather than make these three points separately, I set up a false dichotomy for illustration.
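To make "utility defined as a coherent expectation on events" a bit more concrete, here is a minimal sketch. It assumes the averaging condition from Richard Jeffrey's framework: for mutually exclusive events A and B, U(A or B) = (P(A)U(A) + P(B)U(B)) / (P(A) + P(B)). The event names, the numbers, and the helper `jeffrey_coherent` are hypothetical and only for illustration. Nothing in the code refers to "worlds"; beliefs and utilities are attached to events directly.

```python
from math import isclose

# Hypothetical toy data: probabilities and expected utilities assigned
# directly to named events. No set of underlying "worlds" appears anywhere.
P = {"rain": 0.3, "dry": 0.7, "rain_or_dry": 1.0}
U = {"rain": -2.0, "dry": 1.0, "rain_or_dry": 0.1}


def jeffrey_coherent(a, b, a_or_b, P, U):
    """Check coherence for two events assumed to be mutually exclusive.

    Requires P(a or b) = P(a) + P(b), and that U(a or b) is the
    probability-weighted average of U(a) and U(b).
    """
    p_sum = P[a] + P[b]
    if p_sum == 0:
        # The weighted average is undefined here; this sketch then only
        # checks that the disjunction also has probability zero.
        return isclose(P[a_or_b], 0.0, abs_tol=1e-12)
    expected_u = (P[a] * U[a] + P[b] * U[b]) / p_sum
    return isclose(P[a_or_b], p_sum) and isclose(U[a_or_b], expected_u)


# The toy assignment above satisfies the condition.
print(jeffrey_coherent("rain", "dry", "rain_or_dry", P, U))

# Changing only the utility of the disjunction breaks coherence.
U_bad = dict(U, rain_or_dry=5.0)
print(jeffrey_coherent("rain", "dry", "rain_or_dry", P, U_bad))
```

The check only constrains how values on events relate to each other. Whether some function on worlds exists that would reproduce these values, and whether it would be computable, is a separate question that the sketch never has to answer.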
Also worth highlighting that, like my post Radical Probabilism, this post is mostly communicating insights that it seems Richard Jeffrey had several decades ago.
(This review is taken from my post Ben Pace’s Controversial Picks for the 2020 Review.)