This is interesting. My initial instinct was to disagree, then to think you’re pointing to something real… and now I’m unsure :)
First, I don’t think your examples directly disagree with what I’m saying. Saying that our preferences can be represented by a UF over histories is not to say that these preferences only care about the physical history of our universe—they can care about non-physical predictions too (desirable anthropic measures and universal-prior-based manipulations included).
So then I assume we say something like:
“This makes our UF representation identical to that of a set of preferences which does only care about the physical history of our universe. Therefore we’ve lost that caring-about-other-worlds aspect of our values. The UF might fully determine actions in accordance with our values, but it doesn’t fully express the values themselves.”
Strictly, this seems true to me—but in practice I think we might be guilty of ignoring much of the content of our UF. For example, our UF contains preferences over histories containing philosophy discussions.
Now I claim that it’s logically possible for a philosophy discussion to have no significant consequences outside the discussion (I realise this is hard to imagine, but please try). Our UF will say something about such discussions. If such a UF is both fully consistent with having particular preferences over [anthropic measures, acausal trade, universal-prior-based influence...], and prefers philosophical statements that argue for precisely these preferences, we’d have to be pretty obtuse to stick with “this is still perfectly consistent with caring only about [histories of the physical world]”.
It’s always possible to interpret such a UF as encoding only preferences directly about histories of the physical world. It’s also possible to think that this post is in Russian, but contains many typos. I submit that это маловероятно (“that is unlikely”).
If we say that the [preferences ‘of’ a UF] are the [distribution over preferences we’d ascribe to an agent acting according to that UF (over some large set of environments)], then I think we capture the “something substantive” with substantial probability mass in most cases. (Not always through this kind of arguing-for-itself mechanism; the more general point is that the UF contains huge amounts of information, and it’ll be surprising if the expression of particular preferences doesn’t show up in a priori unlikely patterns.)
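To make that a bit more concrete, here is a rough sketch in notation of my own (it assumes a prior over candidate preference-structures and some fixed class of environments; nothing here is standard): let $U$ be a UF over histories, $\pi_U$ the policy of an agent acting according to $U$, $\mathcal{E}$ a large set of environments, and $P$ a candidate preference-structure (possibly including preferences over anthropic measure, acausal trade, universal-prior-based influence, and so on). Then

$$\mathrm{prefs}(U) \;:=\; \Pr\!\left(P \,\middle|\, \{\pi_U(e) : e \in \mathcal{E}\}\right),$$

i.e. the posterior over preference-structures after observing how the $U$-agent behaves across $\mathcal{E}$. On this reading, “the UF encodes those preferences” just means that this posterior puts substantial mass on them.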
If we’re still losing something, it feels like an epsilon’s worth in most cases. Perhaps there are important edge cases??
Note that I’m only claiming “almost precisely the information you’re talking about is in there somewhere”, not that the UF is necessarily a useful/efficient/clear way to present the information. This is exactly the role I endorse for other perspectives: avoiding offensively impractical encodings of things we care about.
A second note: in practice, we’re starting out with an uncertain world. Therefore, the inability of a UF over universe histories to express outside-the-universe-history preferences with certainty may not be of real-world relevance. Outside an idealised model, certainty won’t happen for any approach.