Sure, every time you go more abstract there are fewer degrees of freedom. But there’s no free lunch—there are degrees of freedom in how the more-abstract variables are connected to less-abstract ones.
People who want different things might make different abstractions. E.g. if you’re calling some high level abstraction “eat good food,” it’s not that this is mathematically the same abstraction made by someone who thinks good food is pizza and someone else who thinks good food is fish. Not even if those people independently keep going higher in the abstraction hierarchy—they’ll never converge to the same object, because there’s always that inequivalence in how they’re translated back to the low level description.
Yes, at high levels of abstraction, humans can all recommend the same abstract action. But I don’t care about abstract actions, I care about real-world actions.
E.g. suppose we abstract the world to an ontology where there are two states, “good” and “bad,” and two actions—stay or swap. Lo and behold, ~everyone who abstracts the world to this ontology will converge to the same policy in terms of abstract actions: make the world good rather than bad. But if two people disagree utterly about which low-level states get mapped onto the “good” state, they’ll disagree utterly about which low-level actions get mapped onto the “swap from bad to good” action, and this abstraction hasn’t really bought us anything.
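A minimal sketch of that point in code, with all state names and mappings invented for illustration: both agents share the abstract ontology and the abstract policy, but cash it out into contradictory real-world actions.

```python
# Hypothetical example: two agents share the abstract states {good, bad} and the abstract
# policy "swap to good if the world is bad", but map low-level states onto "good"
# differently, so the shared policy recommends different real-world actions.
LOW_LEVEL_STATES = ["pizza_world", "fish_world", "famine_world"]

abstraction_a = {"pizza_world": "good", "fish_world": "bad", "famine_world": "bad"}
abstraction_b = {"pizza_world": "bad", "fish_world": "good", "famine_world": "bad"}

def abstract_policy(abstraction, current_state):
    """Shared abstract policy: stay if the world is good, otherwise swap to a good one."""
    if abstraction[current_state] == "good":
        return ("stay", current_state)
    target = next(s for s in LOW_LEVEL_STATES if abstraction[s] == "good")
    return ("swap to", target)

print(abstract_policy(abstraction_a, "famine_world"))  # ('swap to', 'pizza_world')
print(abstract_policy(abstraction_b, "famine_world"))  # ('swap to', 'fish_world')
```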
People who want different things might make different abstractions
That’s a direct rejection of the natural abstractions hypothesis. And some form of it increasingly seems just common-sensically true.
It’s indeed the case that one’s choice of what system to model depends on what one cares about/where one’s values are housed (whether I care to model the publishing industry, say). But once the choice to model a given system is made, the abstractions are in the territory. They fall out of noticing to which simpler systems a given system can be reduced.
(Imagine you have a low-level description of a system defined in terms of individual gravitationally- and electromagnetically-interacting particles. Unbeknownst to you, the system describes two astronomical objects orbiting each other. Given some abstracting-up algorithm, we can notice that this system reduces to these two bodies orbiting each other (under some definition of approximation).
It’s not value-laden at all: it’s simply a true mathematical fact about the system’s dynamics.
The NAH is that this generalizes, very widely.)
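A rough sketch of what such an abstracting-up step could look like for this example; the synthetic particle data, the clustering rule, and its initialization below are all assumptions added for illustration, and a real treatment would also have to check that the reduced description approximately reproduces the dynamics.

```python
# Toy "abstracting-up" pass for the orbiting-bodies example: given only particle-level
# positions and masses, recover a two-body summary (total mass and centre of mass of
# each clump). All numbers here are synthetic.
import numpy as np

rng = np.random.default_rng(0)

# Low-level description: 1000 particles that, unbeknownst to the modeller, form two clumps.
true_clumps = np.array([[0.0, 0.0], [10.0, 0.0]])
positions = np.vstack([c + 0.1 * rng.standard_normal((500, 2)) for c in true_clumps])
masses = rng.uniform(0.5, 1.5, size=len(positions))

# Simple 2-means clustering to discover the clumps (initialized with two arbitrary particles).
centres = np.array([positions[0], positions[-1]])
for _ in range(20):
    dists = np.linalg.norm(positions[:, None, :] - centres[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    centres = np.array([positions[labels == k].mean(axis=0) for k in range(2)])

# High-level description: each clump summarized by its total mass and centre of mass.
for k in range(2):
    sel = labels == k
    total_mass = masses[sel].sum()
    com = (masses[sel, None] * positions[sel]).sum(axis=0) / total_mass
    print(f"body {k}: total mass {total_mass:.1f}, centre of mass {np.round(com, 2)}")
```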
Not even if those people independently keep going higher in the abstraction hierarchy—they’ll never converge to the same object, because there’s always that inequivalence in how they’re translated back to the low level description.
I mean, that’s clearly not how it works in practice? Take the example in the post literally: two people disagree on food preferences, but can agree on the “food” abstraction and on both of them having a preference for subjectively tasty ones.
suppose we abstract the world to an ontology where there are two states, “good” and “bad,”
If your model is assumed, i.e. that abstractions are inherently value-laden, then yes, this is possible. But that’s not how it’d work under the NAH and on my model, because “good” and “bad” are not objective high-level states a given system could be in.
It’d be something like State A and State B. And then the “human values converge” hypothesis is that all human values would converge to preferring one of these states.
Not even if those people independently keep going higher in the abstraction hierarchy—they’ll never converge to the same object, because there’s always that inequivalence in how they’re translated back to the low level description.
I mean, that’s clearly not how it works in practice? Take the example in the post literally: two people disagree on food preferences, but can agree on the “food” abstraction and on both of them having a preference for subjectively tasty ones.
I agree with the part of what you just said that’s the NAH, but disagree with your interpretation.
Both people can recognize that there’s a good abstraction here, where what they care about is subjectively tasty food. But this interpersonal abstraction is no longer an abstraction of their values; it simply happens to be about their values, sometimes. It can no longer be cashed out into specific recommendations of real-world actions in the way someone’s values can[1].

[1] For certain meanings of “values,” ofc.
Okay, let’s build a toy model.

We have some system with a low-level state l, which can take on one of six values: {a,b,c,d,e,f}.
We can abstract over this system’s state and get a high-level state h, which can take on one of two states: {x,y}.
We have an objective abstracting-up function f(l)=h.
We have the following mappings between states:
∀l∈{a,b,c}:f(l)=x
∀l∈{d,e,f}:f(l)=y
We have a utility function U_A(l), with a preference ordering of a>b>c≫d≈e≈f, and a utility function U_B(l), with a preference ordering of c>b>a≫d≈e≈f.
We translate both utility functions to h, and get the same utility function: U(h), whose preference ordering is x>y.
Thus, both U_A(l) and U_B(l) can agree on which high-level state they would greatly prefer. No low-level state would maximally satisfy both of them, but they both would be happy enough with any low-level state that gets mapped to the high-level state of x. (b is the obvious compromise.)

Which part of this do you disagree with?
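A sketch of the toy model in code: only the mapping f and the two preference orderings come from the comment above; the numeric utility values and the averaging used to translate a low-level utility function up to h are assumptions added for illustration.

```python
# Toy model: f maps the six low-level states onto two high-level states; the numeric
# utilities below are invented stand-ins for the orderings a>b>c>>d≈e≈f and c>b>a>>d≈e≈f.
LOW_STATES = ["a", "b", "c", "d", "e", "f"]
f_map = {"a": "x", "b": "x", "c": "x", "d": "y", "e": "y", "f": "y"}  # f(l) = h

U_A = {"a": 10, "b": 9, "c": 8, "d": 0, "e": 0, "f": 0}
U_B = {"c": 10, "b": 9, "a": 8, "d": 0, "e": 0, "f": 0}

def translated_preference(U):
    """Rank the high-level states by the average utility of the low-level states mapped to them."""
    high_states = set(f_map.values())
    value = {h: sum(U[l] for l in LOW_STATES if f_map[l] == h)
                / sum(1 for l in LOW_STATES if f_map[l] == h)
             for h in high_states}
    return sorted(high_states, key=value.get, reverse=True)

print(translated_preference(U_A), translated_preference(U_B))  # ['x', 'y'] ['x', 'y'] -- same U(h)
print(max(LOW_STATES, key=U_A.get), max(LOW_STATES, key=U_B.get))  # a c -- no shared low-level favourite
```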
I disagree that translating to x and y lets you “reduce the degrees of freedom” or otherwise get any sort of discount lunch. At the end you still had to talk about the low level states again to say they should compromise on b (or not compromise and fight it out over c vs. a, that’s always an option).
At the end you still had to talk about the low level states again to say they should compromise on b
“Compromising on b” is a more detailed implementation that can easily be omitted. The load-bearing part is “both would be happy enough with any low-level state that gets mapped to the high-level state of x”.
For example, the policy of randomly sampling any l such that f(l)=x is something both utility functions can agree on, and doesn’t require doing any additional comparisons of low-level preferences, once the high-level state has been agreed upon. A rising tide lifts all boats, etc.
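Continuing the illustrative sketch (definitions repeated from the toy-model code above, utility numbers still invented), the sampling policy can be written down and evaluated without comparing low-level preferences any further:

```python
# The "randomly sample any l such that f(l) = x" policy, evaluated for both agents.
import random

f_map = {"a": "x", "b": "x", "c": "x", "d": "y", "e": "y", "f": "y"}
U_A = {"a": 10, "b": 9, "c": 8, "d": 0, "e": 0, "f": 0}
U_B = {"c": 10, "b": 9, "a": 8, "d": 0, "e": 0, "f": 0}

x_states = [l for l, h in f_map.items() if h == "x"]

def sample_x_state():
    """Policy both agents can accept: pick uniformly among the low-level states mapping to x."""
    return random.choice(x_states)

# Expected utility of the policy for each agent; any state mapping to y scores 0 for both.
for name, U in [("U_A", U_A), ("U_B", U_B)]:
    expected = sum(U[l] for l in x_states) / len(x_states)
    print(f"{name}: expected utility under the sampling policy = {expected}")  # 9.0 for both
```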
Suppose the two agents are me and a flatworm.
a = ideal world according to me
b = status quo
c = ideal world according to the flatworm
d, e, f = various deliberately-bad-to-both worlds
I’m not going to stop trying to improve the world just because the flatworm prefers the status quo, and I wouldn’t be “happy enough” if we ended up in flatworm utopia.
What bargains I would agree to, and how I would feel about them, are not safe to abstract away.
I wouldn’t be “happy enough” if we ended up in flatworm utopia
You would, presumably, be quite happy compared to “various deliberately-bad-to-both worlds”.
I’m not going to stop trying to improve the world just because the flatworm prefers the status quo
Because you don’t care about the flatworm, and you don’t perceive the flatworm as having enough bargaining power to make you bend to its preferences.
In addition, your model rules out more fine-grained ideas like “the cubic mile of terrain around the flatworm remains unchanged while I get the rest of the universe”. Which is plausibly what CEV would result in: everyone gets their own safe garden, with the only concession the knowledge that everyone else’s safe gardens also exist.