Here’s an argument that alignment is difficult which uses complexity of value as a subpoint:
A1. If you try to manually specify what you want, you fail.
A2. Therefore, you want something algorithmically complex.
B1. When humanity makes an AGI, the AGI will have gotten values via some process; that process induces some probability distribution over what values the AGI ends up with.
B2. We want to affect the values-distribution, somehow, so that it ends up with our values.
B3. We don’t understand how to affect the values-distribution toward something specific.
B4. If we don’t affect the values-distribution toward something specific, then the values-distribution probably puts large penalties on absolute algorithmic complexity; any specific utility function with higher absolute algorithmic complexity will be less likely to be the one that the AGI ends up with. (See the toy sketch just after this argument.)
C1. Because of A2 (our values are algorithmically complex) and B4 (a complex utility function is unlikely to show up in an AGI without us skillfully intervening), an AGI is unlikely to have our values without us skillfully intervening.
C2. Because of B3 (we don’t know how to skillfully intervene on an AGI’s values) and C1, an AGI is unlikely to have our values.
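As a toy illustration of the penalty B4 describes (the description lengths below are made-up assumptions, not estimates of anything): under a simplicity-biased distribution, each extra bit of description length roughly halves a candidate utility function's probability, so a values-sized specification is astronomically disfavored relative to a short one.

```python
# Toy sketch of B4's complexity penalty. The description lengths are
# illustrative assumptions, not claims about any real training process.

def simplicity_mass(description_length_bits: int) -> float:
    """Un-normalized mass of a utility function under a 2^(-length) prior:
    each extra bit of shortest description halves the mass."""
    return 2.0 ** (-description_length_bits)

simple_u = simplicity_mass(20)     # a short, crude objective
complex_u = simplicity_mass(1000)  # a human-values-sized specification

print(f"simple is ~{simple_u / complex_u:.2g}x more likely")  # ~1e+295 with these numbers
```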
I think that you think that the argument under discussion is something like:
(same) A1. If you try to manually specify what you want, you fail.
(same) A2. Therefore, you want something algorithmically complex.
(same) B1. When humanity makes an AGI, the AGI will have gotten values via some process; that process induces some probability distribution over what values the AGI ends up with.
(same) B2. We want to affect the values-distribution, somehow, so that it ends up with our values.
B′3. The greater the complexity of our values, the harder it is to point at our values.
B′4. The harder it is to point at our values, the more work or difficulty is involved in B2.
C′1. By B′3 and B′4: the greater the complexity of our values, the more work or difficulty is involved in B2 (determining the AGI’s values).
C′2. Because of A2 (our values are algorithmically complex) and C′1, it would take a lot of work to make an AGI pursue our values.
These are different arguments, which make use of the complexity of values in different ways. You dispute B′3 on the grounds that it can be easy to point at complex values. B′3 isn’t used in the first argument though.
In the situation assumed by your first argument, AGI would be very unlikely to share our values even if our values were much simpler than they are.
Complexity makes things worse, yes, but the conclusion “AGI is unlikely to have our values” is already entailed by the other premises even if we drop the stuff about complexity.
Why: if we’re just sampling some function from a simplicity prior, we’re very unlikely to get any particular nontrivial function that we’ve decided to care about in advance of the sampling event. There are just too many possible functions, and probability mass has to get divided among them all.
In other words, if it takes N bits to specify human values, there are 2^N ways that a bitstring of the same length could be set, and we’re hoping to land on just one of those through luck alone. (And to land on a bitstring of this specific length in the first place, of course.) Unless N is very small, such a coincidence is extremely unlikely.
And N is not going to be that small; even in the sort of naive and overly simple “hand-crafted” value specifications which EY has critiqued in this post and elsewhere, a lot of details have to be specified. (E.g. some proposals refer to “humans” and so a full algorithmic description of them would require an account of what is and isn’t a human.)
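To put a (made-up) number on the coincidence: with N = 1000 as a purely illustrative stand-in for the complexity of human values, the chance of the sampled function happening to be the right one is around 10^-301.

```python
# Toy version of the counting argument above. N is an illustrative
# assumption, not an estimate of the actual complexity of human values.
from math import log10

N = 1000                   # suppose human values take N bits to specify
p_exact_hit = 2.0 ** (-N)  # chance of luckily sampling that exact N-bit string
                           # (ignoring the further issue of getting the length right)

print(f"P(exact hit) ~ 10^{log10(p_exact_hit):.0f}")  # 10^-301 for N = 1000
```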
One could devise a variant of this argument that doesn’t have this issue by “relaxing the problem” so that we have some control, just not enough to pin down the sampled function exactly; the remaining freedom is then filled in randomly, with a simplicity bias. This partial control might be enough to make a simple function likely, while not being enough to make a more complex function likely. (Hmm, perhaps this is just your second argument, or a version of it.)
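A minimal sketch of that relaxed setup, with made-up numbers: suppose we can pin down k bits of the target and the remaining N − k bits get filled in at random. Partial control then rescues simple targets but not complex ones.

```python
# Toy model of the relaxed problem: we control k bits of the target function;
# the remaining N - k bits come out at random. All numbers are illustrative.

def p_hit_target(total_bits: int, controlled_bits: int) -> float:
    """Chance the uncontrolled remainder happens to come out right."""
    return 2.0 ** (-(total_bits - controlled_bits))

k = 30                        # bits we can pin down through partial control
print(p_hit_target(40, k))    # simple target:  2^-10 ~ 1e-3, non-negligible
print(p_hit_target(1000, k))  # complex target: 2^-970, still hopeless
```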
This kind of reasoning might be applicable in a world where its premises are true, but I don’t think its premises are true in our world.
In practice, we apparently have no trouble getting machines to compute very complex functions, including (as Matthew points out) specifications of human value whose robustness would have seemed like impossible magic back in 2007. The main difficulty, if there is one, is in “getting the function to play the role of the AGI values,” not in getting the AGI to compute the particular function we want in the first place.
The main difficulty, if there is one, is in “getting the function to play the role of the AGI values,” not in getting the AGI to compute the particular function we want in the first place.
Right, that is the problem (and IDK of anyone discussing this who says otherwise).
Another position would be that it’s probably easy to influence a few bits of the AI’s utility function, but not others. For example, it’s conceivable that, by doing capabilities research in different ways, you could increase the probability that the AGI is highly ambitious—e.g. tries to take over the whole lightcone, tries to acausally bargain, etc., rather than being more satisficy. (IDK how to do that, but plausibly it’s qualitatively easier than alignment.) Then you could claim that it’s half a bit more likely that you’ve made an FAI, given that an FAI would probably be ambitious. In this case, it does matter that the utility function is complex.
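One way to cash out the “half a bit” bookkeeping, with made-up probabilities purely for illustration: if the intervention acts like conditioning the values-distribution on “ambitious,” then P(FAI) gets multiplied by the likelihood ratio P(ambitious | FAI) / P(ambitious).

```python
# Toy bookkeeping for the "half a bit" claim. Both probabilities are made-up
# illustrative assumptions; the point is only the shape of the calculation.
from math import log2

p_ambitious_given_fai = 0.95  # an FAI would probably be ambitious
p_ambitious_baseline = 0.67   # how often ambition shows up anyway

likelihood_ratio = p_ambitious_given_fai / p_ambitious_baseline
print(f"{log2(likelihood_ratio):.2f} bits toward FAI")  # ~0.50 with these numbers
```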