Matthew Barnett comments on The Hidden Complexity of Wishes

Matthew Barnett Oct 19, 2024, 10:38 PM
8 points

Matthew is not disputing this point, as far as I can tell.
Instead, he is trying to critique some version of^[1] the “larger argument” (mentioned in the May 2024 update to this post) in which this point plays a role.

I’ll confirm that I’m not saying this post’s exact thesis is false. This post seems to be largely a parable about a fictional device, rather than an explicit argument with premises and clear conclusions. I’m not saying the parable is wrong. Parables are rarely “wrong” in a strict sense, and I am not disputing this parable’s conclusion.
However, I am saying: this parable presumably played some role in the “larger” argument that MIRI has made in the past. What role did it play? Well, I think a good guess is that it portrayed the difficulty of precisely specifying what you want or intend, for example when explicitly designing a utility function. This problem was often alleged to be difficult because, when you want something complex, it’s difficult to perfectly delineate potential “good” scenarios and distinguish them from all potential “bad” scenarios. This is the problem I was analyzing in my original comment.
While the term “outer alignment” was not invented to describe this exact problem until much later, I was using that term purely as descriptive terminology for the problem this post clearly describes, rather than claiming that Eliezer in 2007 was deliberately describing something that he called “outer alignment” at the time. Because my usage of “outer alignment” was merely descriptive in this sense, I reject the idea that my comment was anachronistic.
And again: I am not claiming that this post is inaccurate in isolation. In both my comment above and my 2023 post, I merely cited this post as portraying an aspect of the problem I was talking about, rather than saying something like “this particular post’s conclusion is wrong”. I think the fact that the post doesn’t really have a clear thesis in the first place means that it can’t be wrong in a strong sense at all. However, the post was definitely interpreted, for a long time and by many people, as explaining part of why alignment is hard, and I was critiquing the particular application of the post to this argument, rather than the post itself in isolation.
- TsviBT Oct 21, 2024, 7:49 AM
  29 points
  Here’s an argument that alignment is difficult which uses complexity of value as a subpoint:
  - A1. If you try to manually specify what you want, you fail.
  - A2. Therefore, you want something algorithmically complex.
  - B1. When humanity makes an AGI, the AGI will have gotten values via some process; that process induces some probability distribution over what values the AGI ends up with.
  - B2. We want to affect the values-distribution, somehow, so that it ends up with our values.
  - B3. We don’t understand how to affect the values-distribution toward something specific.
  - B4. If we don’t affect the values-distribution toward something specific, then the values-distribution probably places large penalties on absolute algorithmic complexity; any specific utility function with higher absolute algorithmic complexity will be less likely to be the one that the AGI ends up with.
  - C1. Because of A2 (our values are algorithmically complex) and B4 (a complex utility function is unlikely to show up in an AGI without us skillfully intervening), an AGI is unlikely to have our values without us skillfully intervening.
  - C2. Because of B3 (we don’t know how to skillfully intervene on an AGI’s values) and C1, an AGI is unlikely to have our values.
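  As a rough illustration of how B4 feeds into C1, here is a minimal Python sketch. Everything in it is a hypothetical toy model rather than part of the argument: it assumes a made-up simplicity prior over bitstrings (choose a length $L$ with probability $2^{-(L+1)}$, then fill the $L$ bits uniformly) and equates a utility function's complexity with the length of its bitstring. Under that prior a specific $L$-bit string gets probability $2^{-(2L+1)}$, so each added bit of complexity cuts the chance of landing on that exact string by a factor of 4.

```python
from fractions import Fraction


def toy_simplicity_prior(bits: str) -> Fraction:
    """Probability that a hypothetical toy simplicity prior samples exactly `bits`.

    Toy prior (an assumption for illustration): pick a length L with
    probability 2^-(L+1), then pick each of the L bits uniformly at random.
    A specific string of length L therefore gets probability 2^-(2L+1).
    """
    length = len(bits)
    return Fraction(1, 2 ** (length + 1)) * Fraction(1, 2 ** length)


def total_mass(max_len: int) -> Fraction:
    """Total prior mass on all strings of length 0..max_len (approaches 1)."""
    # Length L has 2^L strings, each with mass 2^-(2L+1), so it contributes 2^-(L+1).
    return sum(
        (2 ** length) * Fraction(1, 2 ** (2 * length + 1))
        for length in range(max_len + 1)
    )


# Hypothetical targets: a "simple" 8-bit utility function and a "complex" 64-bit one.
simple_target = "10110010"
complex_target = "01" * 32

p_simple = toy_simplicity_prior(simple_target)
p_complex = toy_simplicity_prior(complex_target)

print(p_simple)                       # 1/131072, i.e. 2^-17
print(p_simple / p_complex == 2 ** 112)  # True: the simpler target is 2^112 times likelier
print(total_mass(10))                 # 2047/2048
```

  The factor of 4 per bit is an artifact of this particular toy prior; the qualitative point is only that the probability of hitting a specific target falls off exponentially with its description length.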
  I think that you think that the argument under discussion is something like:
  - (same) A1. If you try to manually specify what you want, you fail.
  - (same) A2. Therefore, you want something algorithmically complex.
  - (same) B1. When humanity makes an AGI, the AGI will have gotten values via some process; that process induces some probability distribution over what values the AGI ends up with.
  - (same) B2. We want to affect the values-distribution, somehow, so that it ends up with our values.
  - B′3. The greater the complexity of our values, the harder it is to point at our values.
  - B′4. The harder it is to point at our values, the more work or difficulty is involved in B2.
  - C′1. By B′3 and B′4: the greater the complexity of our values, the more work or difficulty is involved in B2 (determining the AGI’s values).
  - C′2. Because of A2 (our values are algorithmically complex) and C′1, it would take a lot of work to make an AGI pursue our values.
  These are different arguments, which make use of the complexity of values in different ways. You dispute B′3 on the grounds that it can be easy to point at complex values. B′3 isn’t used in the first argument though.
  - nostalgebraist Oct 21, 2024, 11:27 PM
    8 points
    In the situation assumed by your first argument, AGI would be very unlikely to share our values even if our values were much simpler than they are.
    Complexity makes things worse, yes, but the conclusion “AGI is unlikely to have our values” is already entailed by the other premises even if we drop the stuff about complexity.
    Why: if we’re just sampling some function from a simplicity prior, we’re very unlikely to get any particular nontrivial function that we’ve decided to care about in advance of the sampling event. There are just too many possible functions, and probability mass has to get divided among them all.
    In other words, if it takes $N$ bits to specify human values, there are $2^{N}$ ways that a bitstring of the same length could be set, and we’re hoping to land on just one of those through luck alone. (And to land on a bitstring of this specific length in the first place, of course.) Unless $N$ is very small, such a coincidence is extremely unlikely.
    And $N$ is not going to be that small; even in the sort of naive and overly simple “hand-crafted” value specifications which EY has critiqued in this post and elsewhere, a lot of details have to be specified. (E.g. some proposals refer to “humans” and so a full algorithmic description of them would require an account of what is and isn’t a human.)
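    For concreteness, here is the arithmetic above, conditioning on the sampled bitstring already having exactly $N$ bits and treating all $2^{N}$ settings as equally likely. $N = 100$ is an arbitrary illustrative value, not an estimate of how many bits human values take.

    $$P(\text{exact match}) = 2^{-N}, \qquad \log_{10} P = -N \log_{10} 2 \approx -0.301\,N, \qquad N = 100 \;\Rightarrow\; P \approx 7.9 \times 10^{-31}.$$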
    One could devise a variant of this argument that doesn’t have this issue, by “relaxing the problem” so that we have some control, just not enough to pin down the sampled function exactly. And then the remaining freedom is filled randomly with a simplicity bias. This partial control might be enough to make a simple function likely, while not being able to make a more complex function likely. (Hmm, perhaps this is just your second argument, or a version of it.)
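    The relaxed variant can be sketched the same way. This is a minimal hypothetical model, not a formalization anyone in the thread proposed: assume the target takes $N$ bits, that we can reliably fix $k$ of them, and that the remaining $N - k$ bits are filled uniformly at random (a simplification of the simplicity-biased filling described above). The chance of the exact target is then $2^{-(N-k)}$, or 1 once $k \ge N$. With the illustrative choice $k = 20$, a 10- or 20-bit target is certain, while a 1000-bit target remains astronomically unlikely.

```python
from fractions import Fraction


def chance_of_exact_target(target_bits: int, controlled_bits: int) -> Fraction:
    """Chance of landing on one specific target value specification.

    Hypothetical model: the target takes `target_bits` bits to specify, we can
    reliably set `controlled_bits` of them, and each remaining bit is filled
    uniformly at random (uniform filling stands in for a simplicity bias).
    """
    if target_bits < 0 or controlled_bits < 0:
        raise ValueError("bit counts must be non-negative")
    free_bits = max(target_bits - controlled_bits, 0)
    return Fraction(1, 2 ** free_bits)


CONTROL = 20  # hypothetical: we can pin down 20 bits of the AGI's values

for n in (10, 20, 40, 1000):
    print(n, float(chance_of_exact_target(n, CONTROL)))
```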
    This kind of reasoning might be applicable in a world where its premises are true, but I don’t think its premises are true in our world.
    In practice, we apparently have no trouble getting machines to compute very complex functions, including (as Matthew points out) specifications of human value whose robustness would have seemed like impossible magic back in 2007. The main difficulty, if there is one, is in “getting the function to play the role of the AGI values,” not in getting the AGI to compute the particular function we want in the first place.
    - TsviBT Oct 22, 2024, 3:58 PM
      13 points
      
      The main difficulty, if there is one, is in “getting the function to play the role of the AGI values,” not in getting the AGI to compute the particular function we want in the first place.
      
      Right, that is the problem (and IDK of anyone discussing this who says otherwise).
      
      Another position would be that it’s probably easy to influence a few bits of the AI’s utility function, but not others. For example, it’s conceivable that, by doing capabilities research in different ways, you could increase the probability that the AGI is highly ambitious—e.g. tries to take over the whole lightcone, tries to acausally bargain, etc., rather than being more satisficy. (IDK how to do that, but plausibly it’s qualitatively easier than alignment.) Then you could claim that it’s half a bit more likely that you’ve made an FAI, given that an FAI would probably be ambitious. In this case, it does matter that the utility function is complex.
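      One way to quantify the last point, assuming "half a bit" is meant in the usual information-theoretic sense (odds multiplied by $2^{1/2}$) and that an FAI's utility function needs $N$ bits to pin down, so its prior odds are roughly $2^{-N}$:

      $$\text{odds}_{\text{after}} \approx 2^{1/2} \cdot 2^{-N} = 2^{-(N - 1/2)} \approx 1.41 \cdot 2^{-N}.$$

      Half a bit is a meaningful gain when $N$ is small and almost irrelevant when $N$ is large, which is why the complexity of the utility function matters in this case.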