Getting bits of information about human values, and being able to aim an AGI at anything, are different problems.
I think these are the same problem? Like, ability-to-narrow-down-a-search-space-or-behavior-space-by-a-factor-of-two is what a bit of information is. If we can’t use the information to narrow down a search space closer to the thing-the-information-is-supposedly-about, then we don’t actually have any information about that thing.
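To make the factor-of-two point concrete, here's a toy sketch (a made-up example, nothing specific to values or AGI): each yes/no answer halves the remaining candidates, so pinning down one of 16 candidates takes log2(16) = 4 bits.

```python
import math

# Toy illustration: each yes/no answer halves the candidate space,
# so identifying one of 16 candidates takes log2(16) = 4 bits.
candidates = list(range(16))
target = 11  # the thing the information is "about"

bits_used = 0
low, high = 0, len(candidates)  # interval of still-possible candidates
while high - low > 1:
    mid = (low + high) // 2
    answer = target >= candidates[mid]  # one yes/no question = one bit
    low, high = (mid, high) if answer else (low, mid)
    bits_used += 1

print(candidates[low], bits_used, math.log2(len(candidates)))  # -> 11 4 4.0
```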
>Like, ability-to-narrow-down-a-search-space-or-behavior-space-by-a-factor-of-two is what a bit of information is.
Information is an upper bound, not a lower bound. The capacity of a channel gives you an upper bound on how many distinct messages you can send, not a lower bound on your performance on some task using messages sent over the channel. If you have a very high info-capacity channel with someone who speaks a different language from you, you don’t have an informational problem, you have some other problem (a translation problem).
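For concreteness, the standard noisy-channel statement (nothing here is specific to the alignment case) is that capacity upper-bounds the rate of reliable communication; it promises nothing about performance on whatever task you use the received messages for:

```latex
% Shannon's noisy-channel coding theorem: capacity is an upper bound on
% the rate of reliable communication, not a lower bound on task performance.
C \;=\; \max_{p(x)} I(X;Y), \qquad
\text{rates } R < C \text{ are achievable; rates } R > C \text{ are not.}
```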
>If we can’t use the information to narrow down a search space closer to the thing-the-information-is-supposedly-about, then we don’t actually have any information about that thing.
This seems to render the word “information” equivalent to “what we know how to do”, which is not the technical meaning of information. Do you mean to do that? If so, why? It seems like a misframing of the problem, because what’s hard is that you don’t know how to do something, and you don’t know how to gather data about how to do it, because you don’t have a clear space of possibilities with a set of observable implications that would distinguish those possibilities. When you don’t know how to do something and don’t have a clear space of possibilities, the pieces of progress you want to make aren’t fungible with each other the way one piece of information is fungible with another.
[ETA: Like, if the space in question is the space of which “human values” is a member, then I’m saying: our problem isn’t locating human values in that space; our problem is that none of the points in the space are things we can actually implement, because we don’t know how to give any particular values to an AGI.]
The Shannon formula doesn’t define what information is; it quantifies the amount of information. People occasionally point this out as being kind of philosophically funny: we know how to measure the amount of information, but we don’t really have a good definition of what information is. Talking about what information is immediately runs into the question of what the information is about, how the information relates to the thing(s) it’s about, etc.
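For reference, the formula in question; it is a function of the probability distribution alone and carries no notion of what X refers to:

```latex
% Shannon entropy: quantifies the amount of information in a source,
% purely as a function of the distribution p, with no notion of "aboutness".
H(X) \;=\; -\sum_{x} p(x)\,\log_2 p(x)
```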
Those are basically similar to the problems one runs into when talking about e.g. an AI’s objective and whether it’s “aligned with” something in the physical world. Like, this mathematical function (the objective) is supposed to talk about something out in the world, presumably it should relate to those things in the world somehow, etc. I claim it’s basically the same problem: how do we get symbolic information/functions/math-things to reliably “point to” particular things in the world?
(This is what Yudkowsky, IIUC, would call the “pointer problem”.)
Framed as a bits-of-information problem, the difficulty is not so much getting enough bits as getting bits which are actually “about” “human values”. (Presumably that’s why my explanations seem so confusing.)
If natural abstractions are a thing, in what sense is “make this AGI have particular effect X” trying to be about human values, if X is expressed using natural abstractions?
>If natural abstractions are a thing, in what sense is “make this AGI have particular effect X” trying to be about human values, if X is expressed using natural abstractions?
In that case, it’s not about human values, which is one of the very nice things the natural abstraction hypothesis buys us.