Richard_Ngo comments on Gradations of Inner Alignment Obstacles

Richard_Ngo 30 Apr 2021 16:12 UTC
LW: 2 AF: 1
To me it sounds like you’re describing (some version of) agency, and so the most natural term to use would be mesa-agent.
I’m a bit confused about the relationship between “optimiser” and “agent”, but I tend to think of the latter as more compressed, and so insofar as we’re talking about policies it seems like “agent” is appropriate. Also, mesa-optimiser is taken already (under a definition which assumes that optimisation is equivalent to some kind of internal search).
- abramdemski 1 May 2021 15:31 UTC
  LW: 4 AF: 3
  > I tend to think of the latter as more compressed,
  I’m not sure what you meant by “more compressed”.
  I used to define “agent” as “both a searcher and a controller”, i.e., something which uses an internal selection/search of some kind to accomplish an external control task. This might be too restrictive, though.
  - Richard_Ngo 6 May 2021 16:00 UTC
    LW: 2 AF: 1
    > I used to define “agent” as “both a searcher and a controller”
    Oh, I really like this definition. Even if it’s too restrictive, it seems like it gets at something important.
    > I’m not sure what you meant by “more compressed”.
    Sorry, that was quite opaque. I guess what I mean is that evolution is an optimiser but isn’t an agent, and in part this has to do with how it’s a very distributed process with no clear boundary around it. By contrast, when the same problem is being solved in a single human brain, that compression makes it easier to point to the human as an agent separate from its environment.
    The rest of this comment is me thinking out loud in a somewhat incoherent way; no pressure to read/respond.
    It seems like calling something a “searcher” describes only a very simple interface: at the end of the search, there needs to be some representation of the output which it has found. But that output may be very complex.
    Whereas calling something a “controller” describes a much more complex interface between it and its environment: you need to be able to point not just to outcomes, but also to observations and actions. But each of those actions is usually fairly simple for a pure controller; if it’s complex, then you need search to find which action to take at each step.
    Now, it seems useful to sometimes call evolution a controller. For example, suppose you’re trying to wipe out a virus, but it keeps mutating. Then there’s a straightforward sense in which evolution is “steering” the world towards states where the virus still exists, in the short term. You could also say that it’s steering the world towards states where all organisms have high fitness in the long term, but organisms are so complex that it’s easier to treat them as selected outcomes, and abstract away from the many “actions” by evolution which led to this point.
    In other words, evolution searches using a process of iterative control, whereas humans control using a process of iterative search.
    (As a side note, I’m now thinking that “search” isn’t quite the right word, because there are other ways to do selection than search. For example, if I construct a mathematical proof (or a poem) by writing it one line at a time, letting my intuition guide me, then it doesn’t really seem accurate to say that I’m searching over the space of proofs/poems. Similarly, a chain of reasoning may not branch much, but still end up finding a highly specific conclusion. Yet “selection” doesn’t really seem like the right word either, because it’s at odds with normal usage, which involves choosing from a preexisting set of options; e.g. you wouldn’t say that a poet is “selecting” a poem. How about “design” as an alternative? That allows us to be agnostic about how the design occurred, whether via a control process like evolution, a process of search, or a process of reasoning.)
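
The searcher/controller distinction discussed in the thread can be made a little more concrete with a toy sketch. This is only an illustration under loose assumptions, not anything the commenters wrote: the names `search`, `thermostat`, `searching_controller`, and `run_controller` are hypothetical, and the one-dimensional integer "environment" is invented for the example. A searcher exposes a single output at the end; a controller maps observations to simple actions step by step; the last function picks each action via an internal search, loosely matching the “both a searcher and a controller” definition.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def search(candidates: Iterable[T], score: Callable[[T], float]) -> T:
    """A pure 'searcher': its whole interface is the one output it returns."""
    return max(candidates, key=score)


def thermostat(observation: int, target: int) -> int:
    """A pure 'controller': each action is simple and chosen without search."""
    if observation < target:
        return 1
    if observation > target:
        return -1
    return 0


def searching_controller(observation: int, target: int) -> int:
    """A controller that chooses each action by an internal search
    over a small, hypothetical action set."""
    actions = [-3, -2, -1, 0, 1, 2, 3]
    # Score each action by how close it would bring the state to the target.
    return search(actions, score=lambda a: -abs(observation + a - target))


def run_controller(policy: Callable[[int, int], int],
                   state: int, target: int, steps: int) -> int:
    """Toy environment loop: observe the state, act, repeat."""
    for _ in range(steps):
        state += policy(state, target)
    return state
```

In this toy setup both policies steer the state towards the target, but the searching controller gets there in fewer steps because each action is itself the result of a small search. Nothing here is meant to settle which systems count as agents; it only shows the two interfaces side by side.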