A few other ways in which goal-directedness intersects with abstraction:
abstraction as an instrumentally convergent tool: to the extent that computation is limited but the universe is local, we’d expect abstraction to be used internally by optimizers of many different goals.
instrumental convergence to specific abstract models: the specific abstract model used should be relatively insensitive to variation in the goal.
type signature of the goal: to the extent that humans are goal-directed, our goals involve high-level objects (like cars or trees), not individual atoms.
embedded agency = abstraction + generality + goal-directedness. Roughly speaking, an embedded agent is a low-level system which abstracts into a goal-directed system, and that goal-directed system can operate across a wide range of environments requiring different behaviors.
what can be thrown out of the perfect model to get a simpler non-self-referential model (an abstraction) that is useful for a specific purpose?
Kind of tangential, but it’s actually the other way around. The low-level world is “non-self-referential”; the universe itself is just one big causal DAG. In order to get a compact representation of it (i.e. a small enough representation to fit in our heads, which are themselves inside the low-level world), we sometimes throw away information in a way which leaves a simpler “self-referential” abstract model. This is a big part of how I think about agenty things in a non-agenty underlying world.
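To make that concrete, here is a minimal Python sketch (a made-up toy; names like run_low_level and abstract are purely illustrative, not from anything above): a chain-structured low-level causal DAG, plus a coarse-graining that throws away almost all of the detail while still answering a particular class of queries.

```python
import random

# Toy "low-level world": a chain-structured causal DAG where each variable is
# computed from its parent plus noise. In the real universe this DAG is far too
# big to hold in our heads, which is what forces the coarse-graining below.
def run_low_level(n=10_000, seed=0):
    rng = random.Random(seed)
    state = []
    for i in range(n):
        parent = state[i - 1] if i > 0 else 0.0
        state.append(0.9 * parent + rng.gauss(0, 1))
    return state

# Abstraction: throw away almost all of the low-level detail, keeping only a
# small summary that still answers the queries we care about (block averages).
def abstract(state, block=1_000):
    return [sum(state[i:i + block]) / block for i in range(0, len(state), block)]

high_level = abstract(run_low_level())
print(high_level)  # 10 numbers standing in for 10,000 low-level variables
```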
Thanks for the additional ideas! I especially concur about the type signature of goals and the instrumental convergence to abstract models.
Kind of tangential, but it’s actually the other way around. The low-level world is “non-self-referential”; the universe itself is just one big causal DAG. In order to get a compact representation of it (i.e. a small enough representation to fit in our heads, which are themselves inside the low-level world), we sometimes throw away information in a way which leaves a simpler “self-referential” abstract model. This is a big part of how I think about agenty things in a non-agenty underlying world.
But there’s a difference between the low-level world and a perfect model of the low-level world embedded inside the world, isn’t there? Also, I don’t see how the compact representation is self-referential. If you mean that it can be embedded into the world, that’s not what I meant.
I’m not quite clear on what you’re asking, so I’ll say some things which sound relevant.
I’m embedded in the world, so my world model needs to contain a model of me, which means my world model needs to contain a copy of itself. That’s the sense in which my own world model is self-referential.
Practically speaking, this basically means taking the tricks from Writing Causal Models Like We Write Programs, and then writing the causal-model-version of a quine. It’s relatively straightforward; the main consequence is that the model is necessarily lazily evaluated (since I’m “too small” to expand the whole thing), and then the interesting question is which queries to the model I can actually answer (even in principle) and how fast I can answer them.
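To illustrate the lazy, quine-like structure, here is a toy sketch (just an illustration under my own simplifications, not the actual construction from that post; the node names are made up): represent the model as a table of thunks, let one thunk return the model itself, and only expand whatever a query forces.

```python
# Toy self-embedded world model: a dict of lazily-evaluated nodes (thunks).
# The node standing in for "my world model" returns the model itself, so the
# model effectively contains itself without ever being fully expanded.
def make_world_model():
    model = {}
    model["weather"] = lambda: "rain"                    # an ordinary node
    model["me.world_model"] = lambda: model              # the self-referential node
    model["me.prediction"] = lambda: model["weather"]()  # a node that consults the model
    return model

world_model = make_world_model()

# Queries force only the thunks they need; the self-reference never blows up
# unless a query explicitly keeps unrolling it.
print(world_model["me.prediction"]())                   # -> rain
print(world_model["me.world_model"]() is world_model)   # -> True
```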
In particular, based on how game theory works, there’s probably a whole class of optimization queries which can be efficiently answered in principle within this self-embedded model, but it’s unclear exactly how to set them up so that the algorithm is both correct and always halts.
My world model is necessarily “high-level” in the sense that I don’t have direct access to all the low-level physics of the real world; I expect that the real world (approximately) abstracts into my model, at least within the regimes I’ve encountered. I probably also have multiple levels of abstraction within my world model, in order to quickly answer a broad range of queries.
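As a cartoon of the multi-level point (again a hypothetical sketch with made-up names): keep precomputed summaries at several resolutions and route each query to the cheapest level that can answer it.

```python
# Three resolutions of the "same" world model.
raw_observations = list(range(1_000_000))  # stand-in for low-level experience

model_levels = {
    "coarse": {"count": len(raw_observations)},                      # tiny summary
    "mid": {"mean": sum(raw_observations) / len(raw_observations)},  # small summary
    "fine": raw_observations,                                        # full detail
}

# Route each query to the cheapest level that can answer it; only rare queries
# need to touch the detailed level.
def answer(query):
    if query == "how many observations?":
        return model_levels["coarse"]["count"]
    if query == "average observation?":
        return model_levels["mid"]["mean"]
    return model_levels["fine"][:3]  # fall back to detail for everything else

print(answer("how many observations?"))  # cheap
print(answer("average observation?"))    # cheap
print(answer("first few observations"))  # has to touch the detailed level
```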
Did that answer the question? If not, can you give an example or two to illustrate what you mean by self-reference?
Thanks a lot! I think my misunderstanding came from conflating the computational complexity issues of self-referential simulation (expanding the model costs too much, as you mention) with the purely mathematical issue of defining such a model. In the latter sense, you can definitely have a self-referential embedded model.
I’m embedded in the world, so my world model needs to contain a model of me, which means my world model needs to contain a copy of itself. That’s the sense in which my own world model is self-referential.
I’m not sure why the last “need” is true. Is it because we’re assuming my world model is good/useful? I ask because I can imagine a world model in which I’m a black box, so I don’t need to model my own world model.
In theory I could treat myself as a black box, though even then I’m going to need at least a functional self-model (i.e. a model of what outputs yield what inputs) in order to get predictions out of the model for anything in my future light cone.
But usually I do assume that we want a “complete” world model, in the sense that we’re not ignoring any parts by fiat. We can be uncertain about what my internal structure looks like, but that still leaves us open to update if e.g. we see some fMRI data. What I don’t want is to see some fMRI data and then go “well, can’t do anything with that, because this here black box is off-limits”. When that data comes in, I want to be able to update on it somehow.