Edouard Harris comments on Re-Define Intent Alignment?

Edouard Harris 4 Aug 2021 17:46 UTC
Ah I see! Thanks for clarifying.
Yes, the point about the Cartesian boundary is important. And it’s completely true that any agent / environment boundary we draw will always be arbitrary. But that doesn’t mean one can’t usefully draw such a boundary in the real world — and unless one does, it’s hard to imagine how one could ever generate a working definition of something like a mesa-objective. (Because you’d always be unable to answer the legitimate question: “the mesa-objective of what?”)
Of course the right question will always be: “what is the whole universe optimizing for?” But it’s hard to answer that! So in practice, we look at bits of the whole universe that we pretend are isolated. All I’m saying is that, to the extent you can meaningfully ask the question, “what is this bit of the universe optimizing for?”, you should be able to clearly demarcate which bit you’re asking about.
(i.e. I agree with you that duality is a useful fiction, just saying that we can still use it to construct useful definitions.)
- abramdemski 4 Aug 2021 18:16 UTC
  I would further add that looking for difficulties created by the simplification seems very intellectually productive. (Solving “embedded agency problems” seems to genuinely allow you to do new things, rather than just soothing philosophical worries.) But yeah, I would agree that if we’re defining mesa-objective anyway, we’re already in the business of assuming some agent/environment boundary.
  - Edouard Harris 4 Aug 2021 18:37 UTC
    
    I would further add that looking for difficulties created by the simplification seems very intellectually productive.
    
    Yep, strongly agree. And a good first step to doing this is to actually build as robust a simplification as you can, and then see where it breaks. (Working on it.)
- jbkjr 5 Aug 2021 10:49 UTC
  
  (Because you’d always be unable to answer the legitimate question: “the mesa-objective of what?”)
  
  All I’m saying is that, to the extent you can meaningfully ask the question, “what is this bit of the universe optimizing for?”, you should be able to clearly demarcate which bit you’re asking about.
  
  I totally agree with this; I guess I’m just (very) wary about being able to “clearly demarcate” whichever bit we’re asking about, and therefore fairly pessimistic that we can “meaningfully” ask the question to begin with? Like, if you start asking yourself questions like “what am ‘I’ optimizing for?,” and then try to figure out exactly what the demarcation is between “you” and “everything else” in order to answer that question, you’re gonna have a real tough time finding anything close to a satisfactory answer.
  - Edouard Harris 5 Aug 2021 13:19 UTC
    Yeah I agree this is a legitimate concern, though it seems like it is definitely possible to make such a demarcation in toy universes (like in the example I gave above). And therefore it ought to be possible in principle to do so in our universe.
    To try to understand a bit better: does your pessimism about this come from the hardness of the technical challenge of querying a zillion-particle entity for its objective function? Or does it come from the hardness of the definitional challenge of exhaustively labeling every one of those zillion particles to make sure the demarcation is fully specified? Or is there a reason you think constructing any such demarcation is impossible even in principle? Or something else?
    - jbkjr 5 Aug 2021 14:18 UTC
      
      To try to understand a bit better: does your pessimism about this come from the hardness of the technical challenge of querying a zillion-particle entity for its objective function? Or does it come from the hardness of the definitional challenge of exhaustively labeling every one of those zillion particles to make sure the demarcation is fully specified? Or is there a reason you think constructing any such demarcation is impossible even in principle? Or something else?
      
      Probably something like the last one, although I think “even in principle” is probably doing something suspicious in that statement. Like, sure, “in principle,” you can construct pretty much any demarcation you could possibly imagine, including the Cartesian one, but what I’m trying to say is something like, “all demarcations, by their very nature, exist only in the map, not the territory.” Carving reality is an operation that can only make sense within the context of a map, as reality simply is. Your concept of “agent” is defined in terms of other representations that similarly exist only within your world-model; other humans have a similar concept of “agent” because they have a similar representation built from correspondingly similar parts. If an AI is to understand the human notion of “agency,” it will also need to understand plenty of other “things” which are likewise only abstractions or latent variables within our world models, as well as what those variables “point to” (at least, which variables in the AI’s own world model they ‘point to,’ as by now I hope you’re seeing the problem with trying to talk about “things they point to” in external/‘objective’ reality!).
      - Edouard Harris 6 Aug 2021 19:55 UTC
        I’m with you on this, and I suspect we’d agree on most questions of fact around this topic. Of course demarcation is an operation on maps and not on territories.
        But as a practical matter, the moment one starts talking about the definition of something such as a mesa-objective, one has already unfolded one’s map and started pointing to features on it. And frankly, that seems fine! Because historically, a great way to make forward progress on a conceptual question has been to work out a sequence of maps that give you successive degrees of approximation to the territory.
        I’m not suggesting actually trying to imbue an AI with such concepts: that would be dangerous (for the reasons you alluded to) even if it weren’t pointless (because prosaic systems will just learn the representations they need anyway). All I’m saying is that the moment we started playing the game of definitions, we’d already started playing the game of maps. So using an arbitrary demarcation to construct our definitions might be bad for any number of legitimate reasons, but it can’t be bad just because it caused us to start using maps: our earlier decision to talk about definitions already did that.
        (I’m not 100% sure if I’ve interpreted your objection correctly, so please let me know if I haven’t.)