Here I’m thinking about this and the sequel post, trying to understand why you might be interested in this, since it doesn’t feel to me like you spell it out.
It seems we might care about model fragments if we think we can’t build complete models of other agents/things, but can instead build partial models. The “we” building these models might be literally us, but it might also be an AI or a composite agent like humanity. A theory of what to do with these model fragments is then useful for addressing at least two questions we tend to worry about around these parts: how do we decide an AI is safe based on our fragmentary models of it, and how does an AI model humanity based on its fragmentary models of humans?
I’m looking at how humans model each other via their fragmentary models, and using this to get at their values.
Thinking a bit more, it seems a big problem we may face in using model fragments is that they are, well, fragments: we will have to find a way to stitch them together so that they fill the gaps between the models, perhaps requiring something like model interpolation. Of course, maybe this isn’t necessary if we think of the fragments as mostly overlapping (although probably inconsistent in the overlaps), or if new fragments to fill the gaps are available on demand when we discover we need them and don’t have them.
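To make the stitching worry concrete, here is a minimal toy sketch in Python (the situations, fragment contents, and the averaging rule are all illustrative assumptions on my part, not a proposal): fragments are treated as partial mappings from situations to predictions, inconsistent overlaps are crudely resolved by averaging, and gaps that no fragment covers simply stay missing.

```python
# Toy sketch: "model fragments" as partial mappings from situations to predictions.
# Everything here (situations, numbers, the averaging rule) is an illustrative assumption.

from statistics import mean

fragment_a = {"situation_1": 0.9, "situation_2": 0.4}  # one partial model
fragment_b = {"situation_2": 0.6, "situation_3": 0.1}  # overlaps (and disagrees) on situation_2

def stitch(fragments):
    """Merge fragments, averaging predictions where they overlap."""
    merged = {}
    for frag in fragments:
        for situation, prediction in frag.items():
            merged.setdefault(situation, []).append(prediction)
    # Averaging is one crude way to paper over inconsistent overlaps;
    # situations covered by no fragment remain gaps and would need
    # something like interpolation or a new fragment on demand.
    return {situation: mean(preds) for situation, preds in merged.items()}

print(stitch([fragment_a, fragment_b]))
# {'situation_1': 0.9, 'situation_2': 0.5, 'situation_3': 0.1}
```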
For contradictions: https://www.lesswrong.com/posts/Y2LhX3925RodndwpC/resolving-human-values-completely-and-adequately
I suspect dealing adequately with contradictions will be significantly more complicated than you propose, but I haven’t written about that in depth yet. When I get around to addressing what I view as necessary in this area (practicing moral particularism in a way that is robust to false positives), I definitely look forward to talking with you more about it.
I agree with you to some extent. That post is mainly a placeholder that tells me that the contradictions problem is not intrinsically unsolvable, so I can put it aside while I concentrate on this problem for the moment.