Here I’m thinking about this and the sequel post, trying to understand why you might be interested in this, since it doesn’t feel to me like you spell it out.
It seems we might care about model fragments if we think we can’t build complete models of other agents/things, but can instead build partial models. The “we” building these models might be literally us, but it might also be an AI or a composite agent like humanity. A theory of what to do with these model fragments is then useful for addressing at least two questions we tend to worry about around these parts: how do we decide an AI is safe based on our fragmentary models of it, and how does an AI model humanity based on its fragmentary models of humans?
I’m looking at how humans model each other via their fragmentary models, and using this to get at their values.
Thinking a bit more, it seems a big problem we may face in using model fragments is that they are, well, fragments: we will have to find a way to stitch them together so that they fill the gaps between the models, perhaps requiring something like model interpolation. Of course, maybe this isn’t necessary if we think of the fragments as mostly overlapping (although probably inconsistent in the overlaps), or if new fragments to fill the gaps are available on demand when we discover we need them and don’t have them.
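To make the stitching worry concrete, here is a minimal toy sketch in Python (the situations, fragment contents, and the averaging rule are all illustrative assumptions on my part, not a proposal): fragments are treated as partial mappings from situations to predictions, inconsistent overlaps are crudely resolved by averaging, and gaps that no fragment covers simply stay missing.

```python
# Toy sketch: "model fragments" as partial mappings from situations to predictions.
# Everything here (situations, numbers, the averaging rule) is an illustrative assumption.

from statistics import mean

fragment_a = {"situation_1": 0.9, "situation_2": 0.4}  # one partial model
fragment_b = {"situation_2": 0.6, "situation_3": 0.1}  # overlaps (and disagrees) on situation_2

def stitch(fragments):
    """Merge fragments, averaging predictions where they overlap."""
    merged = {}
    for frag in fragments:
        for situation, prediction in frag.items():
            merged.setdefault(situation, []).append(prediction)
    # Averaging is one crude way to paper over inconsistent overlaps;
    # situations covered by no fragment remain gaps and would need
    # something like interpolation or a new fragment on demand.
    return {situation: mean(preds) for situation, preds in merged.items()}

print(stitch([fragment_a, fragment_b]))
# {'situation_1': 0.9, 'situation_2': 0.5, 'situation_3': 0.1}
```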
For contradictions: https://www.lesswrong.com/posts/Y2LhX3925RodndwpC/resolving-human-values-completely-and-adequately
I suspect dealing adequately with contradictions will be significantly more complicated than you propose, but I haven’t written about that in depth yet. When I get around to addressing what I view as necessary in this area (practicing moral particularism in a way that is robust to false positives), I definitely look forward to talking with you more about it.
I agree with you to some extent. That post is mainly a placeholder that tells me that the contradictions problem is not intrinsically unsolvable, so I can put it aside while I concentrate on this problem for the moment.