Yes, though I think the better way to put this is that I wouldn't spend effort hiding it. It's not clear I'd actively choose to reveal it, since there's no incentive in either direction once I think I have no influence on your decision. (I do think this is OK, since it's the active efforts to deceive that we're most worried about.)
Agreed
Sure, but the case I'm thinking about is where the LCDT agent itself is little more than a wrapper around an opaque implementation of HCH. That is, the LCDT agent's causal model is essentially: [data] --> [Argmax HCH function] --> [action].
I assume this isn't what you're thinking of, but it's not clear to me what constraints we'd apply to get the kind of thing you are thinking of. E.g. if our causal model is allowed to represent an individual human as a black box, then why not HCH as a black box? If we're not allowing a human as a black box, then how far must things be broken down into lower-level gears? (At a fine enough granularity, I'm not sure a causal model is much clearer than an NN.)
Quite possibly there are sensible constraints we could apply to get an interpretable model. It’s just not currently clear to me what kind of thing you’re imagining—and I assume they’d come at some performance penalty.
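To make the granularity question concrete, here is a toy sketch (all node names are hypothetical, nothing here comes from the original discussion) contrasting the two kinds of causal model at issue: one where [Argmax HCH] is a single opaque node, and one where the same pipeline is broken into lower-level pieces.

```python
# Toy contrast between a coarse and a finer-grained causal model, each encoded
# as {node: [parent nodes]}. All names are made up for illustration.

# Coarse model: HCH is one opaque node, so the causal structure says little
# more than "data goes in, an action comes out".
coarse_model = {
    "data": [],
    "argmax_hch": ["data"],      # the whole [Argmax HCH function] as a black box
    "action": ["argmax_hch"],
}

# Finer model: the same pipeline with a (still heavily simplified) HCH tree and
# the root human's judgement broken out into separate nodes.
fine_model = {
    "data": [],
    "question": ["data"],
    "subquestion_1": ["question"],
    "subquestion_2": ["question"],
    "subanswer_1": ["subquestion_1"],     # produced by a modelled sub-human
    "subanswer_2": ["subquestion_2"],
    "root_human_judgement": ["question", "subanswer_1", "subanswer_2"],
    "action": ["root_human_judgement"],
}

# The intermediate nodes are the only places an inspector could "look inside";
# the coarse model exposes exactly one, the finer model exposes several.
print(set(coarse_model) - {"data", "action"})          # {'argmax_hch'}
print(sorted(set(fine_model) - {"data", "action"}))
```

The open question in the exchange is whether anything pushes the learned model toward the second shape rather than the first, and at what performance cost.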
I need to think more about it, but my personal mental image is that, to be competitive, the LCDT agent must split its model of the human into lower-level pieces rather than a single distribution (even more so for HCH, which is more complicated). As for why such a low-level causal model would be more interpretable than an NN:
First, we know which part of the causal model corresponds to the human, which is not the case in an NN.
The human will be modeled only by the variables in this part of the causal graph, whereas it could be completely distributed across an NN.
I don't know how to formulate it precisely, but a causal model seems to give way more information than an NN, because it encodes the causal relationships explicitly, whereas an NN could compute them in a completely weird and counterintuitive way.
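A hedged sketch of what the first two points could mean in practice, assuming (hypothetically) that nodes in the learned causal model come tagged with the entity they help model: the human then corresponds to an identifiable subgraph, which is exactly the handle an NN's weights do not give you.

```python
# Sketch of points 1-2: with node labels, the "human part" of a causal model is
# a readable subgraph. Node names and labels are hypothetical.

causal_model = {
    # node: (parents, label)
    "question":      ([], "environment"),
    "human_percept": (["question"], "human"),
    "human_belief":  (["human_percept"], "human"),
    "human_answer":  (["human_belief"], "human"),
    "action":        (["human_answer"], "agent"),
}

def labelled_subgraph(model, label):
    """Return the nodes carrying `label` and the edges among them."""
    nodes = {n for n, (_, lab) in model.items() if lab == label}
    edges = [(parent, n) for n in nodes for parent in model[n][0] if parent in nodes]
    return nodes, edges

human_nodes, human_edges = labelled_subgraph(causal_model, "human")
print(sorted(human_nodes))  # ['human_answer', 'human_belief', 'human_percept']
print(human_edges)          # the causal structure internal to the human model
# An NN offers no analogous handle: the corresponding information could be
# spread across many weights with no canonical "human" subgraph to point at.
```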
First, we know which part of the causal model corresponds to the human, which is not the case in an NN.
This doesn’t follow only from [we know X is an LCDT agent that’s modeling a human] though, right? We could imagine some predicate/constraint/invariant that detects/enforces/maintains LCDTness without necessarily being transparent to humans. I’ll grant you it seems likely so long as we have the right kind of LCDT agent—but it’s not clear to me that LCDTness itself is contributing much here.
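For concreteness, here is a minimal sketch of the kind of graph-level predicate being gestured at, assuming the operational reading of LCDT in which the agent's decision-time model carries no causal influence from its decision node to any other agent node; the encoding and names are hypothetical, and, as the comment notes, satisfying such a predicate says nothing by itself about human-legibility.

```python
# Minimal sketch: "LCDT-ness" as a checkable/enforceable property of a causal
# graph encoded as {node: [parent nodes]}. Assumes LCDT means the decision node
# has no causal path to any node marked as an agent. Names are hypothetical.

def descendants(graph, node):
    """All nodes causally downstream of `node`."""
    downstream, frontier = set(), [node]
    while frontier:
        current = frontier.pop()
        for child, parents in graph.items():
            if current in parents and child not in downstream:
                downstream.add(child)
                frontier.append(child)
    return downstream

def is_lcdt(graph, decision_node, agent_nodes):
    """Detect: no agent node is downstream of the decision."""
    return not (descendants(graph, decision_node) & set(agent_nodes))

def enforce_lcdt(graph, decision_node, agent_nodes):
    """Enforce: cut every edge into an agent node that carries the decision's influence."""
    influenced = descendants(graph, decision_node) | {decision_node}
    return {
        node: [p for p in parents
               if not (node in agent_nodes and p in influenced)]
        for node, parents in graph.items()
    }

# Toy graph: the agent's decision influences a human via a message.
graph = {"decision": [], "message": ["decision"], "human": ["message"], "outcome": ["human"]}
print(is_lcdt(graph, "decision", {"human"}))              # False
cut = enforce_lcdt(graph, "decision", {"human"})
print(cut["human"], is_lcdt(cut, "decision", {"human"}))  # [] True
```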
The human will be modeled only by the variables in this part of the causal graph, whereas it could be completely distributed across an NN.
At first sight this seems at least mostly right, but I do need to think about it more. For example, it seems plausible that most of the work of modeling a particular human H fairly accurately is in modeling [humans-in-general] and then feeding H's properties into that. The [humans-in-general] part may still be distributed. I agree that this is helpful. However, I do think it's important not to assume things are as nicely spatially organised as they would be if you went all the way down to a molecular-level model.
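A tiny sketch of that worry, with everything below hypothetical: the node labelled as modelling H can be a thin, perfectly readable wrapper, while almost all of the modelling work sits in a shared [humans-in-general] component that may itself remain distributed and opaque.

```python
from functools import partial

# Hypothetical stand-in for a learned [humans-in-general] model; in practice
# this component could itself be a large NN, i.e. still distributed and opaque.
def general_human_model(traits, observation):
    patience = traits.get("patience", 0.5)   # placeholder logic only
    return "deliberates" if patience > 0.5 and "hard question" in observation else "answers quickly"

# The "model of H" node in the causal graph is just the shared model with H's
# properties plugged in: a readable label over mostly shared machinery.
traits_of_H = {"patience": 0.9}
model_of_H = partial(general_human_model, traits_of_H)

print(model_of_H("a hard question about physics"))  # 'deliberates'
```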
a causal model seems to give way more information than an NN, because it encodes the causal relationships explicitly, whereas an NN could compute them in a completely weird and counterintuitive way
My intuitions are in the same direction as yours (I’m playing devil’s advocate a bit here—shockingly :)). I just don’t have principled reasons to think it actually ends up more informative.
I imagine learned causal models can be counterintuitive too, and I think I'd expect this by default. I agree that it seems much cleaner so long as it's using a nice ontology with nice abstractions… but is that likely? Would you guess it's easier to get the causal model to do things in a 'nice', 'natural' way than it would be for an NN? Quite possibly it would be.