Would you consider this a valid counter to the third strategy (have humans adopt the optimal Bayes net using imitative generalization), as an alternative to ontology mismatch?
Counter: In the worst case, imitative generalization / learning the human prior is not competitive. In particular, it might just be harder for a model to match the human inference $y \mid x, Z$ than to simply learn $y \mid x$. Here $Z$ is the set of instructions, as in learning the prior (I think in the context of ELK, $Z$ would be the proposed change to the human Bayes net?).
Here are a few more questions about the same strategy:
If I understand correctly, the IG strategy is to learn a joint model for observations and actions, $p_\theta(v, a; Z)$, where $v$, $a$, and $Z$ are the video, the actions, and the proposed change to the Bayes net, respectively. Then we do inference using $p_\theta(v, a; Z^*)$, where $Z^*$ is optimized for predictive usefulness. (I've tried to make this concrete in the sketch after these questions.)
This fails because there's no easy way to get $P(\text{diamond is in the vault})$ from $p_\theta$.
A simple way around this would be to learn $p_\theta(v, a, y; Z)$ instead, where $y = 1$ if the diamond is in the vault and $y = 0$ otherwise.
Is my understanding correct?
If so, I would guess that my simple workaround doesn't count as a strategy because we can only use it to predict whether the diamond is in the vault (or to answer some other set of questions that must be fixed at training time), as opposed to any question we want answered. Is this correct? Is there some other reason it wouldn't count, or does it in fact count?
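To make sure I'm asking about the right setup, here is a minimal sketch of what I have in mind by "learn $p_\theta(v, a, y; Z)$ and optimize $Z$". Everything here is my own illustration, not ARC's or the IG post's actual proposal: I collapse the joint model to $p(a \mid v; Z)\, p(y \mid v, a; Z)$, represent $Z$ as a learned vector rather than human-readable instructions or an edited Bayes net, and proxy "predictive usefulness" with training log-likelihood.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class JointModel(nn.Module):
    """Toy stand-in for p_theta(v, a, y; Z).

    v: video features (float, shape [B, v_dim])
    a: discrete actions (long, shape [B])
    y: 1.0 if the diamond is in the vault, else 0.0 (float, shape [B])
    Z: crudely represented as a single learned vector of shape [1, z_dim]
    """

    def __init__(self, v_dim: int, n_actions: int, z_dim: int):
        super().__init__()
        self.n_actions = n_actions
        # models p(a | v; Z)
        self.action_head = nn.Sequential(
            nn.Linear(v_dim + z_dim, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )
        # models P(y = 1 | v, a; Z) -- the "fixed question" workaround
        self.y_head = nn.Sequential(
            nn.Linear(v_dim + n_actions + z_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def log_prob(self, v, a, y, z):
        z = z.expand(v.shape[0], -1)
        # log p(a | v; Z)
        action_logits = self.action_head(torch.cat([v, z], dim=-1))
        logp = -F.cross_entropy(action_logits, a, reduction="none")
        # log p(y | v, a; Z); only the single fixed question y is covered
        a_onehot = F.one_hot(a, self.n_actions).float()
        y_logit = self.y_head(torch.cat([v, a_onehot, z], dim=-1)).squeeze(-1)
        logp = logp - F.binary_cross_entropy_with_logits(y_logit, y, reduction="none")
        return logp


def fit(model: JointModel, batches, z_dim: int, steps: int = 100):
    """Fit theta and Z jointly by maximum likelihood on (v, a, y) batches.

    The learned z plays the role of Z*; "optimized for predictive
    usefulness" is proxied here by training log-likelihood.
    """
    z = nn.Parameter(torch.zeros(1, z_dim))
    opt = torch.optim.Adam(list(model.parameters()) + [z], lr=1e-3)
    for _ in range(steps):
        for v, a, y in batches:
            loss = -model.log_prob(v, a, y, z).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return z.detach()  # Z*


# Toy usage with random data, just to show the intended shapes.
if __name__ == "__main__":
    v = torch.randn(32, 8)
    a = torch.randint(0, 4, (32,))
    y = torch.rand(32).round()
    model = JointModel(v_dim=8, n_actions=4, z_dim=16)
    z_star = fit(model, [(v, a, y)], z_dim=16, steps=10)
```

The `y_head` in this sketch is exactly the workaround above, and it also makes the worry behind my last question visible: it can only ever answer the one question it was given labels for at training time.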