Here are a few more questions about the same strategy:
If I understand correctly, the imitative generalization (IG) strategy is to learn a joint model over observations and actions, pθ(v,a;Z), where v, a, and Z are the video, the actions, and the proposed change to the Bayes net, respectively. Then we do inference using pθ(v,a;Z∗), where Z∗ is optimized for predictive usefulness.
This fails because there’s no easy way to get P(diamond is in the vault) from pθ.
A simple way around this would be to learn pθ(v,a,y;Z) instead, where y=1 if the diamond is in the vault and 0 otherwise.
Is my understanding correct?
If so, I would guess that my simple workaround doesn’t count as a strategy, because it can only predict whether the diamond is in the vault (or answer some other set of questions that must be fixed at training time), rather than any question we want an answer to. Is this correct? Is there some other reason this wouldn’t count, or does it in fact count?
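For concreteness, here is a toy sketch of what I mean by adding a y head. Everything here is illustrative: Z is reduced to a linear re-weighting of features (standing in for a Bayes-net edit), θ to logistic-regression weights, and v, a, y to small synthetic arrays. The point is only that once y is part of the joint model, P(diamond is in the vault) can be read off directly as pθ(y=1 | v, a; Z).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: v = "video" features, a = "action" features,
# y = 1 iff the diamond is in the vault. All purely illustrative.
n = 500
v = rng.normal(size=(n, 3))
a = rng.normal(size=(n, 2))
y = (v[:, 0] + a[:, 0] > 0).astype(float)  # hidden ground truth

def features(v, a, Z):
    """Z plays the role of the proposed hypothesis: here, just an
    elementwise re-weighting of the raw (v, a) features."""
    x = np.concatenate([v, a], axis=1)
    return x * Z

def fit_theta(v, a, y, Z, steps=2000, lr=0.1):
    """Fit theta to model p_theta(y | v, a; Z) by logistic regression --
    the extra 'y head' added to the joint model."""
    x = features(v, a, Z)
    theta = np.zeros(x.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ theta)))
        theta += lr * x.T @ (y - p) / len(y)  # log-likelihood ascent
    return theta

def p_diamond_in_vault(v, a, Z, theta):
    """The quantity the unmodified strategy can't easily produce."""
    x = features(v, a, Z)
    return 1.0 / (1.0 + np.exp(-(x @ theta)))

Z = np.ones(5)  # trivial hypothesis: keep features as-is
theta = fit_theta(v, a, y, Z)
probs = p_diamond_in_vault(v, a, Z, theta)
acc = ((probs > 0.5) == (y > 0.5)).mean()
print(f"train accuracy: {acc:.2f}")
```

The rigidity is visible in the sketch: the y head is trained for one fixed question, so answering a different question would require a different labeled head at training time.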