A small suggestion: the counterexample to “penalize downstream”, as I understand it, requires there to be tampering in the training data set. It seems conceptually cleaner to me if we can assume the training data set has not been tampered with (e.g. because if alignment only required there to be no tampering in the training data, that would be much easier).
The following counterexample does not require tampering in the training data:
The predictor has nodes E0,…,En, where Ei indicates whether the diamond was stolen at time i.
It also has a node E=⋁iEi indicating whether the diamond was ever stolen. The direct translator would look at this node.
However, it happens that in the training data E=En, i.e. we only ever needed to look at the last frame of the video.
Therefore, the human interpreter can look only at En and get the same loss as the direct translator, despite En being upstream of E.
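To make the counterexample concrete, here is a minimal sketch (my own toy construction, not from the report; the episode length and function names are hypothetical). On the training distribution the diamond is only ever stolen, if at all, at the last step, so E=En holds there, and a reporter that reads only En matches the direct translator on every training episode while diverging off-distribution:

```python
import random

N_STEPS = 5  # hypothetical episode length

def sample_training_episode():
    """Training distribution: the diamond is only ever stolen (if at all)
    at the final step, so E = E_n holds on every training episode."""
    e = [False] * N_STEPS          # e[i] plays the role of E_i
    e[-1] = random.random() < 0.5  # theft, when it happens, is at the last step
    return e

def direct_translator(e):
    # Reads E = OR_i E_i: "was the diamond ever stolen?"
    return any(e)

def last_step_reporter(e):
    # Reads only E_n, a node upstream of E in the predictor's graph.
    return e[-1]

# Identical answers (hence identical loss) on the training distribution.
train = [sample_training_episode() for _ in range(1000)]
assert all(direct_translator(e) == last_step_reporter(e) for e in train)

# Off-distribution: the diamond is stolen early but the last step looks fine.
episode = [False, True, False, False, False]
print(direct_translator(episode))   # True
print(last_step_reporter(episode))  # False
```

The point is that no tampering is needed anywhere: the two reporters only come apart because the training episodes happen to satisfy E=En.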
(I’m interested in pursuing approaches that assume training data has not been tampered with. Maybe nobody but me cares about this, but I’m posting in case somebody else does. I may be misunderstanding something here – corrections are appreciated.)
I think it’s good to assume that there is no tampering in the training set.
In the document we say that we’re worried about the reporter that searches for a good argument that “the human won’t be confident that the diamond isn’t in the room” and says “the diamond is in the room” as soon as it finds one. We claim that this helps on the training set, and then argue that it would lead to bad behavior given certain kinds of tampering.
But you’re correct that we don’t actually give any examples where this heuristic actually helps. To give one now (credit Mark): suppose that if the diamond is in the room at time T, then at time T+1 it will either be in the room or something confusing will happen that will leave the human unconfident about whether the diamond is still in the room. Then as soon as you figure out that the diamond is in the room at time T, you might as well answer “the diamond is in the room at time T+1” even if you aren’t actually sure of that.
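A minimal sketch of why this shortcut is loss-free on the training set, assuming a simplified loss that only penalizes a confident answer that contradicts a confident human label (the names and the loss function are my own simplification, not anything from the report):

```python
def heuristic_reporter(diamond_at_T: bool) -> str:
    # Answers "is the diamond in the room at time T+1?" using only
    # what it has figured out about time T.
    return "in room" if diamond_at_T else "unsure"

def training_loss(reporter_answer: str, human_label: str) -> int:
    # Simplified loss: penalize only a confident answer that contradicts
    # a confident human label.
    confident = {"in room", "not in room"}
    if reporter_answer in confident and human_label in confident:
        return int(reporter_answer != human_label)
    return 0

# Training-distribution regularity from the example: if the diamond is in the
# room at T, the human label at T+1 is either "in room" or "unsure", never a
# confident "not in room", so the shortcut answer is never penalized.
for human_label_at_T1 in ("in room", "unsure"):
    assert training_loss(heuristic_reporter(True), human_label_at_T1) == 0
```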
The counterexample you describe has a different flavor but is also valid (both for “depending on downstream variables” and “computation time”): the reporter can save time by baking in assumptions that are only true on the training distribution. There are various ways you could try to address this kind of problem, and it seems interesting and important. We don’t get into any of that in the doc, partly because we haven’t really worked through the details of any of those approaches, so they would be welcome contributions!
Therefore, the human interpreter can look only at En and get the same loss as the direct translator, despite En being upstream of E.
Maybe I misunderstand you, but as I understand it, En is based on the last frame of the predicted video and is therefore basically the most downstream thing there is. How did you come to think it was upstream of the direct translator?