Ok, so you optimize the circuit both for speed and for small loss on human answers/comparisons, hoping that it generalizes to more questions while not being complex enough to be deceptive. Is that what you mean?
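To make that objective concrete, here is a minimal sketch (assuming PyTorch; the names `circuit`, `lam`, and the L1 proxy for speed are all illustrative, not from this discussion) of optimizing jointly for fit to human answers and for a speed/size penalty:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def total_loss(circuit: nn.Module,
               questions: torch.Tensor,
               human_answers: torch.Tensor,
               lam: float = 1e-4) -> torch.Tensor:
    # Fit term: small loss on human answers/comparisons.
    fit_loss = F.mse_loss(circuit(questions), human_answers)
    # Speed term: a crude differentiable proxy for circuit size/runtime
    # (L1 norm of parameters); a real speed prior would be more involved.
    size_penalty = sum(p.abs().sum() for p in circuit.parameters())
    return fit_loss + lam * size_penalty
```

The hope, as stated above, is that the speed term keeps the fitted circuit too simple to implement a deceptive policy while the fit term makes it track human answers.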
I’m mostly worried about parameter sharing between the human models in the environment and the QA procedure (which leads the QA procedure to generalize like a human rather than generalizing correctly). You could call that deception, but I think it’s a somewhat simpler phenomenon.
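A hypothetical illustration of that worry (again assuming PyTorch; every module name here is invented): a shared trunk serves both the human models in the environment and the QA head, so gradients from imitating humans shape the very features the QA procedure answers from.

```python
import torch.nn as nn

class SharedWorldModel(nn.Module):
    def __init__(self, d: int = 128):
        super().__init__()
        # Parameters shared between the two tasks.
        self.trunk = nn.Sequential(nn.Linear(d, d), nn.ReLU())
        self.human_head = nn.Linear(d, d)  # models humans in the environment
        self.qa_head = nn.Linear(d, d)     # answers questions

    def predict_human(self, x):
        return self.human_head(self.trunk(x))

    def answer(self, q):
        # Because `trunk` is shared, answers are computed from features
        # trained to imitate humans, not only to track the truth, so the
        # QA procedure can inherit human-like generalization.
        return self.qa_head(self.trunk(q))
```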