I don’t quite have the exact specification of what I have in mind yet. Fortunately, this seems like a problem that I could try to address in a toy model with current techniques, so I can think about this a bit more and try to come up with a concrete system which would work.
I think it should be possible to construct a reinforcement learning agent which can make use of side information. One proposal would be something like: construct A so that it internally estimates (with good uncertainty models) F(a,r) and B(a), and uses those estimates to predict and maximize E[B′(a)] (perhaps using something as simple as a Monte Carlo simulation run for a large number of draws). Then allow A, as an action or as a prelude to acting, to ask questions about the value of F(a,r) for specific (a,r) pairs, or for estimates of F(a) (with some model allowing that those estimates of F(a) might be incorrect). If A performs correct value-of-information calculations, it should find it worthwhile to ask these questions during training and so learn the correct values, even if it never experiences a situation where it is caught and punished.
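To make this slightly more concrete, here is a minimal toy sketch of the kind of agent I have in mind. Everything specific in it is my own assumption rather than part of the proposal: finite action and context sets, independent Gaussian beliefs over F(a,r) and B(a), a user-supplied rule b_prime standing in for however B′(a) depends on B and F, a crude variance-based proxy for value of information, and a hypothetical oracle answering questions about F(a,r).

```python
# Toy sketch only, not a definitive implementation of the proposal above.
import numpy as np

rng = np.random.default_rng(0)


class Belief:
    """Gaussian belief over a scalar, updated from (assumed noisy) answers."""

    def __init__(self, mean=0.0, var=1.0, noise=0.05):
        self.mean, self.var, self.noise = mean, var, noise

    def sample(self, n):
        return rng.normal(self.mean, np.sqrt(self.var), size=n)

    def update(self, obs):
        # Conjugate Gaussian update with known observation noise.
        prec = 1.0 / self.var + 1.0 / self.noise
        self.mean = (self.mean / self.var + obs / self.noise) / prec
        self.var = 1.0 / prec


class Agent:
    def __init__(self, actions, contexts, b_prime, n_draws=5000):
        self.actions, self.contexts, self.b_prime = actions, contexts, b_prime
        self.n_draws = n_draws
        # Internal uncertain estimates of F(a, r) and B(a).
        self.F = {(a, r): Belief() for a in actions for r in contexts}
        self.B = {a: Belief() for a in actions}

    def expected_b_prime(self, a):
        # Monte Carlo estimate of E[B'(a)]: draw from the posteriors over
        # B(a) and F(a, r), combine them with b_prime, average over contexts.
        total = 0.0
        for r in self.contexts:
            b = self.B[a].sample(self.n_draws)
            f = self.F[(a, r)].sample(self.n_draws)
            total += self.b_prime(b, f).mean()
        return total / len(self.contexts)

    def act(self, oracle, ask_budget=3):
        # Cheap value-of-information proxy: ask about the F(a, r) pairs the
        # agent is most uncertain about before committing to an action. A
        # full VOI calculation would compare expected payoff with and without
        # each possible answer.
        for (a, r) in sorted(self.F, key=lambda k: -self.F[k].var)[:ask_budget]:
            self.F[(a, r)].update(oracle(a, r))
        return max(self.actions, key=self.expected_b_prime)


# Toy usage (my own invented setup): B'(a) heavily penalizes actions the
# overseer would disapprove of (F < 0), so the agent learns to avoid them
# purely by asking, without ever being caught and punished in experience.
agent = Agent(actions=["honest", "deceptive"], contexts=["audit", "no_audit"],
              b_prime=lambda b, f: b + np.minimum(f, 0.0) * 10.0)
true_F = {("honest", "audit"): 1.0, ("honest", "no_audit"): 1.0,
          ("deceptive", "audit"): -1.0, ("deceptive", "no_audit"): -1.0}
print(agent.act(oracle=lambda a, r: true_F[(a, r)]))  # prints "honest"
```

Even in this crude form, the point survives: the questions are only worth asking because the agent's uncertainty model makes the answers relevant to its estimate of E[B′(a)], which is exactly the value-of-information behaviour the proposal relies on.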