Another method for dealing with this case:
Prior to the algorithm, pick a “safe set” S of (input, action) pairs which, on some reflection, obviously should not be forbidden. Then punish only those actions a that have a larger set of values r for which F(a, r) is true than, say, the mean over all actions in S (this could be done by setting the evaluation of the null action a_null equal to the evaluation of a random element of S). This means that A will choose actions that are at least as safe as those in S (a choice which might be suboptimally influenced by the size of the advice set R_F, but which would not lead to paralysis). This could compromise safety if S is chosen poorly, or if there are pathological cases where some unsafe action has fewer reasons to think it unsafe than the actions in S (which seems unlikely at first glance).
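As a rough illustration, here is a minimal Python sketch of the thresholding rule, assuming F is available as a boolean predicate over (action, reason) pairs and the advice set R_F as a list of reasons; for brevity the input is folded into the action, and all names here (forbidden_count, safe_threshold, etc.) are hypothetical, not from the original text.

```python
import random

def forbidden_count(action, advice_set, F):
    """Count the reasons r in the advice set R_F for which F(action, r) holds."""
    return sum(1 for r in advice_set if F(action, r))

def safe_threshold(safe_set, advice_set, F):
    """Mean forbidden-count over the hand-picked safe set S."""
    counts = [forbidden_count(a, advice_set, F) for a in safe_set]
    return sum(counts) / len(counts)

def is_punished(action, safe_set, advice_set, F):
    """Punish only actions flagged by strictly more reasons than the S-average,
    so actions at least as safe as those in S are never forbidden."""
    return forbidden_count(action, advice_set, F) > safe_threshold(safe_set, advice_set, F)

def null_action_evaluation(safe_set, advice_set, F):
    """Variant from the text: anchor a_null's evaluation to a random element of S
    rather than to the mean over S."""
    return forbidden_count(random.choice(safe_set), advice_set, F)
```

Note that because the threshold scales with how many reasons in R_F flag the elements of S, growing the advice set shifts the cutoff along with it, which is why the scheme avoids paralysis even when R_F is large.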