Given a distribution A over policies that is ε-close to a benign policy for some ε ≪ 1, can we implement a distribution A⁺ over policies which is δ-close to a benign policy of similar capability, for some δ ≪ ε?
a “benign” policy has to be benign for all inputs. (See also security amplification, stating the analogous problem where a policy is “mostly” benign but may fail on a “small” fraction of inputs.)
These mechanisms rely on independence of results, don’t they? You can’t take 3 identical agents and combine them usefully.
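To make the independence point concrete, here is a minimal sketch in Python. It assumes a simple majority vote over three agents, which is an illustration and not necessarily the scheme the post proposes. The function names are hypothetical, and ε is treated as the probability that a single policy sampled from A is malign.

```python
def majority_failure_independent(eps: float) -> float:
    """Probability that a 3-way majority vote is malign when each of the
    three policies is sampled independently and is malign with probability eps.
    The vote fails if exactly two are malign or all three are."""
    exactly_two = 3 * eps**2 * (1 - eps)
    all_three = eps**3
    return exactly_two + all_three


def majority_failure_identical(eps: float) -> float:
    """Three copies of a single sampled policy are all benign or all malign
    together, so the vote fails exactly when that one sample is malign."""
    return eps


if __name__ == "__main__":
    eps = 0.01  # illustrative value only
    print("independent samples:", majority_failure_independent(eps))
    print("identical copies:   ", majority_failure_identical(eps))
```

Under these assumptions, independent samples bring the failure probability down to roughly 3ε² (about 0.0003 for ε = 0.01), while three identical copies stay at ε.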
Yes, when I say:
a “benign” policy has to be benign for all inputs. (See also security amplification, stating the analogous problem where a policy is “mostly” benign but may fail on a “small” fraction of inputs.)