Yes, decoupling seems to address a broad class of incentive problems in safety, which includes the shutdown problem and various forms of tampering / wireheading. Other examples of decoupling include causal counterfactual agents and counterfactual reward modeling.