The Goodhart boundary encloses the situations where the person or agent making the decisions has an accurate proxy utility function and an accurate proxy probability distribution (so that the tractable judgements available in practice are close to the normative, actually-correct ones). Goodhart's Curse is the catastrophe where an expected utility maximizer operating under a proxy probutility (probability+utility) would by default venture outside the Goodhart boundary (into the crash space), set the things that the proxy utility overvalues or doesn't care about to extreme values, and thus ruin the outcome from the point of view of the intractable normative utility.
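A toy numeric illustration of why maximizing a proxy lands where the proxy is most wrong: even when the proxy is unbiased for every individual option, the option it ranks highest is, on average, one whose value it overestimates (this is the "optimizer's curse"; a simplified stand-in, not the full Goodhart's Curse). The Gaussian utilities, additive Gaussian noise, and the function name `goodhart_gap` are all assumptions made for illustration.

```python
import random
import statistics


def goodhart_gap(n_options: int = 100, trials: int = 1000,
                 noise_sd: float = 1.0, seed: int = 0) -> float:
    """Average amount by which the proxy overestimates the true utility
    of the option a proxy-maximizer selects (hypothetical toy model)."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        # Assumed model: true utilities ~ N(0, 1), proxy = true + N(0, noise_sd).
        true_u = [rng.gauss(0.0, 1.0) for _ in range(n_options)]
        proxy_u = [u + rng.gauss(0.0, noise_sd) for u in true_u]
        # The maximizer only sees the proxy, so it picks the proxy argmax.
        best = max(range(n_options), key=lambda i: proxy_u[i])
        gaps.append(proxy_u[best] - true_u[best])
    return statistics.mean(gaps)


if __name__ == "__main__":
    print(f"mean overestimate at the argmax: {goodhart_gap():.2f}")
```

With a single option there is nothing to select and the average gap is near zero; the gap appears only through optimization pressure, and it grows with the number of options searched over.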
Pascal's Mugging seems like a case of venturing outside the Goodhart boundary in the low-proxy-probability direction rather than in the high-proxy-utility direction. But it illustrates the same point: if all you have are proxy utility and probability, not the actual ones, then pursuing any kind of expected utility maximization always amounts to catastrophic misalignment.
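To make the low-probability direction concrete, here is a sketch of a naive expected-utility decision about paying a mugger. All numbers (a cost of 5, a claimed payoff of 10^30) and the function names are hypothetical. Both candidate probabilities are "negligible" by any everyday standard, yet the decision flips between them, so the choice hinges on proxy-probability precision far beyond what the proxy can be trusted to deliver.

```python
from fractions import Fraction


def expected_gain_of_paying(p_claim_true: Fraction, claimed_payoff: int,
                            cost: int) -> Fraction:
    """Proxy expected utility of paying: gain claimed_payoff with
    probability p_claim_true, always lose cost (hypothetical model)."""
    return p_claim_true * claimed_payoff - cost


def should_pay(p_claim_true: Fraction, claimed_payoff: int, cost: int) -> bool:
    # A naive expected-utility maximizer pays whenever the expected gain is positive.
    return expected_gain_of_paying(p_claim_true, claimed_payoff, cost) > 0


if __name__ == "__main__":
    payoff, cost = 10**30, 5  # illustrative numbers only
    for p in (Fraction(1, 10**20), Fraction(1, 10**40)):
        print(f"p = {float(p):.0e}: pay? {should_pay(p, payoff, cost)}")
```

Exact `Fraction` arithmetic is used so the flip is not an artifact of floating-point rounding.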
One must instead optimize mildly and work on extending the Goodhart boundary, improving the robustness of the utility and prior proxies to unusual situations (rescuing them under an ontological shift) in a way that keeps them close to their normative, intended content. In the case of Pascal's Mugging, that means better prediction of low-probability events (in Bostrom's framing, where utility values don't get too ridiculous), or better understanding of high-utility events (in Yudkowsky's framing, with 3^^^^3 lives at stake), and avoiding situations that call for such decisions until that understanding is available.
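The comment does not commit to a particular mechanism for optimizing mildly. One formalization from the alignment literature is the quantilizer (proposed by Jessica Taylor): instead of taking the proxy argmax, sample from the top q fraction of a trusted base distribution over actions. The toy below assumes a hypothetical world where the proxy is accurate for x <= 1 (inside the Goodhart boundary) but true utility collapses beyond it, and where crash-space actions are rare under a uniform base distribution. It is a sketch under those assumptions, not a general solution. The quantilizer still sometimes lands in the crash space, but it puts at most 1/q times the base distribution's probability on any action, whereas the argmax goes there every time.

```python
import math
import random


def true_utility(x: float) -> float:
    # Hypothetical model: accurate inside the boundary, ruinous outside it.
    return x if x <= 1.0 else -x


def proxy_utility(x: float) -> float:
    # The proxy keeps rewarding larger x everywhere, including the crash space.
    return x


def top_fraction(actions, q):
    """Top q fraction of a uniform base distribution over actions, ranked by proxy."""
    k = max(1, math.ceil(q * len(actions)))
    return sorted(actions, key=proxy_utility, reverse=True)[:k]


def quantilize(actions, q, rng):
    """Sample uniformly from the top q fraction (a quantilizer with a uniform base)."""
    return rng.choice(top_fraction(actions, q))


def expected_true_utility_of_quantilizer(actions, q):
    top = top_fraction(actions, q)
    return sum(true_utility(a) for a in top) / len(top)


if __name__ == "__main__":
    # 100 ordinary actions inside the boundary plus 3 rare crash-space actions.
    actions = [i / 100 for i in range(100)] + [2.0, 3.0, 4.0]
    argmax = max(actions, key=proxy_utility)
    print("argmax true utility:", true_utility(argmax))
    print("quantilizer (q=0.2) expected true utility:",
          round(expected_true_utility_of_quantilizer(actions, 0.2), 3))
    print("one quantilizer sample:", quantilize(actions, 0.2, random.Random(0)))
```

Setting q close to 1 recovers the base distribution, and shrinking q toward a single action recovers the argmax, so q acts as a dial on optimization pressure.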
Incidentally, the bureaucracies of HCH can perhaps be thought of as a step in that direction: individual bureaucracies capture novel concepts needed to cope with unusual situations, HCH's "humans" keep the whole thing grounded (within the original Goodhart boundary that humans are robust to), and the episode structure arranges such bureaucracies/concepts like words in a sentence.
Thank you for this high-quality comment and all the pointers in it. I think these two framings are isomorphic, yes? You have nicely compressed it all into one paragraph.