Gordon Seidoh Worley comments on Goodhart’s Curse and Limitations on AI Alignment

Gordon Seidoh Worley 21 Aug 2019 13:05 UTC
2 points
Hmm, maybe you are misunderstanding how the optimizer’s curse works? It’s powered by selection on a measure with error: when we pick actions because their measure is high, we disproportionately pick the ones whose measure errs upward, so the measure of whatever we pick is on average higher than its true value. You are mistaken, then, to not care about E, because E is the only reliable and comparable way you have to check if C satisfies T (if there’s another one that’s reliable and comparable, then use it instead). Assuming you picked the “best” E (another chance for Goodhart’s curse to bite you), it’s literally the only option for picking the C_max that seems better, unless you want very high quantilization such that, say, you only act when things appear orders of magnitude better with error bounds small enough that you will only be wrong once in trillions of years.
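
A minimal simulation sketch of that selection effect, in case it helps. Everything specific here is an assumption for illustration: true values and measurement errors are drawn from standard normal distributions, there are 10 candidate actions, and the function name `optimizers_curse_gap` is made up. The measure E is unbiased for every individual action, yet the action chosen by maximizing E looks better than it is on average.

```python
import random

def optimizers_curse_gap(n_options=10, noise_sd=1.0, trials=20000, seed=0):
    """Average of (E - true value) for the option with the highest E."""
    rng = random.Random(seed)
    total_gap = 0.0
    for _ in range(trials):
        # Assumed toy world: true values T ~ N(0, 1).
        true_values = [rng.gauss(0.0, 1.0) for _ in range(n_options)]
        # E is an unbiased but noisy measure of each option's true value.
        estimates = [t + rng.gauss(0.0, noise_sd) for t in true_values]
        # C_max: the option that looks best according to E.
        best = max(range(n_options), key=lambda i: estimates[i])
        total_gap += estimates[best] - true_values[best]
    return total_gap / trials

gap_many = optimizers_curse_gap(n_options=10)   # selecting among 10 options
gap_single = optimizers_curse_gap(n_options=1)  # no selection at all
print(f"mean overestimate with selection:    {gap_many:.3f}")
print(f"mean overestimate without selection: {gap_single:.3f}")
```

With a single option there is nothing to select on, so the average overestimate stays near zero; with ten options it comes out clearly positive, even though no individual estimate is biased.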
- Steven Byrnes 21 Aug 2019 13:38 UTC
  1 point
  I do think I understand that. I see E as a means to an end. It’s a way to rank-order choices and thus make good choices. If I apply a positive affine transformation to E (one that rescales and shifts without flipping the order), e.g. I’m way too optimistic about absolutely everything in a completely uniform way, then I still make the same choice, and the choice is what matters. I just want my AGI to do the right thing.
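
  A tiny sketch of the invariance I mean, with made-up numbers and a hypothetical helper `pick_best`. It assumes the transformation is a*E + b with a > 0; a negative a would reverse the ranking, so it does not count as "uniform optimism".

  ```python
  def pick_best(scores):
      """Index of the highest-scoring choice (first one on ties)."""
      return max(range(len(scores)), key=lambda i: scores[i])

  # Illustrative values of E for four choices (not from any real system).
  estimates = [0.3, 1.7, -0.4, 1.2]

  # Uniformly over-optimistic version: a * E + b with a = 5 > 0, b = 100.
  optimistic = [5.0 * e + 100.0 for e in estimates]

  # Counterexample: a negative scale factor reverses the order.
  flipped = [-1.0 * e for e in estimates]

  print(pick_best(estimates))   # 1
  print(pick_best(optimistic))  # 1, the same choice
  print(pick_best(flipped))     # 2, a different choice
  ```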
  
  Here, I’ll try to put what I’m thinking more starkly. Let’s say I somehow design a comparative AGI. This is a system which can take a merit function U and two choices C_A and C_B, and predict which of the two would be better according to U, but it has no idea how good either choice actually is on any absolute scale. It doesn’t know whether C_A is wonderful while C_B is even better, or whether C_A is awful while C_B is merely so-so; both cases just return the same answer, “C_B is better”. Assume it’s not omniscient, so its comparisons are not always correct, but that it’s still impressively superintelligent.
  
  A comparative AGI does not suffer the optimizer’s curse, right? It never forms any beliefs about how good its choices will turn out, so it couldn’t possibly be systematically disappointed. There’s always noise and uncertainty, so there will be times when its second-highest-ranked choice would actually turn out better than its highest-ranked choice. But that happens less often than chance. There’s no systematic problem: in expectation, the best thing to do (as measured by U) is always to take its top-ranked choice.
  
  Now, it seems to me that, if I go to the AGIs-R-Us store, and I see a normal AGI and a comparative AGI side-by-side on the shelf, I would have no strong opinion about which one of them I should buy. If I ask either one to do something, they’ll take the same sequence of actions in the same order, and get the same result. They’ll invest my money in the same stocks, offer me the same advice, etc. etc. In particular, I would worry about Goodhart’s law (i.e. giving my AGI the wrong function U) with either of these AGIs to the exact same extent and for the exact same reason... even though one is subject to the optimizer’s curse and the other isn’t.
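
  Here is a toy sketch of that side-by-side comparison. It is an illustration under strong assumptions, not a proof: both agents are assumed to rely on the same internal noisy signal about U, the distributions are standard normals, and `normal_agent`, `comparative_agent`, and `simulate` are hypothetical names. The normal agent reports a predicted value for its pick; the comparative agent only ever answers "is a better than b?".

  ```python
  import random

  def normal_agent(estimates):
      """Returns (choice, predicted value); it forms a belief about its pick."""
      best = max(range(len(estimates)), key=lambda i: estimates[i])
      return best, estimates[best]

  def comparative_agent(n_options, is_better):
      """Finds its top-ranked choice using only pairwise comparisons."""
      best = 0
      for candidate in range(1, n_options):
          if is_better(candidate, best):
              best = candidate
      return best

  def simulate(n_options=10, trials=5000, seed=1):
      rng = random.Random(seed)
      same_choice = 0
      disappointment = 0.0
      for _ in range(trials):
          true_u = [rng.gauss(0.0, 1.0) for _ in range(n_options)]
          # Assumption: both agents share this noisy internal signal.
          signal = [u + rng.gauss(0.0, 1.0) for u in true_u]
          c_normal, predicted = normal_agent(signal)
          c_comp = comparative_agent(n_options, lambda a, b: signal[a] > signal[b])
          same_choice += (c_normal == c_comp)
          # Only the normal agent has a prediction to be disappointed by.
          disappointment += predicted - true_u[c_normal]
      return same_choice / trials, disappointment / trials

  frac_same, mean_disappointment = simulate()
  print(f"fraction of trials with identical choices: {frac_same}")
  print(f"normal agent's mean overestimate: {mean_disappointment:.3f}")
  ```

  Under these assumptions the two agents always pick the same choice, so they get the same outcomes; the only difference is that the normal agent also carries a number that is too high on average.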
  - Gordon Seidoh Worley 21 Aug 2019 17:36 UTC
    4 points
    Right, if you don’t have a measure you can’t have Goodhart’s curse on technical grounds, but I’m also pretty sure something like it is still there. It’s just that, as far as I know, no one has tried to show that something like the optimizer’s curse continues to function when you only have an ordering and not a measure. I think it does, and I think others think it does, and this is part of the generalization to Goodharting, but I don’t know that a formal proof demonstrating that has been produced, even though I strongly suspect it’s true.