I don’t think it’s related to mild optimization. Pick a target T that can be exceeded (a wonderful future, even if it’s not the absolute theoretically best possible future). Estimate which choice C_max is (as far as we can tell) the very best by that metric. We expect C_max to give value E, and the realized value V turns out to be less than E, but V is still likely to exceed T, or at least more likely to than any other choice. (Insofar as that’s not true, it’s Goodhart.) The optimizer’s curse, i.e. V < E, does not seem to be a problem, or even relevant, because I don’t ultimately care about E. Maybe the AI doesn’t even tell me what E is. Maybe the AI doesn’t even bother guessing what E is; it only calculates that C_max seems to be better than any other choice.
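To make this concrete, here’s a toy Monte Carlo sketch (the Gaussian noise model, the 50 choices, and the threshold are all illustrative assumptions, not anything from the argument itself): the option picked for having the best estimate E realizes a value V below E on average, yet it still clears T more often than its closest competitor.

```python
import numpy as np

rng = np.random.default_rng(0)

N_TRIALS = 20_000     # illustrative numbers only
N_CHOICES = 50
NOISE_SD = 1.0        # assumed Gaussian estimation error
TARGET_T = 1.5        # the "good enough" threshold T

shortfall = []            # E - V for the chosen option C_max
pick_clears_T = 0         # chosen option's true value exceeds T
runner_up_clears_T = 0    # strongest single alternative's true value exceeds T

for _ in range(N_TRIALS):
    true_values = rng.normal(0.0, 1.0, N_CHOICES)                   # V for each choice
    estimates = true_values + rng.normal(0.0, NOISE_SD, N_CHOICES)  # E = V + noise

    order = np.argsort(estimates)[::-1]       # rank choices by estimate, best first
    pick, runner_up = order[0], order[1]      # C_max and its closest competitor

    shortfall.append(estimates[pick] - true_values[pick])
    pick_clears_T += true_values[pick] > TARGET_T
    runner_up_clears_T += true_values[runner_up] > TARGET_T

print(f"mean(E - V) for C_max:  {np.mean(shortfall):+.3f}  (positive: optimizer's curse)")
print(f"P(V of C_max > T):      {pick_clears_T / N_TRIALS:.3f}")
print(f"P(V of runner-up > T):  {runner_up_clears_T / N_TRIALS:.3f}  (lower: C_max is still the best bet)")
```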
Hmm, maybe you are misunderstanding how the optimizer’s curse works? It’s powered by selecting on a measure with error: picking the action whose measured value is highest preferentially picks actions whose measurement error happens to be positive, so the measure of the selected action is on average higher, rather than lower, than its true value. You are mistaken, then, not to care about E, because E is the only reliable and comparable way you have to check whether C satisfies T (if there’s another way that’s reliable and comparable, then use it instead). It’s literally the only option you have for picking the C_max that seems best, assuming you picked the “best” E (another chance for Goodhart’s curse to bite you), unless you want very heavy quantilization such that, say, you only act when things appear orders of magnitude better, with error bounds small enough that you will only be wrong once in trillions of years.
I do think I understand that. I see E as a means to an end. It’s a way to rank-order choices and thus make good choices. If I apply an increasing affine transformation to E (e.g. I’m way too optimistic about absolutely everything in a completely uniform way), then I still make the same choice, and the choice is what matters. I just want my AGI to do the right thing.
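A tiny sanity check of that invariance (the numbers are arbitrary; a > 0 and b stand in for uniform over-optimism or rescaling):

```python
import numpy as np

E = np.array([0.3, 1.7, 0.9, 1.2])   # illustrative estimated values for four choices
a, b = 2.5, 10.0                      # any a > 0 and any b

# An increasing affine transformation never changes which choice ranks highest.
assert np.argmax(E) == np.argmax(a * E + b)
print(np.argmax(E), np.argmax(a * E + b))   # both print 1
```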
Here, I’ll try to put what I’m thinking more starkly. Let’s say I somehow design a comparative AGI. This is a system which can take a merit function U and two choices C_A and C_B, and predict which of the two would be better according to U, but it has no idea how good either choice actually is on any absolute scale. It doesn’t know whether C_A is wonderful while C_B is even better, or whether C_A is awful while C_B is merely so-so; both of those cases just return the same answer, “C_B is better”. Assume it’s not omniscient, so its comparisons are not always correct, but that it’s still impressively superintelligent.
A comparative AGI does not suffer the optimizer’s curse, right? It never forms any beliefs about how good its choices will turn out, so it couldn’t possibly be systematically disappointed. There’s always noise and uncertainty, so there will be times when its second-highest-ranked choice would actually turn out better than its highest-ranked choice. But that happens less than half the time. There’s no systematic problem: in expectation, the best thing to do (as measured by U) is always to take its top-ranked choice.
Now, it seems to me that, if I go to the AGIs-R-Us store and I see a normal AGI and a comparative AGI side-by-side on the shelf, I would have no strong opinion about which one of them I should buy. If I ask either one to do something, they’ll take the same sequence of actions in the same order, and get the same result. They’ll invest my money in the same stocks, offer me the same advice, etc. etc. In particular, I would worry about Goodhart’s law (i.e. giving my AGI the wrong function U) with either of these AGIs to the exact same extent and for the exact same reason... even though one is subject to the optimizer’s curse and the other isn’t.
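Here’s a minimal toy sketch of that equivalence, under the illustrative assumption that both AGIs work from the same imperfect information (a shared noisy score per choice): the comparative one picks the same option, while never reporting a number it could be systematically wrong about.

```python
import numpy as np

rng = np.random.default_rng(1)

N_CHOICES = 50
true_values = rng.normal(0.0, 1.0, N_CHOICES)                  # how good each choice really is
noisy_scores = true_values + rng.normal(0.0, 1.0, N_CHOICES)   # shared imperfect information

# "Normal" AGI: reports an estimate for every choice and takes the argmax.
estimator_pick = int(np.argmax(noisy_scores))
reported_estimate = noisy_scores[estimator_pick]

# "Comparative" AGI: only answers "is A better than B?" from the same imperfect
# information, and never reports any absolute value.
def prefers(a: int, b: int) -> bool:
    return noisy_scores[a] > noisy_scores[b]

comparative_pick = 0
for candidate in range(1, N_CHOICES):
    if prefers(candidate, comparative_pick):
        comparative_pick = candidate

print("same choice?", estimator_pick == comparative_pick)       # True: same actions
print("estimator disappointment E - V:",
      round(float(reported_estimate - true_values[estimator_pick]), 3))  # typically > 0
print("comparative AGI reported estimate: (none)")               # nothing to be disappointed by
```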
Right, if you don’t have a measure you can’t have Goodhart’s curse on technical grounds, but I’m also pretty sure something like it is still there; it’s just that, as far as I know, no one has tried to show that something like the optimizer’s curse continues to function when you only have an ordering and not a measure. I think it does, and I think others think it does, and this is part of the generalization to Goodharting, but I don’t know that a formal proof of it has been produced, even though I strongly suspect it’s true.