This line of reasoning is interesting and I think it deserves some empirical exploration, which could be done with modern LLMs and agents.
E.g. make a complicated process that generates a distribution of agents via RLHF on a variety of base models and a variety of RLHF datasets, and then test all those agents on some simple tests. Pick the best agents according to their mean scores on those simple tests, and then vet them with some much more thorough tests.
I think such an experiment could be done more easily than that: simply apply standard Bayesian learning to a test set of observations and a large set of hypotheses, some of which are themselves probabilistic, yielding a situation with both Knightian and statistical uncertainty, in which you would normally expect to observe Regressional Goodhart/the Look-Elsewhere Effect. Repeat this, confirm that the effect does indeed occur without the statistical adjustment, and then confirm that applying the adjustment makes it go away (at least to second order).
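For concreteness, here is a minimal sketch of the kind of simulation I have in mind (the specifics are my own illustrative assumptions, not a definitive design: Gaussian priors over hypothesis quality, Gaussian measurement noise, and posterior-mean shrinkage standing in for the statistical adjustment):

```python
# Sketch: select the best of many hypotheses from noisy scores, and compare how
# much the predicted quality of the winner overstates its true quality, with and
# without a Bayesian shrinkage adjustment. All distributions here are assumptions
# for illustration (Gaussian prior, Gaussian noise).
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_hypotheses = 2000, 200
prior_mu, prior_sigma = 0.0, 1.0   # prior over the true quality of each hypothesis
noise_sigma = 2.0                  # statistical uncertainty in each test score

naive_gap, adjusted_gap = [], []
for _ in range(n_trials):
    true_quality = rng.normal(prior_mu, prior_sigma, n_hypotheses)
    observed = true_quality + rng.normal(0.0, noise_sigma, n_hypotheses)

    # Naive selection: take the hypothesis with the best raw score and trust that score.
    best_naive = np.argmax(observed)
    naive_gap.append(observed[best_naive] - true_quality[best_naive])

    # Adjusted selection: shrink each score toward the prior mean by the usual
    # Gaussian posterior factor before selecting and reporting.
    shrink = prior_sigma**2 / (prior_sigma**2 + noise_sigma**2)
    posterior_mean = prior_mu + shrink * (observed - prior_mu)
    best_adj = np.argmax(posterior_mean)
    adjusted_gap.append(posterior_mean[best_adj] - true_quality[best_adj])

print("naive:    predicted - true =", np.mean(naive_gap))    # clearly positive: winner's curse
print("adjusted: predicted - true =", np.mean(adjusted_gap)) # close to zero
```

Selecting on raw scores inflates the apparent quality of the winner (Regressional Goodhart); reporting the shrunk posterior mean instead removes the bias on average, since the posterior mean is a conditional expectation and the selection depends only on the observed data. (With equal noise on every hypothesis the same candidate gets picked either way; the difference is in how much you expect of it afterwards.)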
However, I’m a little unclear why you feel the need to experimentally confirm a fairly well-known statistical technique: correctly compensating for the Look-Elsewhere Effect is standard procedure in the statistical analysis of experimental High-Energy Physics — which is of course a Bayesian learning process where you have both statistical uncertainty within individual hypotheses and Knightian uncertainty across alternative hypotheses, so exactly the situation in which this applies.