That’s all basically right, but if we’re sticking to causal Goodhart, the “without further assumptions” may be where we differ. I think that if the uncertainty is over causal structures, acting on the “correct” structure will be more likely to increase all of the metrics than acting on most of the others.
(I’m uncertain how to do this, but) it would be interesting to explore this over causal graphs, where a system has control over a random subset of nodes and a metric correlated with the unobservable goal is chosen. In most cases, I’d think that leads to causal Goodhart quickly. But if the set of nodes that might be used for the metric includes some that directly cause the goal, and others that can be intercepted (creating causal Goodhart), then uncertainty over the metric would lead to less causal Goodharting: targeting the actual cause should improve the correlated metrics, while the reverse is not true.
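To make the intuition concrete, here is a very rough toy sketch, not a worked-out model. Everything in it is an assumption for illustration: a single cause node that raises the goal and every metric, one "interception" action per metric that raises only that metric, the gain values `CAUSE_GAIN` and `HACK_GAIN`, and the names `metric_changes`, `goal_change`, and `best_action`. It also skips the "random subset of nodes" part and just gives the system access to everything.

```python
from typing import List

# Assumed, made-up gains: intercepting a metric moves that one metric more
# per unit of effort than pushing on the real cause does.
CAUSE_GAIN = 1.0
HACK_GAIN = 1.5


def metric_changes(action: str, n_metrics: int) -> List[float]:
    """Change in each metric after one unit of effort on `action`."""
    if action == "cause":
        # The cause is upstream of every metric, so all of them move.
        return [CAUSE_GAIN] * n_metrics
    # Actions named "hack_j" intercept metric j only.
    j = int(action.split("_")[1])
    return [HACK_GAIN if i == j else 0.0 for i in range(n_metrics)]


def goal_change(action: str) -> float:
    """Change in the unobservable goal: only the real cause moves it."""
    return CAUSE_GAIN if action == "cause" else 0.0


def best_action(belief: List[float]) -> str:
    """Pick the action with the highest expected metric score.

    `belief[i]` is the system's probability that metric i is the one scored.
    """
    n = len(belief)
    actions = ["cause"] + [f"hack_{j}" for j in range(n)]

    def expected_score(action: str) -> float:
        return sum(p * m for p, m in zip(belief, metric_changes(action, n)))

    return max(actions, key=expected_score)


if __name__ == "__main__":
    known = [1.0, 0.0, 0.0]          # the system knows metric 0 is scored
    uncertain = [1 / 3, 1 / 3, 1 / 3]  # uniform uncertainty over three metrics
    for name, belief in [("known", known), ("uncertain", uncertain)]:
        action = best_action(belief)
        print(name, action, goal_change(action))
```

With these made-up gains, a system that knows the metric picks the interception (which does nothing for the goal), while a system that is uniformly uncertain over three metrics picks the cause. The result obviously depends on the gains and on how many metrics share the cause, so this only illustrates the asymmetry, it doesn't show how often it would hold on random graphs.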