Hmm, you quoted your own comment on the instability of a wrapper's actions under a small change to the “value function”:
A powerful optimizer, with no checks or moderating influences on it, will tend to make extreme Goodharted choices that look good according to its exact value function, and very bad (because extreme) according to almost any other value function.
Does this mean that if we require a stable attractor to exist in the map value → actions, we end up with a tamer version of a wrapper, or even with a non-wrapper structure?