The concept of reachability lets you amplify a policy, select a policy worse than the amplification, then amplify this worse policy. The problem is that a policy that is worse may actually amplify better than a policy that is better. So I don’t find this definition very useful unless we also have an algorithm for selecting the optimal worse policy to amplify. Unfortunately, that problem doesn’t seem very tractable at all.
The concept of reachability lets you amplify a policy, select a policy worse than the amplification, then amplify this worse policy. The problem is that a policy that is worse may actually amplify better than a policy that is better. So I don’t find this definition very useful unless we also have an algorithm for selecting the optimal worse policy to amplify. Unfortunately, that problem doesn’t seem very tractable at all.