Another useful perspective on the conditions the advisor must satisfy is to regard the environment w.r.t. which these conditions are defined as the belief state of the advisor rather than the true environment. This is difficult to do with the current formalism, which requires MDPs, but would be possible with POMDPs, for example. Indeed, I took this perspective in an earlier essay about a different setting that allows general environments (see Corollary 1 in that essay). This would lead to a performance guarantee showing that the agent achieves optimal expected utility w.r.t. the belief state of the advisor. Obviously, this is not as good as optimal expected utility w.r.t. the true environment. However, it means that, from the perspective of the advisor, building such an agent is the best possible strategy.