Yes, for example you can penalize the (initially Solomonoff-ish) prior probability of every hypothesis by a factor of $e^{-\beta(U_{\max}-U_{\min})}$, where $\beta>0$ is some constant, $U_{\max}$ is the maximal expected utility of this hypothesis over all policies, and $U_{\min}$ is the minimal (and you’d have to discard hypotheses for which one of those is already divergent, except maybe in cases where the difference is renormalizable somehow). This kind of thing was referred to as “leverage penalty” in a previous discussion. Personally I’m quite skeptical it’s useful, but maaaybe?
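
To make the penalty concrete, here is a minimal toy sketch in Python. It assumes a finite set of hypotheses and a finite set of policies, which is nothing like an actual Solomonoff-style prior. The function name `leverage_penalized_prior`, the dictionary layout, the example numbers, and the final renormalization step are all my own illustrative assumptions, not anything from the earlier discussion. The "renormalizable difference" exception is not modeled: any hypothesis with a non-finite expected utility is simply dropped.

```python
import math


def leverage_penalized_prior(hypotheses, beta):
    """Apply the penalty exp(-beta * (U_max - U_min)) to each hypothesis.

    hypotheses: dict mapping a name to (prior_weight, utilities), where
    utilities lists the expected utility of each policy under that
    hypothesis (hypothetical layout, chosen for this sketch).
    Returns the penalized weights, renormalized to sum to 1
    (renormalizing is an assumption of this sketch).
    """
    if beta <= 0:
        raise ValueError("beta must be positive")

    weights = {}
    for name, (prior, utilities) in hypotheses.items():
        # Discard hypotheses whose expected utility diverges for some policy.
        if not all(math.isfinite(u) for u in utilities):
            continue
        spread = max(utilities) - min(utilities)  # U_max - U_min
        weights[name] = prior * math.exp(-beta * spread)

    total = sum(weights.values())
    if total == 0:
        raise ValueError("no hypothesis survived with nonzero weight")
    return {name: w / total for name, w in weights.items()}


# Toy numbers, purely illustrative.
toy = {
    "mundane": (0.9, [0.0, 1.0]),            # policies barely matter
    "high_leverage": (0.1, [0.0, 1000.0]),   # one policy matters enormously
    "divergent": (0.05, [0.0, math.inf]),    # dropped outright
}
penalized = leverage_penalized_prior(toy, beta=0.01)
```

With these numbers the high-leverage hypothesis loses almost all of its weight, since its factor is $e^{-10}$ against $e^{-0.01}$ for the mundane one. How strongly this bites depends entirely on $\beta$ and on the units of utility, which is part of why the choice looks somewhat arbitrary.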