My default assumption, absent a solid proof otherwise, would be that the scaling is unbounded. In other words, I don’t see any reason to assume there must be any hard cap, whether at 2x or 10x or 100x, etc.
Here are two intuitive arguments:
If we can’t observe O, we could always just guess a particular value o and then do whatever’s optimal for that value. Then with probability P[O = o], we’ll be right, and our performance is lower bounded by P[O = o]*(whatever optimization pressure we’re able to apply if we guess correctly).
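To make the guess-and-commit lower bound concrete, here is a minimal sketch under a hypothetical toy setup (not from the original text): O is a hidden binary variable, and the payoff is 1 if the chosen action matches O, 0 otherwise. An agent who observes O always matches; a blind agent can commit to a fixed guess.

```python
def expected_payoff_guessing(p_o: float, guess: int) -> float:
    """Expected payoff when we commit to `guess` without observing O.

    With probability P[O = guess] we are right and collect the full payoff
    an observer would get (1 in this toy case); otherwise we get 0. This is
    exactly the P[O = o] * (optimal performance given o) lower bound.
    """
    p_guess_correct = p_o if guess == 1 else 1 - p_o
    return p_guess_correct * 1.0

# Guessing the more likely value of O achieves at least max(p, 1-p) of the
# observer's performance -- never less than half of it, in this binary case.
p = 0.7
best_guess_payoff = max(expected_payoff_guessing(p, 0),
                        expected_payoff_guessing(p, 1))
```

The point of the toy case is just that the blind agent’s performance degrades by at most a multiplicative factor P[O = o], not to zero.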
The log-number of different policies bounds the log-number of different outcome-distributions we can achieve. And observing one additional bit doubles the log-number of different policies, since a policy can now condition on that bit, squaring the number of distinct policies.
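The counting behind that doubling can be sketched directly. Assume (for illustration; the framing is mine, not the original text’s) deterministic policies mapping each of the 2^k possible observed bit-strings to one of n_actions actions, so there are n_actions^(2^k) policies:

```python
from math import log2

def log2_num_policies(n_actions: int, k_bits: int) -> float:
    """log2 of the number of deterministic policies over k observed bits.

    There are 2**k_bits distinct observations, and a deterministic policy
    freely assigns one of n_actions actions to each, giving
    n_actions ** (2**k_bits) policies in total.
    """
    return (2 ** k_bits) * log2(n_actions)

# Each additional observed bit doubles the log-number of policies,
# i.e. squares the number of distinct policies.
for k in range(4):
    assert log2_num_policies(3, k + 1) == 2 * log2_num_policies(3, k)
```

Working in log-space is what makes the claim clean: squaring the count of policies is exactly doubling its logarithm.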