I’m fairly sure you can get a result something like “it’s not necessary to put positive probability mass on two different functions that can’t be distinguished by observing only s bits”, so some functions can get zero probability, e.g. the XOR of all combinations of at least s+1 bits.
edit: The proof is easy. Let f1, f2 be two such indistinguishable functions that you place positive probability on, F be a random variable for the function, and F’ be F but with all probability mass for f2 replaced by f1. Then P(Y|F)=P(Y|F′). But this means H(Y|F)=H(Y|F′) and so I(Y,F)=H(Y)−H(Y|F)=H(Y)−H(Y,F′)=I(Y,F′). You don’t lose any channel capacity switching to F′.
I’m fairly sure you can get a result something like “it’s not necessary to put positive probability mass on two different functions that can’t be distinguished by observing only s bits”, so some functions can get zero probability, e.g. the XOR of all combinations of at least s+1 bits.
edit: The proof is easy. Let f1, f2 be two such indistinguishable functions that you place positive probability on, F be a random variable for the function, and F’ be F but with all probability mass for f2 replaced by f1. Then P(Y|F)=P(Y|F′). But this means H(Y|F)=H(Y|F′) and so I(Y,F)=H(Y)−H(Y|F)=H(Y)−H(Y,F′)=I(Y,F′). You don’t lose any channel capacity switching to F′.