You actually did do something subjective: you decided that the mean was the key statistic that should be taken into account.
I don’t think your decision to calculate a mean has anything to do with Maximum Entropy methods. One way of calculating the maximum entropy distribution is by using whatever moments of your data you have available, such as a mean. Or maybe you have only saved moments of the data in your data collection efforts. You can calculate the maxent distribution with those moments as constraints on the resulting distribution. I didn’t have call to use the method, but my recollection is that Jaynes described the method in general terms for whatever informational constraints you have.
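The moment-constrained recipe described above can be sketched concretely. Below is a minimal illustration (not anyone's reference implementation) of Jaynes's classic die example: the maximum-entropy distribution on a finite support subject to a mean constraint has the form p(x) ∝ exp(-λx), and λ can be found by simple bisection. The function name and support are my own choices for illustration.

```python
import math

def maxent_given_mean(support, target_mean, tol=1e-10):
    """Maximum-entropy distribution on a finite support with a fixed mean.

    The solution has the exponential-family form p(x) proportional to
    exp(-lam * x); we find the Lagrange multiplier lam by bisection.
    """
    def mean_for(lam):
        weights = [math.exp(-lam * x) for x in support]
        z = sum(weights)
        return sum(x * w for x, w in zip(support, weights)) / z

    # mean_for is strictly decreasing in lam, so bisection works.
    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_for(mid) > target_mean:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    weights = [math.exp(-lam * x) for x in support]
    z = sum(weights)
    return [w / z for w in weights]

# Jaynes's example: a die whose long-run mean is constrained to 4.5
# (rather than the fair value 3.5) gets probabilities tilted toward 6.
probs = maxent_given_mean([1, 2, 3, 4, 5, 6], 4.5)
```

Adding further constraints (a second moment, say) just adds another multiplier to solve for; the same structure generalizes.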
Another issue with MEP is that it does not contain any intrinsic method to prevent overfitting.
Some people have talked about calculating Maxent distributions while taking the uncertainty of the moments into account, an uncertainty which would increase for the higher-order moments. I’m not sure whether that would prevent the kind of thing you’re worried about. In the usual case, calculating the maxent distribution from moments, I don’t know that you get any different result than by just using more moments as if they were exact. Has anyone compared the two?
On the other hand, I’m not sure what overfitting means when you are assigning a probability distribution to represent your state of knowledge. To what do you think you’ve overfit?