“it depends on what the distributions are, but there is another simple stat you can compute from the Mi, which combined with their average, gives you all the info you need”
Yes, assuming it’s a maximum entropy distribution (e.g. normal, Dirichlet, beta, exponential, geometric, hypergeometric, … basically all the distributions we typically use as fundamental building blocks). If it’s not a maximum entropy distribution, then the relevant information can’t be summarized by a simple statistic; we need to keep around the whole distribution P[X=x | M] for every possible value of x. In the maxent case, the summary statistics are enough to reconstruct that distribution, which is why we don’t need to keep around anything else.
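To make the "summary statistics are enough" point concrete, here is a minimal sketch for the normal case only. It is my own illustration, not something from the thread: the function names and the sample data are made up, and it shows the closely related fact that the likelihood of a data set under a normal model depends on the data only through three numbers (count, sum, sum of squares).

```python
import math
import random


def normal_loglik_full(xs, mu, sigma):
    """Log-likelihood of xs under Normal(mu, sigma), touching every data point."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
        for x in xs
    )


def summarize(xs):
    """Reduce the data to (n, sum of x, sum of x^2). Hypothetical helper."""
    return len(xs), sum(xs), sum(x * x for x in xs)


def normal_loglik_from_summary(summary, mu, sigma):
    """Same log-likelihood, computed from the three summary numbers alone."""
    n, s1, s2 = summary
    # Expand sum_i (x_i - mu)^2 = s2 - 2*mu*s1 + n*mu^2.
    squared_error = s2 - 2 * mu * s1 + n * mu**2
    return -0.5 * n * math.log(2 * math.pi * sigma**2) - squared_error / (2 * sigma**2)


# Illustrative data; the seed and parameters are arbitrary.
random.seed(0)
data = [random.gauss(1.5, 2.0) for _ in range(1000)]
summary = summarize(data)

for mu, sigma in [(0.0, 1.0), (1.5, 2.0), (3.0, 0.5)]:
    full = normal_loglik_full(data, mu, sigma)
    short = normal_loglik_from_summary(summary, mu, sigma)
    print(f"mu={mu}, sigma={sigma}: full={full:.6f}, from summary={short:.6f}")
```

The two columns should agree up to floating-point error for any (mu, sigma) you try, so once the summary is computed the raw data can be thrown away. For a distribution outside this family there is no such fixed-size summary in general, which is the case where you are stuck carrying the whole distribution around.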