“each program is further weighted by its fit to all data observed so far. This gives you a weighted mixture of experts that can predict future bits.”
I don’t see an explanation anywhere of which algorithm is used to weight the experts for this measure. Does it matter? And how are the “fit” probabilities and “complexity” probabilities combined? Multiply and normalize?
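Bayes’ theorem. The “complexity” probability is the prior, the “fit” is the likelihood of the observed data, and the weight of each program is their product, normalized over all programs. So yes: multiply and normalize.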
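To make the "multiply and normalize" step concrete, here is a minimal sketch of a Bayesian mixture over a handful of toy experts. Everything in it is an assumption made for illustration: the expert names, their lengths in bits, and the choice to make them noisy (predicting with probability 0.9 rather than 1) are hypothetical. It is not the actual construction, which ranges over all programs and is not computable.

```python
from fractions import Fraction

# Hypothetical toy "programs". Each entry is (length in bits, predictor),
# where the predictor maps the bits seen so far to P(next bit = 1).
# Names, lengths, and the 0.9 noise level are invented for this sketch.
EXPERTS = {
    "always_one": (2, lambda history: Fraction(9, 10)),
    "always_zero": (2, lambda history: Fraction(1, 10)),
    "alternate": (
        3,
        lambda history: Fraction(9, 10)
        if (not history or history[-1] == 0)
        else Fraction(1, 10),
    ),
    "fair_coin": (1, lambda history: Fraction(1, 2)),
}


def posterior(data):
    """Return the normalized weight of each expert after seeing `data`."""
    weights = {}
    for name, (length, predict) in EXPERTS.items():
        # "Complexity" probability: shorter programs get a larger prior.
        prior = Fraction(1, 2 ** length)
        # "Fit" probability: likelihood of the observed bits under this expert.
        likelihood = Fraction(1)
        for i, bit in enumerate(data):
            p_one = predict(data[:i])
            likelihood *= p_one if bit == 1 else 1 - p_one
        # Bayes' theorem, unnormalized: prior times likelihood.
        weights[name] = prior * likelihood
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def predict_next(data):
    """Mixture prediction: posterior-weighted average of P(next bit = 1)."""
    post = posterior(data)
    return sum(post[name] * EXPERTS[name][1](data) for name in post)
```

For example, after observing `[1, 0, 1, 0, 1, 0]` the `alternate` expert ends up with the largest weight despite having the smallest prior, because its likelihood dominates, and `predict_next` leans toward 1 for the next bit.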