Eliezer’s proposal was a different notation, not an actual change in the strength of Solomonoff Induction. The usual form of SI with deterministic hypotheses is already equivalent to one with probabilistic hypotheses: a single hypothesis with prior probability P that assigns uniform probability to each of 2^N different bitstrings makes the same predictions as an ensemble of 2^N deterministic hypotheses, each of which has prior probability P*2^-N and predicts one of the bitstrings with certainty; and a Bayesian update in the former case is equivalent to just discarding falsified hypotheses in the latter. More generally, given any computable probability distribution, you can with O(1) bits of overhead convert it into a program that samples from that distribution when given a uniform random string as input, and then convert that into an ensemble of deterministic programs with different hardcoded values of the random string. (The other direction of the equivalence is obvious: a computable deterministic hypothesis is just a special case of a computable probability distribution.)
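To make the bookkeeping explicit (a minimal sketch using the N and P from above, with s an observed prefix of length k ≤ N):

```latex
\begin{align*}
\text{probabilistic hypothesis, weight after observing } s:\quad
  & P \cdot 2^{-k} \\
\text{deterministic ensemble, survivors after observing } s:\quad
  & 2^{\,N-k} \text{ programs, each of prior weight } P \cdot 2^{-N} \\
\text{total surviving weight}:\quad
  & 2^{\,N-k} \cdot P \cdot 2^{-N} \;=\; P \cdot 2^{-k}
\end{align*}
```

Both sides keep weight P*2^-k, and among the survivors half predict a 0 next and half a 1, so the predictive distributions agree at every step; discarding falsified ensemble members is the same operation as the Bayesian update on the single probabilistic hypothesis.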
Yes, if you put a Solomonoff Inductor in an environment that contains a fair coin, it would come up with increasingly convoluted Turing machines, essentially ones that hardcode the coinflip outcomes observed so far. This is a problem only if you care about the value of an intermediate variable (posterior probability assigned to individual programs), rather than the variable that SI was actually designed to optimize, namely accurate predictions of sensory inputs. The same issue shows up in AIXI’s limitation to a sense-determined utility function. (Granted, a sense-determined utility function really isn’t a good formalization of my preferences, so you couldn’t build an FAI that way.)
I interpret Daniel_Burfoot’s idea as: “import java.util.*” makes subsequent mentions of List longer, since there are more symbols in scope that it has to be distinguished from.
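Under that reading, the implied cost model is something like this toy sketch (my own illustration, not Daniel_Burfoot’s actual scheme; the scope sizes are made-up numbers, and it assumes a uniform, non-adaptive code over the symbols currently in scope):

```python
import math

def mention_cost_bits(scope_size: int) -> float:
    """Bits to encode one identifier under a uniform code over the current scope."""
    return math.log2(scope_size)

# Hypothetical numbers purely for illustration:
narrow_scope = 50            # symbols visible with targeted imports of List and Map
wide_scope = 50 - 2 + 200    # swap the two targeted imports for ~200 java.util names

print(mention_cost_bits(narrow_scope))  # ~5.6 bits per mention of List
print(mention_cost_bits(wide_scope))    # ~8.0 bits per mention of List
```

Under that kind of non-adaptive code, the wildcard import really does make every later mention of List cost more bits, which is the intuition I’m attributing to him.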
But I don’t think that idea actually works. You can decompose the probability of a conjunction into a product of conditional probabilities, and you get the same number regardless of the order of the decomposition. Whatever probability (and corresponding total compressed size) you assign to a certain sequence of imports and symbols, you could just as well record the symbols first and then the imports, in which case by the time you get to the imports you already know that the program didn’t use anything from java.util other than List and Map, so even being able to represent the distinction between the two forms of import is counterproductive for compression.
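The order-independence is just the chain rule written both ways:

```latex
\Pr(\text{imports}, \text{symbols})
  \;=\; \Pr(\text{imports}) \, \Pr(\text{symbols} \mid \text{imports})
  \;=\; \Pr(\text{symbols}) \, \Pr(\text{imports} \mid \text{symbols})
```

The total compressed size is -log2 of that joint probability, so any codelength achievable by encoding the symbols conditional on the imports is also achievable by encoding the imports conditional on the symbols.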
The intuition of “there are more symbols in scope that it has to be distinguished from” goes wrong because it fails to account for updating a probability distribution over what symbol comes next based on what symbols you’ve seen. Such an update can include knowledge of which symbols come from the same package, if that’s correlated with which symbols are likely to appear in the same program.
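As a toy illustration of the kind of adaptive model I mean (my own sketch, not anything Daniel_Burfoot proposed; the co-occurrence counts are made up):

```python
import math
from collections import Counter

def codelength_bits(symbols, cooccurrence, vocab, alpha=1.0):
    """Total codelength (bits) of a symbol stream, under a model that upweights
    candidates which co-occur with the symbols already seen in this program."""
    total = 0.0
    seen = set()
    for sym in symbols:
        # Score each candidate by smoothed co-occurrence with what we've seen so far.
        scores = {c: alpha + sum(cooccurrence[(c, s)] for s in seen) for c in vocab}
        z = sum(scores.values())
        total += -math.log2(scores[sym] / z)
        seen.add(sym)
    return total

# Made-up counts: names from the same package tend to show up together.
cooccurrence = Counter({("Map", "List"): 50, ("List", "Map"): 50,
                        ("Socket", "List"): 1, ("List", "Socket"): 1})
vocab = ["List", "Map", "Socket"]

print(codelength_bits(["List", "Map", "Map"], cooccurrence, vocab))  # cheap: Map is expected after List
print(codelength_bits(["List", "Socket"], cooccurrence, vocab))      # pricier: Socket isn't
```

The coder’s beliefs about the next symbol shift as it reads the program, so “these names travel together” regularities get exploited without any explicit import declaration carrying the information, which is exactly what the scope-counting intuition leaves out.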