mukashi comments on Questions about Solomonoff induction

mukashi 10 Jan 2024 5:23 UTC
2 points
0
I think this is pointing to what I don’t understand: how do you account for hypotheses that explain data generated randomly? How do you compare a hypothesis which is a random number generator with some parameters against a hypothesis which has some deterministic component?
Is there a way to understand this without reading the original paper (which will probably take me quite long)?
When you understood this, what was your personal process that took you from knowing about probabilities and likelihood to understanding Solomonoff induction? Did you have to read the original sources, or did you find some good explanations somewhere?
I also don’t get whether this is a calculation that you can do in a single step or a continuous process. In other words, would Solomonoff induction work only if we assume that we keep observing new data?
Sorry for the stupid questions, as you can see, I am confused.
- Charlie Steiner 10 Jan 2024 6:55 UTC
  3 points
  0
  how do you account for hypotheses that explain data generated randomly?
  “Randomness,” in this way of thinking, isn’t a property of hypotheses. There is no one hypothesis that is “the random hypothesis.” Randomness is just what happens when the outcome depends sensitively on things you don’t know.
  When you understood this, what was your personal process that took you from knowing about probabilities and likelihood to understanding Solomonoff induction? Did you have to read the original sources, or did you find some good explanations somewhere?
  I mean, there are explanations of Solomonoff induction on this site that are fine, but for actually getting a deep understanding you’ve probably gotta do stuff like read An Introduction to Kolmogorov Complexity and Its Applications by Li and Vitányi.
  - mukashi 11 Jan 2024 3:20 UTC
    4 points
    0
    I think that re-reading your answer made something click. So thanks for that.
    The observed data is not **random**, because randomness is not a property of the data itself.
    The hypotheses that we want to evaluate are not random either, because we are analysing Turing machines that generate those data deterministically.
    If the data is HTHTHT, we do not test a Python script that does:
    random.choices(["H", "T"], k=6)
    What we test instead is something more like
    ["H"] + ["T"] + ["H"] + ["T"] + ["H"] + ["T"]
    and
    ["H", "T"] * 3
    In this case, this last script will be simpler and for that reason, will receive a higher prior.
    If we apply this in a Bayesian setting, the likelihood of all these hypotheses is necessarily 1, so the posterior probability just becomes the prior (divided by some normalizing factor), which decreases exponentially with the length of the program (it is proportional to 2^(-length)). This makes sense because it is in agreement with Occam’s razor.
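A minimal sketch of that Bayesian update, under loud assumptions: the candidate "programs" are Python expressions, the number of characters in the source (times 8 bits) stands in for program length on a universal Turing machine, and the prior weight is taken to be 2^(-length). The names `candidates` and `posterior` are made up for illustration.

```python
# Toy posterior over deterministic "programs" that must reproduce the data.
# Assumption: source length in bits is a crude proxy for program length.
data = list("HTHTHT")

candidates = [
    '["H"]+["T"]+["H"]+["T"]+["H"]+["T"]',  # spells the data out
    '["H","T"]*3',                          # shorter description
    '["H"]*6',                              # does not reproduce the data
]

def posterior(candidates, data):
    """Weight each consistent program by 2^(-bits) and normalize."""
    weights = {}
    for src in candidates:
        # Likelihood is 1 if the output equals the data, 0 otherwise.
        if eval(src) == data:
            weights[src] = 2.0 ** (-8 * len(src))
    total = sum(weights.values())
    return {src: w / total for src, w in weights.items()}

post = posterior(candidates, data)
for src, p in post.items():
    print(src, p)
```

The inconsistent program drops out entirely, and almost all of the remaining mass goes to the shorter program, which is the Occam's razor behaviour described above.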
    The thing I still struggle to see is how to connect this framework with the probabilistic hypotheses that I want to test, such as the hypothesis that the data was generated by a fair coin. One possibility that I see (but I am not sure this is correct) is testing all the possible strings generated by an algorithm like this:
    import random
    i = 0
    while True:
        random.seed(i)  # each seed gives one deterministic program
        random.choices(["H", "T"], k=6)
        i += 1  # move on to the next seed
    The likelihood of strings like HHTHTH is 0, so we remove them, and then we are left only with the algorithms that are consistent with the data.
    Not totally sure of the last part.
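A rough sketch of that filtering step, in case it helps make the idea concrete. It assumes each seed of Python's `random.Random` counts as one deterministic program, and it only checks a finite range of seeds; `consistent_seeds` and `max_seed` are hypothetical names, not part of any standard formulation.

```python
import random

def consistent_seeds(data, max_seed):
    """Return the seeds whose deterministic output equals the observed data."""
    matches = []
    for seed in range(max_seed):
        rng = random.Random(seed)  # a fixed seed makes the output deterministic
        output = rng.choices(["H", "T"], k=len(data))
        if output == list(data):   # likelihood 1 if it matches, 0 otherwise
            matches.append(seed)
    return matches

data = "HTHTHT"
max_seed = 10_000
seeds = consistent_seeds(data, max_seed)
fraction = len(seeds) / max_seed
print(len(seeds), fraction)
```

If the generator behaves roughly uniformly across seeds, the surviving fraction should be in the neighbourhood of 2^(-6), which is the likelihood a fair-coin hypothesis would assign to this particular string. That may be one way to see how a probabilistic hypothesis shows up as a family of deterministic programs, though this is only an illustration.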