Solomonoff induction. I never took the time to understand it or related ideas.
It doesn’t work (as advertised, here) anyway.
There are three intertwined problems. One is the problem of realistic reference: how the candidate programs in an SI make statements about the external world -- that is, realism in the sense of scientific realism. Scientific realism contrasts with scientific instrumentalism, which Solomonoff's version of SI can obviously do.
The second is the issue of whether and in what sense an SI is calculating probabilities.
The third is the problem of how SI fulfils its basic role of predicting a series of observations.
On the face of it, Solomonoff Inductors contain computer programmes, not explanations, not hypotheses and not descriptions. (I am grouping explanations, hypotheses and beliefs as things which have a semantic interpretation, which say something about reality. In particular, physics has a semantic interpretation in a way that maths does not.)
The Yudkowskian version of Solomonoff switches from talking about programs to talking about hypotheses as if they are obviously equivalent. Is it obvious? There’s a vague and loose sense in which physical theories “are” maths, and computer programs “are” maths, and so on. But there are many difficulties in the details. Neither mathematical equations nor computer programmes contain straightforward ontological assertions like “electrons exist”. The question of how to interpret physical equations is difficult and vexed. And a Solomonoff inductor contains programmes, not typical physics equations. Whatever problems there are in interpreting maths ontologically are compounded when you have the additional stage of inferring maths from programmes.
In physics, the meanings of the symbols are taught to students, rather than being discovered in the maths. Students are taught that in f=ma, f is force, m is mass and a is acceleration. The equation itself, as pure maths, does not determine the meaning. For instance, it has the same mathematical form as P=IV, which “means” something different. Physics and maths are not the same subject, and the fact that physics has a real-world semantics is one of the differences.
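To make the point concrete, here is a toy sketch (my own illustrative code, not anything drawn from SI): the same pure-maths operation serves both equations, and nothing in the maths itself determines which physical reading is meant.

```python
# The same mathematical relation underlies both f = m*a and P = I*V.
def multiply(a, b):
    return a * b

force = multiply(2, 3)  # read as: mass 2 kg, acceleration 3 m/s^2 -> 6 N
power = multiply(2, 3)  # read as: current 2 A, voltage 3 V -> 6 W

# The maths is identical; the physical meaning is supplied from outside it.
assert force == power == 6
```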
Similarly, the instructions in a programme have semantics related to programme operations, but not to the outside world. The issue is obscured by thinking in terms of source code. Source code often has meaningful symbol names, such as MASS or GRAVITY...but that’s to make it comprehensible to other programmers. The symbol names have no effect on the function, and could be mangled into something meaningless but unique. And an SI executes machine code anyway...otherwise, you can’t meaningfully compare programme lengths. Note how the process of backtracking from machine code to meaningful source code is a difficult one. Programmers use meaningful symbols because you can’t easily figure out what real-world problem a piece of machine code is solving from its function alone. One number is added to another...what does that mean? What do the quantities represent?
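A small illustration, with hypothetical names of my own choosing: mangling the symbols changes nothing about what the programme computes, so the machine-level function cannot be where the real-world meaning lives.

```python
# Symbol names carry meaning for human readers, not for the machine.
def falling_speed(gravity, time):
    return gravity * time

def x1(x2, x3):  # the same function with names mangled
    return x2 * x3

# Both compute identical results; the "meaning" lived only in the names.
assert falling_speed(9.8, 2.0) == x1(9.8, 2.0)
```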
Well, maybe programmes-as-descriptions doesn’t work on the basis that individual symbols or instructions have meanings in the way that natural language words do. Maybe the programme as a whole expresses a mathematical structure as a whole. But that makes the whole situation worse, because it adds an extra step, the step of going from code to maths, to the existing problem of going from maths to ontology.
The process of reading ontological models from maths is not formal or algorithmic. It can’t be asserted that SI is the best formal epistemology we have and also that it is capable of automating scientific realism. Inasmuch as it is realistic, the step from formalism to realistic interpretation depends on human interpretation, and so is not formal. And if SI is purely formal, it is not realistic.
But code already is maths, surely? In physics, the fundamental equations are on a higher abstraction level than a calculation: they generally need to be “solved” for some set of circumstances, to obtain a more concrete equation you can calculate with. To get back to what would normally be considered a mathematical structure, you would have to reverse the original process. If you succeed in doing that, then SI is as good or bad as physics...remember that physics still needs ontological interpretation. If you don’t succeed in doing that...which you well might not, since there is no reliable algorithmic method for doing so...then SI is strictly worse than ordinary science, since it has an extra step of translation from calculation to mathematical structure, in addition to the standard step of translation from mathematical structure to ontology.
Maybe an SI could work off source code. It’s obviously the case that some kinds of source code imply an ontology. The problem now is that, to be unbiased, it would have to range over every kind of ontology-implying source code. If it just ran an OO language, then it would be biased towards an object ontology, and unable to express other ontologies. If it just ran a procedural language, it would be biased towards a process ontology.
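A sketch of the worry, with invented toy encodings: the same fact cast in an object-oriented style and in a process style, each nudging the reader towards a different ontology even though they are behaviourally interchangeable for this purpose.

```python
# Object ontology: the world as things bearing properties.
class Electron:
    charge = -1

# Process ontology: the world as a sequence of events.
def emit_electron(history):
    return history + [("emission", -1)]

# Same fact, two ontological framings.
assert Electron().charge == emit_electron([])[0][1]
```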
It is also unclear whether the criterion that SI uses to sort programmes has anything to do with probability, truth or correspondence.
Why use Occam’s razor at all?
If we were only interested in empirical adequacy, the ability to make accurate predictions, then simplicity only buys the ability to make predictions with fewer calculations. But SI, according to Yudkowsky, though not Solomonoff, doesn’t just make predictions; it tells you true facts about the world.
If you are using a simplicity criterion to decide between theories that are already known to be predictive, as in Solomonoff induction, then simplicity doesn’t buy you any extra predictiveness, so the extra factor it buys you is presumably truth.
There are multiple simplicity criteria, but not multiple truths. So you need the right simplicity criterion.
If you have a conceptually valid simplicity criterion, and you formalise it, then that’s as good as it gets: you’ve ticked all the boxes.
If you formalise a simplicity criterion that has no known relationship to truth, then you haven’t achieved anything. So it is not enough to say that Solomonoff is “the” formal standard of simplicity. There are any number of ways of conceptualising simplicity, and you need the right one.
Consider this exchange, from “A semi-technical introduction to Solomonoff Induction”:
“ASHLEY: Uh, but you didn’t actually use the notion of computational simplicity to get that conclusion; you just required that the supply of probability mass is finite and the supply of potential complications is infinite. Any way of counting discrete complications would imply that conclusion, even if it went by surface wheels and gears.
BLAINE: Well, maybe. But it so happens that Yudkowsky did invent or reinvent that argument after pondering Solomonoff induction, and if it predates him (or Solomonoff) then Yudkowsky doesn’t know the source. Concrete inspiration for simplified arguments is also a credit to a theory, especially if the simplified argument didn’t exist before that.
ASHLEY: Fair enough.”
I think Ashley deserves an answer to the objection “[a]ny way of counting discrete complications would imply that conclusion, even if it went by surface wheels and gears”, not a claim about who invented what first!
Or you could write a theory in English, and count the number of letters...that’s formal. But what has it to do with truth and reality? And what, equally, does a count of machine code instructions have to do with truth or probability?
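For instance (an invented pair of sentences), a letter count is perfectly formal, yet it shifts when the same claim is merely reworded:

```python
theory_a = "Light bends near massive bodies"
theory_b = "Massive bodies bend passing light"  # the same claim, reworded

letters_a = sum(c.isalpha() for c in theory_a)  # 27
letters_b = sum(c.isalpha() for c in theory_b)  # 29

# A formal simplicity measure that changes with the wording,
# while the truth of the claim obviously does not.
assert letters_a != letters_b
```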
There is one interpretation of Occam’s razor, the epistemic interpretation, that has the required properties. If you consider a theory as a conjunction of propositions, each having a probability less than one, then, all else being equal, a higher count of propositions will be less probable. We already know that propositions are truth-apt, that they are capable of expressing something about the world, and it is reasonable to treat them probabilistically.
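The arithmetic behind this can be sketched with an illustrative per-proposition credence of 0.9 (the number is mine, chosen only for the example):

```python
# A theory as a conjunction of propositions, each assigned credence 0.9.
p = 0.9
short_theory = p ** 3   # three propositions
long_theory = p ** 10   # ten propositions

# All else being equal, more propositions means a lower joint probability.
assert long_theory < short_theory
```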
So that is the right simplicity criterion...except that it has nothing to do with SI.
SI is trying to process an infinite list of programmes in a finite time. To achieve this, as a matter of sheer practicality, it processes shorter candidate programmes first.

The Yudkowskian claim is that programme length is probability. One immediate concern is that SIs don’t use programme length because it is a probability; they use it in order to work at all.

If you have an infinite number of hypotheses to consider, then you need to neglect the longer or more complex ones in order to be able to terminate at all. But that’s only a condition of termination, not of truth.
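Why length-ordering is a termination device can be sketched as follows (a toy stand-in of my own: bitstrings in place of programmes, a simple predicate in place of "reproduces the observations so far"). Any ordering that exhausts each length before moving to the next guarantees the search halts if a match exists at all.

```python
from itertools import count, product

def first_match(predicate):
    # Enumerate candidate bitstrings shortest-first, stopping at the
    # first one that passes the test.
    for n in count(1):
        for bits in product("01", repeat=n):
            candidate = "".join(bits)
            if predicate(candidate):
                return candidate

# Toy predicate standing in for "matches the data".
assert first_match(lambda s: s.endswith("11")) == "11"
```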
In addition, there is more than one way of doing this, whereas we would expect the true, corresponding map of reality to be unique.
Programme length clearly isn’t probability in the frequentist sense, the probability of an event occurring. It is presumably a level of rational credibility that the hypothesis represented by the program models reality.
It’s uncontentious that hypotheses and beliefs have a probability of being true, of succeeding in corresponding. But what does it mean to say that one programme is more probable than another? The criterion SI uses is that it is short. A shorter programme is deemed more probable.
(Information theory relates probability and information content, but is it in the right sense?)
A shorter bitstring is more likely to be found in a random sequence, but what has that to do with constructing a true model of the universe?
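The standard weighting makes the connection explicit: a programme of length L receives prior 2^-L, which is exactly the chance that L fair coin flips spell it out, i.e. that a random bitstring starts with it. That is a fact about counting bitstrings, not about the world. A minimal sketch:

```python
def prior(programme_bits: str) -> float:
    # The 2**-L weighting over programmes of length L.
    return 2.0 ** -len(programme_bits)

assert prior("101") == 1 / 8     # shorter string, higher weight
assert prior("10110") == 1 / 32
assert prior("101") > prior("10110")
```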
Does Solomonoff induction prove that Many Worlds is simpler than Copenhagen?
In 2012, private_messaging made this argument against the claim that SI would prove MWI:
https://www.lesswrong.com/posts/6Lg8RWL9pEvoAeEvr/raising-safety-consciousness-among-agi-researchers?commentId=QtwqxiwnCSksCc566
If the program running the SWE outputs information about all worlds on a single output tape, they are going to have to be concatenated or interleaved somehow. Which means that to make use of the information, you have to identify the subset of bits relating to your world. That’s extra complexity which isn’t accounted for, because it’s being done by hand, as it were.
The objection, made by Sawin, is that a computational model of MWI is only more complex in space, which, for the purposes of SI, doesn’t count. But that misses the point: an SI isn’t just an ontological model; it has to match empirical data as well.
In fact, if you discount the complexity of the process by which one observer picks out their observations from a morass of data, MWI isn’t the preferred ontology. The easiest way of generating data that contains any given substring is a PRNG, not MWI. If you count the process by which one observer picks out their observations from a morass of data as having zero cost, you basically end up proving that “everything random” is the simplest explanation.
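The hidden cost can be put in rough numbers (a back-of-the-envelope sketch, not a formal result): to pick a k-bit observation out of a stream at position p, you need on the order of log2(p) bits just to say where to look, and that addressing cost grows with how deep into the morass your data sits.

```python
import math

def address_bits(position: int) -> float:
    # Bits needed to specify a position in the output stream.
    return math.log2(position)

assert address_bits(2 ** 20) == 20.0  # a megabit in: ~20 bits of address
assert address_bits(2 ** 40) == 40.0  # deeper in: more locating information
```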