Hi, everyone.
I’m currently finishing a first degree in CS, and I’ve been reading LW for a few months now (since June). I’ve read through most of the Sequences, and I check the front page of the site for anything that looks interesting whenever I want to put off doing something, which is usually several times a day. I also need to get round to finishing Gödel, Escher, Bach some time (I’m kinda slow).
I am, at the moment, a terrible rationalist—my goals aren’t even clearly defined, let alone acted on, and I have a strong background in tournament debating, which allows me to argue myself into believing whatever I feel like believing at any given moment. I think I’m getting better at that, but of course my own opinion is almost worthless as evidence on the subject.
On the other hand, reading this site (especially Yudkowsky’s stuff) at least made me stop being religious. I like to think I’d have got there in the end anyway, but seeing as I really didn’t enjoy it, I thank everyone here for pulling me out sooner rather than later.
Quick question: Does anyone know of a formal from-first-principles justification for Occam’s Razor (assigning prior probabilities in inverse proportion to the length of the model in universal description language)? Because I can’t find one, and frankly, if you can’t prove something, it’s probably not true. I’d rather not base my entire thought process on things that probably aren’t true.
Hoping to be able to contribute, Ezekiel
PS Good grief, there’s an average of one introducing-yourself post every couple of days! Why the heck are all the front-page articles written by the same handful of people?
Maybe Kevin T. Kelly’s work will fit your bill? Also see the discussion on LW.
http://wiki.lesswrong.com/wiki/Occam’s_razor Not sure if that’s in-depth enough, but I think it does a pretty good job. Edit: the apostrophe seems to break the link, but the URL is right.
Thanks, but that proof doesn’t work for the formulation of Occam’s Razor that I was talking about.
For example, if I have a boolean-output function, there are three “simplest possible” (2-bit-long) hypotheses as to what it is, before I see the evidence: [return 0], [return 1], and [return randomBit()]. But a “more complex” (longer than 2 bits) hypothesis, like [on call #i to function, return i mod 2], can’t be represented as being equivalent to [[one of the previous hypotheses] AND [something else]], so the conjunction rule doesn’t apply.
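To make the example concrete, here is a minimal sketch that treats each bracketed hypothesis as its own likelihood model over the observed outputs, rather than as a conjunction built on top of a simpler one. The `2 ** -length` weighting, the 4-bit length given to the alternating hypothesis, and numbering calls from 0 are all assumptions made for illustration; none of them come from the linked proof.

```python
from fractions import Fraction


def likelihood_const(value):
    """Likelihood of the observed bits under [return value]."""
    return lambda bits: Fraction(int(all(b == value for b in bits)))


def likelihood_random(bits):
    """Likelihood under [return randomBit()]: each bit is a fair coin flip."""
    return Fraction(1, 2 ** len(bits))


def likelihood_alternating(bits):
    """Likelihood under [on call #i, return i mod 2], calls numbered from 0."""
    return Fraction(int(all(b == i % 2 for i, b in enumerate(bits))))


# (name, assumed description length in bits, likelihood function)
HYPOTHESES = [
    ("return 0", 2, likelihood_const(0)),
    ("return 1", 2, likelihood_const(1)),
    ("return randomBit()", 2, likelihood_random),
    ("return i mod 2", 4, likelihood_alternating),
]


def posterior(bits):
    """Posterior over the hypotheses, using an assumed 2**-length prior."""
    weights = {
        name: Fraction(1, 2 ** length) * like(bits)
        for name, length, like in HYPOTHESES
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


if __name__ == "__main__":
    for name, p in posterior([0, 1, 0, 1, 0, 1, 0, 1]).items():
        print(f"{name}: {p}")
```

Under these assumptions the longer hypothesis starts with a smaller prior than any of the three short ones, yet after eight alternating outputs it carries almost all of the posterior. Nowhere is it written as "a short hypothesis AND something else", which is the point being made above.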
I think the conjunction-rule proof does work for the “minimum entities” formulation, but that one’s deeply problematic because, among other things, it assigns a higher prior probability to divine explanations (of complex systems) than physics-based ones.
What if, instead of assigning prior probabilities to rules governing the universe in inverse proportion to the rules’ length, we assigned equal prior probabilities to all rules? The probability of a state of the world would then be the sum, over every universe that could produce that state, of the probability of that universe times the probability that it would produce that state (many universes would have randomized bits in their description). I think the likelihood of outputting a string of a hundred ones in a row would then be greater than that of outputting 0001010010100110100010000100100010100100110101101000000101101111110110111101001001100010001011110000.
We could then revisit our assumption that, in the rules’ world, all rules are equally likely regardless of length. After all, if there is a meta-rule world behind the rule world, each rule would not be equally likely as an output of the meta-rules, because simpler rules are produced by more meta-rules; rules relate to meta-rules just as states of the world relate to rules above.
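A toy version of this idea can be checked by brute force. The sketch below assumes a made-up machine (the `run` function, its header format, and the sizes `OUT_LEN` and `PROG_LEN` are all hypothetical, chosen only to keep the enumeration small): every rule is a bit string of the same fixed length, all rules get equal weight, and we count how many rules produce each output.

```python
from collections import Counter
from itertools import product

OUT_LEN = 6                 # length of the observed "state of the world"
PROG_LEN = 2 * OUT_LEN + 1  # every rule is a bit string of this fixed length


def run(program):
    """Toy machine (hypothetical): m ones, a zero, then m pattern bits.

    Any remaining bits are unused padding. The output is the pattern
    repeated until OUT_LEN bits are produced. Malformed programs
    (no header, or not enough room for the pattern) return None.
    """
    m = 0
    while m < len(program) and program[m] == 1:
        m += 1
    if m == 0 or 2 * m + 1 > len(program):
        return None
    pattern = program[m + 1 : 2 * m + 1]
    return tuple(pattern[i % m] for i in range(OUT_LEN))


def output_distribution():
    """Give every program equal weight and tally what each one outputs."""
    counts = Counter()
    for program in product((0, 1), repeat=PROG_LEN):
        out = run(program)
        if out is not None:
            counts[out] += 1
    return counts


if __name__ == "__main__":
    counts = output_distribution()
    total = sum(counts.values())
    ones = (1,) * OUT_LEN
    irregular = (0, 0, 0, 1, 0, 1)
    print("all ones: ", counts[ones], "of", total)
    print("irregular:", counts[irregular], "of", total)
```

On this toy machine the all-ones output is produced by 1365 of the 4032 valid programs, while the irregular string is produced by exactly one. Regular outputs win because a short pattern leaves more padding bits free, so more equally weighted rules map to them. Whether this carries over to anything like real physics is a separate question that the sketch says nothing about.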
This would reverberate down the meta-rule chain and make simpler states of the world even more likely.
However, this might not make any sense. There would be no meta-meta-...meta-rule world to rule them all, and it would be turtles all the way down. It might not make sense to integrate over an infinity of rules in which none are given preferential weighting such that an infinite series of decreasing numbers can be constructed, nor to have effects reverberate down an infinite chain to reach a bottom state of the world.
I suspect you will never find one. To get the scientific process off the ground you have to start with the linked assumptions “the universe is lawful” and “simpler explanations are preferable to more complex ones”. Those are more like mathematical axioms than positions based on evidence.
The reason is that you can explain absolutely any observation with an unboundedly large set of theories if you are allowed to assume that the laws of the universe change or that complex explanations are kosher. The only way to squeeze the search space down to a manageable size is to check the simplest theories first.
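As a rough illustration of "check the simplest theories first", here is a sketch in which a theory is just a repeating bit pattern and simplicity is pattern length. Both of those choices, along with the `simplest_consistent` name and the `max_len` cutoff, are assumptions made for the example and not a claim about how science actually enumerates theories.

```python
from itertools import product


def simplest_consistent(observations, max_len=16):
    """Return the shortest repeating bit pattern that reproduces the data.

    Candidate "theories" are tried in order of increasing length, so the
    first one that fits every observation is also the simplest that fits.
    Returns None if nothing up to max_len bits is consistent.
    """
    for length in range(1, max_len + 1):
        for pattern in product((0, 1), repeat=length):
            if all(bit == pattern[i % length]
                   for i, bit in enumerate(observations)):
                return pattern
    return None


if __name__ == "__main__":
    print(simplest_consistent([0, 1, 0, 1, 0, 1]))
    print(simplest_consistent([1, 1, 1, 1]))
```

Without the length ordering, any finite run of observations is matched by endlessly many longer patterns that agree on the data so far and then diverge, and nothing in the data picks among them.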
Fortunately it turns out we live in a universe where this is a very fruitful strategy.
ETA: I’m relatively new here. Whoever downvoted this, could you perhaps explain your thinking?
As I understand it, that is the justification.
Upvoted for pointing out that Yudkowsky already dealt with the issue. I’d forgotten. I’m still not completely happy, but I guess sometimes you do hit rock bottom...