What’s the length of the hypothesis that F=ma?
How many bits does it take to say “mass times the second derivative of position”? An exact answer would depend on the coding system. But if mass, position, time, multiplication, and differentiation are already defined, then not many. You’re defining force as ProductOf(mass, d[d(position)/d(time)]/d(time)).
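For concreteness, here is one way such a count could go; a toy sketch, where the eight-symbol vocabulary and the fixed-width code are arbitrary assumptions of mine, not any canonical encoding:

```python
import math

# Toy sketch: count the bits needed to express F = m * d(d(x)/dt)/dt, assuming
# mass, position, time, multiplication, and differentiation are already-defined
# primitives. The vocabulary and fixed-width code are arbitrary choices of mine.
VOCAB = ["mass", "position", "time", "mul", "diff", "(", ")", ","]
BITS_PER_SYMBOL = math.ceil(math.log2(len(VOCAB)))  # 3 bits for 8 symbols

# force := mul(mass, diff(diff(position, time), time))
expression = ["mul", "(", "mass", ",",
              "diff", "(", "diff", "(", "position", ",", "time", ")",
              ",", "time", ")", ")"]
assert all(token in VOCAB for token in expression)

print(len(expression) * BITS_PER_SYMBOL, "bits")  # 48 bits under this coding
```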
See the discussion here on the bit complexity of physical laws. One thing missing from that discussion is how to code the rules for interpreting empirical data in terms of a hypothesis. A model is useless if you don’t know what empirical phenomena it is supposed to be describing, and that has to be specified somehow too.
I understand all that, I just want a worked example, not only hand-waving. After all, a formalization of Occam’s razor is supposed to be useful in order to be considered rational.
Declaring a mathematical abstraction useless just because it is not practically applicable to whatever your purpose may be is pretty short-sighted. The concept of infinity isn’t useful to engineers, but it’s very useful to mathematicians. Does that make it irrational?
Remember, the Kolmogorov complexity depends on your “universal Turing machine”, so we should expect to only get estimates. Mitchell makes an estimate of ~50000 bits for the new minimal standard model. I’m not an expert on physics, but the mathematics required to explain what a Lagrangian is would seem to require much more than that. I think you would need Peano arithmetic and a lot of set theory just to construct the real numbers so that you could do calculus (of course, people were doing calculus for over a hundred years before the real numbers were rigorously constructed, but I have a hard time imagining a rigorous calculus without them...). I admit that 50000 bits is a lot of data, but I’m sceptical that it could rigorously code all that mathematics.
F=ma has the same problem, of course. Does the right hand side really make sense without calculus?
ETA: If you want a fleshed out example, I think a much better problem to start off with would be predicting the digits of pi, or the prime numbers.
My estimate was 27000 bits to “encode the standard model” in Mathematica. To define all the necessary special functions on a UTM might take 50 times that.
I just want something simple but useful. Gotta start small. Once we are clear on F=ma, we can start thinking about formalizing more complicated models, and maybe some day even quantum mechanics and MWI vs collapse.
Maybe start by showing how it works to predict a sequence like 010101..., then something more complicated like 011011… It starts to get interesting with a sequence like 01011011101111… - how long would it take to converge on the right model there? (which is that the subsequence of 1s is one bit longer each time).
True Solomonoff induction is uselessly slow. The relationship of Solomonoff induction to actual induction is like the relationship of Principia Mathematica to a pocket calculator; you don’t use Russell and Whitehead’s methods and notation to do practical arithmetic. Solomonoff induction is a brute-force scan of the whole space of possible computational models for the best fit. Actual induction tends to start a priori with a very narrow class of possible hypotheses, and only branches out to more elaborate ones if that doesn’t work.
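A minimal sketch of the brute-force idea, with heavy caveats: true Solomonoff induction enumerates all programs for a universal machine, which is uncomputable, so this toy version enumerates a tiny hand-written hypothesis class instead and uses source length as a crude stand-in for program length. The generator names and the weighting here are my assumptions, purely for illustration:

```python
from itertools import islice
import inspect

# A tiny hand-picked hypothesis class. True Solomonoff induction enumerates
# *all* programs on a universal machine; this finite menu is illustrative only.
def alternating():                  # 010101...
    while True:
        yield 0
        yield 1

def period3():                      # 011011...
    while True:
        yield 0
        yield 1
        yield 1

def growing_runs():                 # 01011011101111... (runs of 1s grow by one)
    n = 1
    while True:
        yield 0
        for _ in range(n):
            yield 1
        n += 1

HYPOTHESES = [alternating, period3, growing_runs]

def weight(h):
    # Crude proxy for the 2^-(program length) prior: penalize by source length.
    return 2.0 ** (-len(inspect.getsource(h)))

def predict_next(observed):
    p0 = p1 = 0.0
    for h in HYPOTHESES:
        prefix = list(islice(h(), len(observed) + 1))
        if prefix[:-1] == observed:          # consistent with the data so far
            if prefix[-1] == 0:
                p0 += weight(h)
            else:
                p1 += weight(h)
    total = p0 + p1
    return {0: p0 / total, 1: p1 / total} if total else None

print(predict_next([0, 1, 0, 1, 1, 0, 1, 1, 1]))  # only growing_runs survives
```

On the third sequence above, the first two generators are eliminated by the data within a few bits, after which the growing-runs model predicts with certainty; convergence speed depends entirely on how the hypothesis class and weights are chosen.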
This would be a derivation of F=ma, vs all other possible laws. I am not asking for that. My question is supposed to be much simpler: write a binary string corresponding to just one model out of infinitely many, namely F=ma.
If we adopt the paradigm in the article—a totally passive predictor, which just receives a stream of data and makes a causal model of what’s producing the data—then “F=ma” can only be part of the model. The model will also have to posit particular forces, and particular objects with particular masses.
Suppose the input consists of time series for the positions in three dimensions of hundreds of point objects interacting according to Newtonian gravity. I’m sure you can imagine what a program capable of generating such output looks like; you may even have written such a program. If the predictor does its job, then its model of the data source should also be such a program, but encoded in a form readable by a UTM (or whatever computational system we use).
Such a model would possibly have a subroutine which computed the gravitational force exerted by one object on another, and then another subroutine which computed the change in the position of an object during one timestep, as a function of all the forces acting on it. “F=ma” would be implicit in the second subroutine.
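A minimal sketch of such a generator, under assumed toy units (G = 1) and a deliberately naive Euler integrator; the point is just that the two subroutines exist, and that “F=ma” lives implicitly inside the second one as a = F/m:

```python
import numpy as np

G = 1.0          # gravitational constant in assumed toy units
DT = 0.01        # timestep; Euler integration is the crudest possible choice

def gravity(pos, masses):
    """Subroutine 1: net gravitational force on each object from all others."""
    n = len(masses)
    forces = np.zeros_like(pos)
    for i in range(n):
        for j in range(n):
            if i != j:
                r = pos[j] - pos[i]
                dist = np.linalg.norm(r)
                forces[i] += G * masses[i] * masses[j] * r / dist**3
    return forces

def step(pos, vel, masses):
    """Subroutine 2: advance one timestep. 'F = ma' is implicit here as a = F/m."""
    forces = gravity(pos, masses)
    acc = forces / masses[:, None]   # <- the F = ma part
    vel = vel + acc * DT
    pos = pos + vel * DT
    return pos, vel

# Emit the kind of 3D-position time series the predictor would have to model.
pos = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
vel = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
masses = np.array([1.0, 0.001])
for _ in range(3):
    pos, vel = step(pos, vel, masses)
    print(pos)
```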
This is correct.
Solomonoff induction accounts for all of the data, which is a binary sequence of sensory-level happenings. In its hypothesis, there would have to be some subroutine that extracted objects from the sensory data, one that extracted a mass from those objects, et cetera. The actual F=ma part would be a very distant abstraction, though it would still be a binary sequence.
Solomonoff induction can’t really deal with counterfactuals in the same way that a typical scientific theory can. That is, we can say, “What if Jupiter was twice as close?” and then calculate it. With Solomonoff induction, we’d have to understand the big binary-sequence hypothesis and isolate the high-level parts of the program to use just those to calculate the consequences of counterfactuals.
We can already do MWI vs Collapse without being clear on F=ma. MWI is not even considered because MWI does not output a string that begins with the observed data, i.e. MWI will never be found when doing Solomonoff induction. MWI’s code may be a part of correct code, such as the Copenhagen interpretation (which includes MWI’s code). Or something else may be found (my bet is on something else, because of general relativity). It is this bloody simple.
The irony is, you can rule MWI out with Solomonoff induction without even choosing the machine or having a halting oracle. Note: you can’t rule out existence of many worlds. But MWI simply does not provide the right output.
At this point I am not interested in human logic, I want a calculation of complexity. I want a string (an algorithm) corresponding to F=ma. Then we can build on that.
If F, m and a are true real numbers, it’s uncomputable (you can’t code it on a TM) and not even considered by Solomonoff induction, so there you go.
My point was simply that MWI is not a theory that would pass the ‘output begins with string s’ criterion, and therefore is not even part of the sum at all.
It’s pretty hilarious: the explanation is semi-okay, but the implication is that Solomonoff induction is awesome, and the topic is hard to think about, so people substitute “awesome” for Solomonoff induction, and then all the imagined implications end up “not even wrong”. edit: also, someone somehow thinks that Solomonoff induction finds probabilities for theories, while it just assigns 2^-length as the probability of a piece of software code of that length. That is obviously absurd when applied to anything but brute-force-generated shortest pieces of code, because we can’t get codes down to their minimum lengths (that’s uncomputable), and the hypothesis-generation process can have e.g. a blow-up factor of 10 (plausible) or even a quadratic blow-up factor, making the 2^-length prior much, much too strongly discriminating against actual theories in favour of simply having a copy of the data stored inside.
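Schematically, the weighting being objected to: the prior of a program p is 2^(-|p|), so if a realistic hypothesis-generation process emits code c times longer than the (uncomputable) minimum, the theory’s weight is effectively raised to the c-th power:

```latex
P(p) \;\propto\; 2^{-|p|}, \qquad
P(p_{\text{bloated}}) \;\propto\; 2^{-c\,|p|} \;=\; \bigl(2^{-|p|}\bigr)^{c}
```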
That’s a cop-out; just discretize the relevant variables to deal with integers (Planck units, if you feel like it). The complexity should not depend on the step size.
Well, it does depend on the step size. You end up with higher probability for a larger discretization constant (and zero probability for no discretization at all) if you use the Turing machine model of computation (if you use something that does reals, you will not face this issue). I’m trying to explain that, in the ideal limit, it has certain huge shortcomings. The optimality proofs do not imply it is good; they only imply other stuff isn’t ‘everywhere as good and somewhere better’.
The primary use of this sort of thing—highly idealized induction that is uncomputable—is not to do induction but to find limits to induction.
Why are his comments in this thread getting downvoted? They show a quite nuanced understanding of S. I. and raise interesting points.
If there is no requirement for the observed data to be at the start of the string that is output, then the simplest program that explains absolutely everything that is computable is this:
Print random digits. (This was actually a tongue-in-cheek Schmidhuber result from the early 2000s, IIRC. The easiest program whose output will assuredly contain our universe somewhere along the line.)
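The whole tongue-in-cheek “simplest program” fits in a couple of lines (a sketch; any unbiased bit source would do):

```python
import random

# With probability 1, every finite binary string -- including any encoding of
# our observations -- appears somewhere in this output. It just never *begins*
# with them on demand, which is the requirement at issue here.
while True:
    print(random.getrandbits(1), end="")
```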
Luckily there is such a requirement, and I don’t know how MWI could possibly fit into it. This unacknowledged tension has long bugged me, and I’m glad someone else is aware of it.
He identifies subtleties, but doesn’t look very hard to see whether other people could have reasonably supposed that the subtleties resolve in a different way than he thinks they “obviously” do. Then he starts pre-emptively campaigning viciously for contempt for everyone who draws a different conclusion than the one from his analysis. Very trigger-happy.
This needlessly pollutes discussion… that is to say, “needless” in the moral perspective of everyone who doesn’t already believe that most people who first appear wrong by that criterion in fact are wrong, and negligently and effectively incorrigibly so, such that there’d be nothing to lose by loosing broadside salvos before the discussion has even really started. (Incidentally, it also disincentivizes the people who could actually explain the alternative treatment of the subtleties from engaging with him, by demonstrating a disinclination to bother to suppose that their position might be reasonable.) This perception of needlessness, together with the usual assumption that he must already be on some level aware of other people’s belief in that needlessness but is disregarding that belief, is where most of the negative affect toward him comes from.
Also, his occasional previous lack of concern for solid English grammar didn’t help: it painted him as not really caring whether the people he was talking to actually deserved the contempt that third parties would inevitably come away with the impression he was signaling.
(I wish LW had more people who were capable of explaining their objections understandably like this, instead of being stuck with a tangle of social intuitions which they aren’t capable of unpacking in any more sophisticated way than by hitting the “retaliate” button.)
Were capable of and bothered to, I suppose. I rarely bother to explain the reasons for my value judgments unless I’m specifically asked, and sometimes not even then. Especially not when it comes to value judgments of random people on the Internet. Low-value Internet interactions are fungible.
See edit on this post. Also, note that this tirade starts from going down to −6 votes (or so) on, you know, a perfectly valid issue that Solomonoff induction has with expressing MWI. You get the timeline backwards. I post something sensible, concise and clear, then it gets voted down to −6 or so, THEN I can’t help but assume utter lack of understanding. Can’t you see I have my retaliate button too? There are more downvotes here for pointing out a technical issue normally, without flaming anyone. This was early in my posting history (before me disagreeing with anyone massively, that is):
http://lesswrong.com/lw/ai9/how_do_you_notice_when_youre_rationalizing/5y3w
and it is me honestly expressing another angle on the ‘rationalization’, and it was at −7 at its lowest! I interpret it as ‘7 people disagree’.
edit: also, by ‘someone’ I tend to refer to ‘unknown persons’, i.e. downvoters of such posts.
I’m sorry; I was referring to what I had perceived as a general pattern, from seeing snippets of discussions involving you while I was lurking off-and-on. The “pre-emptive” was meant to refer to within single exchanges, not to refer all the way back to (in this case) the original discussion about MWI (which I’m still hunting down). Now that I look more closely at your history, this has only been at all frequent within the past few months.
I don’t have any specific recollection of you from before that comment on the “detecting rationalization” post, but looking back through your comment history of that period, I’m mystified by what happened there too. It’s possible that someone thought you were Thomas covertly giving a self-justifying speech about a random red-herring explanation he’d invented to tell himself for why other people disagreed with him, and they wished to discourage him from thinking other people agreed with that explanation.
I responded privately earlier… I really don’t quite know why I still bother posting here. Also, btw, there’s another thing: I posted several things that were quite seriously wrong, in the sense of a wrong chain of reasoning (not outcome). Those were upvoted and agreed with a fair lot.
Also, on MWI: I am a believer that, as far as we know, there can be many worlds out there, and even if quantum mechanics is wrong it is fairly natural for mathematics to work out to many worlds; it is not like believing in the apple cake in the asteroid belt. I do not dislike the conclusion. I dislike the argument. Ditto for Solomonoff induction and theism; I am an atheist. I tend to be particularly negative towards arguments that incorrectly argue in favour of what I believe, on the few occasions when I notice the incorrectness (obviously that’s got to be much less common than seeing incorrectness in arguments in favour of what I don’t believe).
p.s. please note that the field is VERY full of impossibility proofs (derived from the Halting Problem) which are of the form ‘this does not resolve’ (and full of ‘even if you postulate that this resolves, then something else doesn’t resolve’), and I do not look very hard indeed into the possibility that the associated subtleties resolve. Please also note that the subtlety of ‘the output must begin with the data’ is not the kind of subtlety that resolves in any way. I can sometimes be very certain that a subtlety does not resolve; most of the time I am not certain (this field is outside my field of expertise), and I seldom if ever comment on those points (e.g. I refrained from commenting on Solomonoff induction here in the beginning of my post history), and I do not comment negatively on those.
Solomonoff induction is a very complicated topic, and it requires very careful consideration before speculating. Due to its high difficulty, the probability of subtleties resolving is considerably less than in less complicated topics.
The subtleties I first had in mind were the ones that should have (but didn’t) come up in the original earlier discussion of MWI, having to do with the different numbers of bits in different parts of an observation-predicting program based on a physical theory, and which of those parts should have their bits charged against the prior or likelihood of the physical theory itself, and which of the parts should have their bits taken for granted as intrinsic parts of the anthropic reasoning that any agent would need to be capable of (even if some physical theories didn’t use part of that anthropic-reasoning “software”).
(I didn’t specify what the subtleties were, and you seem to have picked a reading of which subtleties I must have been referring to and what I must have meant by “resolve” that together made what I was saying make no sense at all. This might be another example of the proposed tendency of “not looking very hard to see whether other people could have reasonably supposed” etc. (whether or not other people in the same reference class from your point of view as me weren’t signaling that they understood the point either).)
Well, taking for granted some particular class of bits is very problematic when you have brute-force search over the values of those bits. You can have a reasonably short program (one that can be shorter than physics) which iterates over all theories of physics and runs them for a very, very long time. Then, if you are allowed to just search for the observers, this program will be the simplest theory, and you are effectively back to square one; you don’t get anything useful (you’ve got Solomonoff induction inside Solomonoff induction). Sorry, I still can’t make sense out of it.
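The construction described is essentially a dovetailer. A schematic sketch, assuming some interpreter run(program, n_steps) for programs encoded as bitstrings; the interpreter is deliberately left as a stub, since the point is only the enumeration order:

```python
from itertools import count, product

def run(program: str, n_steps: int):
    """Stub: execute `program` (a bitstring) on some fixed universal machine
    for n_steps and return its output so far. Assumed, not implemented."""
    raise NotImplementedError

def dovetail():
    # Interleave all programs with all step budgets, so that every
    # (program, budget) pair is eventually reached even though the
    # program space is infinite.
    for budget in count(1):
        for length in range(1, budget + 1):
            for bits in product("01", repeat=length):
                yield "".join(bits), budget

# A short outer program that loops over dovetail() and then *searches the
# outputs for observers* would dominate the prior: Solomonoff induction
# inside Solomonoff induction, which is the objection above.
```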
I also don’t see why I should have searched for possible resolutions, and assumed that the other party has good reason to expect such a resolution, if I reasonably believe that such a resolution would have a good chance at a Fields Medal (or that even a good reason to expect such a resolution would).
I also don’t like speculative conjectures via lack of counter-argument, as combined with posts like this (and all posts inspired by its style), and I feel very disinclined to run the highly computationally expensive search for possible resolutions on behalf of a person who would not extend such consideration to the entire majority of scientists who he thinks believe in something other than MWI. (Insofar as MWI is part of CI, every CI endorser believes in MWI in a way; they just believe that it is invalid to believe in extra worlds that can’t be observed, or hold some other philosophical stance that is a matter of opinion.)
edit: that is to say, the stance often expressed on MWI here is normative: if you don’t believe in MWI you are wrong, and not just that, but the scientific community is wrong for not believing in MWI. That is the background. I do believe many worlds are a possibility, but I do not believe the above argument to be valid. And as you yourself have eloquently explained, one should not proclaim those believing in MWI to be stupid on the basis of unresolved problems not being resolved (I do not do that). But this also goes for CI, and it is not me but the community that is being normative about what is rational and judging whether one is rational by MWI belief status. edit2: and I do not see how this strong pro-MWI stance is compatible with awareness of the subtleties. Furthermore, at the point where you make so many implicit assumptions about subtleties resolving in your favour, you might as well just say ‘Occam’s razor’; it is not appropriate to use a specific term (Solomonoff induction) to refer to something fuzzy.
private_messaging is a troll. Safely assume bad faith.
Wikipedia:
a troll is someone who posts inflammatory, extraneous, or off-topic messages in an online community, such as an online discussion forum, chat room, or blog, with the primary intent of provoking readers into an emotional response or of otherwise disrupting normal on-topic discussion
Let’s see:
inflammatory: check
extraneous: sometimes, not in this case
off-topic: not exactly
intent to provoke/disrupt: not in my estimation
so, maybe 25-30% trollness.
I never get this impression from his posts. They seem honest (if sometimes misguided), not malicious, to me.
Can you say more about how you distinguish messages intended to provoke emotional response from those that are merely inflammatory?
Intent makes a difference for me. private_messaging seems to want to get his point across (not counting an occasional rant), without regard to the way his comments come across. I did not detect any intent of riling people up for its own sake.
(nods) That’s fair. Thanks for the clarification.
Hmm so you safely assume that I made up a requirement that the observed data be at the start of the string that is output?
I suspect that others downvote private_messaging because of his notoriety. I did downvote his comment because he strayed away from my explicit request (estimate the complexity of Newton’s 2nd law) and toward the flogged-to-death topic of MWI vs the world. Such a discussion has proven to be unproductive time and again in this forum.
Likewise. (With the caveat that I endorse downvoting extreme cases based on notoriety so probably would have downvoted anyway.)
An interesting point: the algorithm would contain apparent collapses as special instructions even while it did not contain collapse as a general rule.
I think leaving it out as a general rule damages the notion that it’s producing the Copenhagen Interpretation, though.
What counts as special instructions and what as general rules? The collapse in CI is ‘special instructions’, unless you count confused and straw-man variations of CI. It’s only a rule in objective collapse theories.
In that case, CI is MWI, so why draw the distinction?
Yes, no shit, Sherlock; why? I do not know. It’s called ‘interpretations’ for a reason. You tell me. I am not the one drawing the distinction with posts like this.
Also, insofar as the codes produce identical output, an agent using Solomonoff induction (like AIXI) behaves the same, and it doesn’t matter what interpretation label you slap onto the shortest model in it.
edit: also, in physics it is important to minimize the number of principles, even if that doesn’t minimize the predictor code. We want to derive apparent collapse from decoherence, even if the predictor code is the same. That’s because we do look for understanding, you know, intellectual curiosity and all that.
The same observations that produced Copenhagen and de Broglie-Bohm produced MWI. You acknowledge as much when you state that Copenhagen extends MWI with more axioms. The observation string for MWI is then identical to Copenhagen’s, and there is no reason to select Copenhagen as preferred.
I think this post is making the mistake of allowing the hypothesis to be non-total. Definition: a total hypothesis explains everything, it’s a universe-predicting machine and equivalent to “the laws of physics”. A non-total hypothesis is like an unspecified total hypothesis with a piece of hypothesis tacked on. Neither what it does, nor any meaningful understanding of its length, can be derived without specifying what it’s to be attached to.
01000110001111010110110101100001
Source
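(For anyone squinting at the string: it is 4 bytes of ASCII. A quick check, assuming the usual 8-bit encoding:)

```python
bits = "01000110001111010110110101100001"
print("".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8)))  # F=ma
```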
The number of bits required to express something in English is far from the information-theoretic complexity of the thing. “The woman over there is a witch; she did it.”
Depends on your choice of universal Turing machine. You can choose a human, which is a valid universal Turing machine, and then the Kolmogorov complexity is equal.
Hm, good point. I think there might still be a way to save the concept that witches are more complex than electromagnetism, though.
You need a very large overhead for a human. This overhead contains some of the complexity of “witch” but comparatively less of the complexity of “electromagnetism”. So Complexity(human)+Complexity(“witch”, context=human) is an upper bound on the “inherent complexity” of “witch”, and the latter term alone doesn’t mean as much.
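In standard Kolmogorov-complexity notation, the bound being sketched is the usual conditional-complexity inequality (up to additive constants):

```latex
K(\text{witch}) \;\le\; K(\text{human}) + K(\text{witch} \mid \text{human}) + O(1)
```

The second term alone can be a few bits of English precisely because the “human” machine already smuggles in most of the description.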
But where do we get Complexity(human)?
Four.
Plus a constant.
I think people missed a joke here. I mean, seriously, EY is not so stupid as to think that it is 4 bits literally. And if it is 4 symbols, and the symbols are of arbitrary size, then it’s not ‘plus a constant’, it’s ‘multiplied by log2(number of symbols in the alphabet), plus a constant’ (suppose I make a Turing machine with an 8-symbol tape; on this machine I can compress arbitrarily long programs of another machine into a third of the length, plus a constant).
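The conversion in question, spelled out: n symbols over an alphabet of size s carry n·log2(s) bits, so for an 8-symbol tape:

```latex
\text{bits} = n \log_2 s, \qquad
n_{8\text{-symbol}} \;\approx\; \frac{n_{\text{binary}}}{\log_2 8} + O(1)
\;=\; \frac{n_{\text{binary}}}{3} + O(1)
```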
No, really. It can be 4 literal bits and a sufficiently arbitrary constant. It’s still a joke and I rather liked it myself.
Yes. I was addressing what I thought might be a sensible reason not to like the joke, given that F=ma is 4 symbols (so is “four”).
Then my hypothesis is simpler: GOD.
Eliezer: My hypothesis is even simpler: ME!
As long as the GOD is simple and not too knowable. If he knows everything (about the Universe), he is even more complex than the Universe.
The same logic applies to EY’s comment :)