Bayes Slays Goodman’s Grue
This is a first stab at solving Goodman’s famous grue problem. I haven’t seen a post on LW about the grue paradox, and this surprised me, since I had figured that if any argument would be raised against Bayesian LW doctrine, it would be the grue problem. I haven’t looked at many proposed solutions to this paradox besides some of the basic ones in “The New Problem of Induction”. So I apologize now if my solution is wildly unoriginal. I am willing to put you through this, dear reader, because:
I wanted to see how I would fare against this still largely open, devastating, and classic problem, using only the arsenal provided to me by my minimal Bayesian training, and my regular LW reading.
I wanted the first LW article about the grue problem to attack it from a distinctly Lesswrongian approach, without the benefit of hindsight knowledge of the solutions of non-LW philosophy.
And lastly, because, even if this solution has been found before, if it is the right solution, it is to LW’s credit that its students can solve the grue problem with only the use of LW skills and cognitive tools.
I would also like to warn the savvy subjective Bayesian that just because I think that probabilities model frequencies, and that I require frequencies out there in the world, does not mean that I am a frequentist or a realist about probability. I am a formalist with a grain of salt. There are no probabilities anywhere in my view, not even in minds; but the theorems of probability theory, when interpreted, share a fundamental contour with many important tools of the inquiring mind, including both the nature of frequency and the set of rational subjective belief systems. There is nothing more to probability than that system which produces its theorems.
Lastly, I would like to say that even if I have not succeeded here (which I think I have), there is likely something valuable that can be made from the leftovers of my solution after the onslaught of penetrating critiques that I expect from this community. Solving this problem is essential to LW’s methods, and our arsenal is fit to handle it. If we are going to be taken seriously in the philosophical community as a new movement, we must solve serious problems from academic philosophy, and we must do it in distinctly Lesswrongian ways.
“The first emerald ever observed was green.
The second emerald ever observed was green.
The third emerald ever observed was green.
… etc.
The nth emerald ever observed was green.
(conclusion):
There is a very high probability that a never before observed emerald will be green.”
That is the inference that the grue problem threatens, courtesy of Nelson Goodman. The grue problem starts by defining “grue”:
“An object is grue iff it is first observed before time T, and it is green, or it is first observed after time T, and it is blue.”
So you see that before time T, from the list of premises:
“The first emerald ever observed was green.
The second emerald ever observed was green.
The third emerald ever observed was green.
… etc.
The nth emerald ever observed was green.”
(we will call these the green premises)
it follows that:
“The first emerald ever observed was grue.
The second emerald ever observed was grue.
The third emerald ever observed was grue.
… etc.
The nth emerald ever observed was grue.”
(we will call these the grue premises)
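The step from the green premises to the grue premises can be checked mechanically. What follows is only a minimal sketch: the function name `is_grue`, the string color labels, the numeric times, and the choice to treat an observation made exactly at T as "after" are all assumptions made for illustration, not part of Goodman's statement.

```python
def is_grue(color: str, first_observed: float, T: float) -> bool:
    """Goodman's predicate: green and first observed before T,
    or blue and first observed after T.

    Assumption: an observation made exactly at T is treated as 'after'.
    """
    if first_observed < T:
        return color == "green"
    return color == "blue"


# Hypothetical observation log: (color, time of first observation).
T = 100.0
observations = [("green", t) for t in (1.0, 2.0, 3.0, 50.0, 99.0)]

# Every emerald observed before T is green, so every one is also grue.
all_green = all(color == "green" for color, _ in observations)
all_grue = all(is_grue(color, t, T) for color, t in observations)
```

Before T the two lists of premises are the same records under two names; they only come apart for emeralds first observed after T.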
The proposer of the grue problem asks at this point: “So if the green premises are evidence that the next emerald will be green, why aren’t the grue premises evidence for the next emerald being grue?” If an emerald is grue after time T, it is not green. Let’s say that the green premises bring the probability of “A new unobserved emerald is green.” to 99%. On the skeptic’s hypothesis, by symmetry they should also bring the probability of “A new unobserved emerald is grue.” to 99%. But after time T, this would mean that the probability of observing a green emerald is 99%, and the probability of not observing a green emerald is at least 99%. Since these sentences have no intersection, i.e., they cannot happen together, we find the probability of their disjunction by adding their individual probabilities. This gives us a number at least as big as 198%, which of course contradicts the Kolmogorov axioms. We should not be able to form a statement with a probability greater than one.
This threatens the whole of science, because you cannot simply keep this isolated to emeralds and color. We may think of the emeralds as trials, and green as the value of a random variable. Ultimately, every result of a scientific instrument is a random variable, with a very particular and useful distribution over its values. If we can’t justify inferring probability distributions over random variables based on their previous results, we cannot justify a single bit of natural science. This, of course, says nothing about how it works in practice. We all know it works in practice. “A philosopher is someone who says, ‘I know it works in practice, I’m trying to see if it works in principle.’” (Dan Dennett)
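Written out as a worked step, the clash with additivity looks like this. The symbols are introduced here only for the illustration: G stands for “the next emerald, first observed after time T, is green” and R for “that emerald is grue”.

```latex
\begin{align*}
P(G) &= 0.99, \qquad P(R) = 0.99 \\
R \Rightarrow \lnot G \ (\text{after } T) &\implies P(\lnot G) \ge P(R) = 0.99 \\
P(G \lor \lnot G) &= P(G) + P(\lnot G) \ge 0.99 + 0.99 = 1.98 > 1
\end{align*}
```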
We may look at an analogous problem. Let’s suppose that there is a table and that there are balls being dropped on this table, and that there is an infinitely thin line drawn perpendicular to the edge of the table, at a position we are unaware of. The problem is to figure out the probability of the next ball being right of the line given the previous results. Our first prediction should be that there is a 50% chance of the ball being right of the line, by symmetry. If we get the result that one ball landed right of the line, by Laplace’s rule of succession we infer that there is a 2/3 chance that the next ball will be right of the line. After n trials, if every trial gives a positive result, the probability we should assign to the next trial being positive as well is (n+1)/(n+2).
If this line were placed 2/3 of the way down the table, we should expect the ratio of Rights to Lefts to approach 2:1. This gives us a 2/3 chance of the next ball being a Right, and the fraction of Rights out of all trials approaches 2/3 ever more closely as more trials are performed.
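The rule of succession used above is short enough to write down directly. This is a minimal sketch; the function name `rule_of_succession` and the use of exact fractions are choices made here for illustration.

```python
from fractions import Fraction


def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Laplace's rule: P(next trial succeeds) = (s + 1) / (n + 2)."""
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return Fraction(successes + 1, trials + 2)


prior = rule_of_succession(0, 0)        # no data yet: 1/2
after_one = rule_of_succession(1, 1)    # one Right out of one: 2/3
after_ten = rule_of_succession(10, 10)  # ten Rights out of ten: 11/12
```

With no data the rule returns the symmetric 1/2, and a run of n Rights out of n trials gives (n+1)/(n+2), matching the numbers in the text.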
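The convergence claim can be tried out with a small simulation. This is a sketch under stated assumptions: the table has unit length, balls land uniformly at random, position is measured so that a ball at x counts as Right when x is below the line position, and the name `simulate_drops` and the fixed seed are hypothetical choices made for reproducibility.

```python
import random


def simulate_drops(n_trials: int, line_position: float, seed: int = 0) -> float:
    """Drop n_trials balls uniformly on a table of unit length and
    return the fraction that land right of the line.

    Convention assumed here: a ball at position x counts as Right
    when x < line_position, so line_position = 2/3 gives P(Right) = 2/3.
    """
    rng = random.Random(seed)
    rights = sum(1 for _ in range(n_trials) if rng.random() < line_position)
    return rights / n_trials


# The observed fraction of Rights should settle near 2/3 as trials grow.
for n in (10, 1_000, 100_000):
    print(n, round(simulate_drops(n, 2 / 3), 4))
```

The exact figures depend on the seed, but the fraction for the largest run should sit close to 2/3.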
Now let us suppose a grue skeptic approaching this situation. He might make up two terms “reft” and “light”. Defined as you would expect, but just in case:
“A ball is reft of the line iff it is right of it before time T when it lands, or if it is left of it after time T when it lands.
A ball is light of the line iff it is left of the line before time T when it lands, or if it is right of the line after time T when it first lands.”
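The two definitions amount to a relabeling that depends on the landing time. Here is a minimal sketch; the function name `relabel`, the string labels, and treating a landing exactly at T as "after" are assumptions made for the example.

```python
def relabel(side: str, landing_time: float, T: float) -> str:
    """Translate 'right'/'left' into the skeptic's 'reft'/'light'.

    Before T: right -> reft, left -> light.
    After T:  right -> light, left -> reft.
    Assumption: a landing exactly at T counts as 'after'.
    """
    if side not in ("right", "left"):
        raise ValueError("side must be 'right' or 'left'")
    if landing_time < T:
        return "reft" if side == "right" else "light"
    return "light" if side == "right" else "reft"


T = 10.0
before = relabel("right", 3.0, T)   # 'reft'
after = relabel("right", 12.0, T)   # 'light'
```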
The skeptic would continue:
“Why should we treat the observation of several occurrences of Right, as evidence for ‘The next ball will land on the right.’ and not as evidence for ‘The next ball will land reft of the line.’?”
Things for some reason become perfectly clear at this point for the defender of Bayesian inference, because now we have an easy-to-imagine model. Of course, if a ball landing right of the line is evidence for Right, then it cannot possibly be evidence for ~Right; to be evidence for Reft, after time T, is to be evidence for ~Right, because after time T, Reft is logically identical to ~Right; hence it is not evidence for Reft, after time T, for the same reasons it is not evidence for ~Right. Of course, before time T, any evidence for Reft is evidence for Right for analogous reasons.
But now the grue skeptic can say something brilliant, that stops much of what the Bayesian has proposed dead in its tracks:
“Why can’t I just repeat that paragraph back to you and swap every occurrence of ‘right’ with ‘reft’ and ‘left’ with ‘light’, and vice versa? They are perfectly symmetrical in terms of their logical relations to one another.
If we take ‘reft’ and ‘light’ as primitives, then we have to define ‘right’ and ‘left’ in terms of ‘reft’ and ‘light’ with the use of time intervals.”
What can we possibly reply to this? Can he/she not do this with every argument we propose then? Certainly, the skeptic admits that Bayes, and the contradiction in Right & Reft, after time T, prohibits previous Rights from being evidence of both Right and Reft after time T; where he is challenging us is in choosing Right as the result which it is evidence for, even though “Reft” and “Right” have a completely symmetrical syntactical relationship. There is nothing about the definitions of reft and right which distinguishes them from each other, except their spelling. So is that it? No, this simply means we have to propose an argument that doesn’t rely on purely syntactical reasoning. So that if the skeptic performs the swap on our argument, the resulting argument is no longer sound.
What would happen in this scenario if it were actually set up? I know that seems like a strangely concrete question for a philosophy text, but its answer is a helpful hint. What would happen is that after time T, the behavior of the ratio ‘Rights:Lefts’ as more trials were added would proceed as expected, and the behavior of the ratio ‘Refts:Lights’ would approach the reciprocal of the ratio ‘Rights:Lefts’. The only way for this not to happen is for us to have been calling the right side of the table “reft”, or for the line to have moved. We can only figure out where the line is by knowing where the balls landed relative to it; anything we can figure out about where the line is from knowing which balls landed Reft and which ones landed Light, we can only figure out because, in knowing this and the time, we can know whether the ball landed left or right of the line.
To this I know of no reply which the grue skeptic can make. If he/she says the paragraph back to me with the proper words swapped, it is not true, because in the hypothetical where we have a table, a line, and we are calling one side right and another side left, the only way for Refts:Lights to behave as expected as more trials are added is to move the line (if even that); otherwise the ratio of Refts to Lights will approach the reciprocal of Rights to Lefts.
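The behavior of the two ratios can be illustrated with a simulation in which the line stays put. This is a sketch only: the name `run_table`, the trial counts, the fixed seed, and modeling "time T" as a trial index are assumptions made for the example.

```python
import random
from collections import Counter


def run_table(n_before: int, n_after: int, p_right: float, seed: int = 0):
    """Drop balls with a fixed line; tally sides and the skeptic's labels
    separately for the trials before and after time T."""
    rng = random.Random(seed)
    tallies = {"before": Counter(), "after": Counter()}
    for i in range(n_before + n_after):
        period = "before" if i < n_before else "after"
        side = "right" if rng.random() < p_right else "left"
        if period == "before":
            label = "reft" if side == "right" else "light"
        else:
            label = "light" if side == "right" else "reft"
        tallies[period][side] += 1
        tallies[period][label] += 1
    return tallies


tallies = run_table(n_before=50_000, n_after=50_000, p_right=2 / 3)
after = tallies["after"]
rights_to_lefts = after["right"] / after["left"]
refts_to_lights = after["reft"] / after["light"]
print(round(rights_to_lefts, 3), round(refts_to_lights, 3))
```

With the line fixed at 2/3, the post-T Rights:Lefts ratio stays near 2:1 while Refts:Lights sits near 1:2, since after T every Reft is a Left and every Light is a Right.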
This thin line is analogous to the frequency of emeralds that turn out green out of all the emeralds that get made. This is why we can assume that the line will not move, because that frequency has one precise value, which never changes. Its other important feature is reminding us that even if two terms are syntactically symmetrical, they may have semantic conditions for application which are ignored by the syntactical model, e.g., checking to see which side of the line the ball landed on.
In conclusion:
Every random variable has as a part of it, stored in its definition/code, a frequency distribution over its values. By the fact that some things happen sometimes, and others happen other times, we know that the world contains random variables, even if they are never fundamental in the source code. Note that “frequency” is not used as a state of partial knowledge; it is a fact about a set and one of its subsets.
The reason that:
“The first emerald ever observed was green.
The second emerald ever observed was green.
The third emerald ever observed was green.
… etc.
The nth emerald ever observed was green.
(conclusion):
There is a very high probability that a never before observed emerald will be green.”
is a valid inference, but the grue equivalent isn’t, is that grue is not a property that the emerald construction sites of our universe deal with. They are blind to the grueness of their emeralds; they only say anything about whether or not the next emerald will be green. It may be that the rule that the emerald construction sites use to get either a green or non-green emerald changes at time T, but the frequency of some particular result out of all trials will never change; the line will not move. As long as we know what symbols we are using for what values, observing many green emeralds is evidence that the next one will be grue, as long as it is before time T; every record of an observation of a green emerald is evidence against a grue one after time T. “Grue” changes meaning from green to blue at time T; the meaning of “green” stays the same, since we are using the same physical test to determine green-hood as before, just as we use the same test to tell whether the ball landed right or left. There is no reft in the universe’s source code, and there is no grue. Green is not fundamental in the source code, but green can be reduced to some particular range of quanta states; if you had the universe’s source code, you couldn’t write grue without first writing green, whereas writing green without knowing a thing about grue would be just as hard as writing it while knowing grue. Having a physical test, or primary condition for applicability, is what privileges green over grue after time T; to have a consistent physical test is the same as to reduce to a specifiable range of physical parameters; the existence of such a test is what prevents the skeptic from performing his/her swaps on our arguments.
Take this more as a brainstorm than as a final solution. It wasn’t originally but it should have been. I’ll write something more organized and concise after I think about the comments more, and make some graphics I’ve designed that make my argument much clearer, even to myself. But keep those comments coming, and tell me if you want specific credit for anything you may have added to my grue toolkit in the comments.
I don’t see any reason such an object is likely to eat me when I’m walking around in the dark.
I don’t see the relevance. Nelson’s problem is about the general validity of inductive inference. Do you have a solution that doesn’t depend upon inductive inferences?
Was this meant to be a response to my other comment? If not, I think one of us is missing the other’s joke, but have no idea which one.
No, it was supposed to be a response to its actual parent. I assumed that you were (somewhat but not entirely) humorously suggesting that the problem can somehow be solved by some appeal to natural selection or the like.
Ah, no, I was simply making reference to the fact that the Zork games from way back when (the first was apparently from the late 1970s) would warn you,
… when you wandered into an unlit area.
I have seen something like
in someone’s email signature, and been delighted by it. (Though I worry that part of my delight derives from smugness about getting the joke.)
Nice!
Sorry, my mistake!
No worries at all; I just didn’t want to invest the time trying to figure out how it related to my serious comment if it turned out to be a joke I didn’t get, or vice-versa.
The problem seems trivially easy.
Each observed emerald is evidence for both “the emerald is green” and “the emerald is grue.” The first is preferred because it is vastly simpler (and picking any particular T, of course, is hugely privileging the hypothesis!) Evidence that is equally strong for two propositions doesn’t change their relative likelihoods—so it starts out more likely that the emeralds are green than grue, and it ends more likely that the emeralds are green than grue, but both are quickly more likely than the proposition that emeralds are uniformly red.
What’s weird about this?
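The point about equally strong evidence can be put in odds form. This is a minimal sketch with made-up numbers: the prior odds of 1000:1 stand in for a simplicity prior, and the 1/100 chance that an all-red hypothesis assigns to each green observation is likewise an assumption; the name `update_odds` is hypothetical.

```python
from fractions import Fraction


def update_odds(prior_odds: Fraction, likelihood_a: Fraction,
                likelihood_b: Fraction, n_observations: int) -> Fraction:
    """Posterior odds of A over B after n independent observations:
    prior odds times the likelihood ratio raised to the power n."""
    return prior_odds * (likelihood_a / likelihood_b) ** n_observations


# Green vs. grue (before T): both predict each green observation equally
# well, so 500 observations leave the assumed prior odds untouched.
green_vs_grue = update_odds(Fraction(1000, 1), Fraction(1), Fraction(1), 500)

# Green vs. all-red: each green observation is assumed 100 times likelier
# under green, so even 5 observations move the odds enormously.
green_vs_red = update_odds(Fraction(1, 1), Fraction(1), Fraction(1, 100), 5)
```

Whatever separates green from grue before T has to come from the prior, which is where the simplicity argument does its work.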
To clarify what potato said:
If someone was brought up from birth with the words “grue” and “bleen,” how would they say something was “green,” in their language? Well, they’d have to say that something was grue before, say, 2050, but bleen after. Something that changes from grue to bleen is clearly more complicated to write down than something that just stays grue all the time.
And this is just hiding the complexity, not making it simpler. Complexity isn’t a function of how many words you use, cf. “The lady down the street is a witch; she did it.” If we are writing a program that emits actual features of reality, rather than socially defined labels, the simplest program for green is simpler than the simplest program for grue or bleen. That you can also produce more complex programs that give the same results (defining green in terms of bleen and grue is only one such example) is both trivially true and irrelevant.
Wait, actually, I’d like to come back to this. What programming language are we using? If it’s one where either grue is primitive, or one where there are primitives that make grue easier to write than green, then grue seems simpler than green. How do we pick which language we use?
Agreed.
This is a trick of definition only, however. Changing the definition does not cause those things affected by the old definition to conform to the new one.
Obviously they’d have to invent a new word, for an object that emits light that causes certain kind of qualia.
What’s weird, is that without a premise about what “green” and “blue” stand for semantically, the skeptic can just repeat that paragraph back to you, but switch all the occurrences of “grue” and “green”, since “grue” and “green” are logically symmetrical.
They can claim that the grue hypothesis is simpler than the green hypothesis?
If we take “grue” and “bleen” as primitives, then it is the definition of “green” which requires the time interval, not grue.
But if we go down to the level of photons, “green” and “blue” don’t require a time interval in their definitions, yet “grue” and “bleen” do.
What do you mean by “primitives”?
It seems to me that the only sensible primitives are photons, which have particular energies. A perception system that has two sets of mappings from energies to names and a clock is necessarily less simple than a perception system that has one mapping from energies to names.
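As a rough illustration of the comparison between the two perception systems, here is a sketch. The energy bands are approximate, assumed values chosen only so that the example runs, and the names `name_color` and `name_grue_bleen` are hypothetical.

```python
# Rough, illustrative photon-energy bands in electronvolts (assumed values).
BANDS = [("green", 2.18, 2.50), ("blue", 2.50, 2.76)]


def name_color(energy_ev: float) -> str:
    """One mapping from photon energy to a color name."""
    for name, low, high in BANDS:
        if low <= energy_ev < high:
            return name
    return "other"


def name_grue_bleen(energy_ev: float, first_observed: float, T: float) -> str:
    """Needs the same energy bands plus a clock reading and a cutoff T."""
    color = name_color(energy_ev)
    if color not in ("green", "blue"):
        return "other"
    before = first_observed < T
    # grue: green before T or blue after T; bleen: the other two cases.
    if (color == "green") == before:
        return "grue"
    return "bleen"
```

The second function takes strictly more inputs than the first. Note that the sketch builds grue on top of the energy bands, which is the choice of primitives under dispute in this thread, so it illustrates the claim rather than settling it.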
Logical primitives. Look up logical atomism, but take it with a grain of salt.
(from wikipedia) For “green” to be atomic, that suggests it cannot be broken down. Are you suggesting that “green” cannot be broken down to statements about energies of photons?
No, I just mean that (or Goodman just means that) if we assume the meanings of grue and bleen, then we have to define green in terms of grue and bleen and a time interval.
But where can I find grue and bleen? If knowledge of them were deleted from my memory, would I reform those concepts?
If you deleted my knowledge of color, but left me my eyes, I could still distinguish between photons of 2.75 eV and photons of 2.3 eV. That’s a difference you can find outside you and that persists.
Right, that’s the point: to solve the problem, you have to move into semantics.
If you were a confused philosopher then yes, you probably would! It’s definitely part of thought-space that I expect people to rush to fill once they are spending their time thinking of pointless stuff. Hopefully, if it were you, you would proceed straight to dissolving the question!
You mean grue and bleen?
But… why would we be allowed to do that?
It’s evidence for both.
The solution to the grue problem is a combination of biting the bullet and Occam’s razor.
You don’t need Bayes to solve ‘grue’ problems. Merely reductionism.
Explain please
“Goodman’s Grue” just doesn’t seem to be a problem at all. It can only seem like a problem if you forget that ‘grue’ is a name given to a somewhat complex sequence of events (relative to a thing just being a color) and start making mistakes when manipulating the symbol. There just isn’t any reason to suppose there is any ‘threat to the whole of science’ in the first place.
I agree. You are essentially saying that as long as you remember that green and blue are not simply syntactic binary predicates from first-order logic, but semantic concepts, it is clear that the grue problem is not at all a threat to science. But this is no trivial result: it means that there is a part of the application of Bayes, i.e., induction, which requires the acquisition of semantic concepts. If you fed evidence statements into a Bayesian program, it would have to have an understanding of the semantic application of terms like green and grue. So you are right: reducing “green” and “grue” to their semantic/physical tests is the key in my proposed solution. Bayes can’t be enough, obviously, since Bayes is a syntactical and axiomatic system.
I guess what seemed Bayesian to me about the whole thing was the analogy to Bayes’s table problem, which was the main intuition pump I used to solve the problem. I’ll edit the article to reflect this. Thanks.
I think this is incorrect. The actual application of Bayes’ theorem works the same way for each of your theories. What differs is your priors, and that difference sticks around until you have some evidence that’s more likely for one theory than another. If your priors are screwy, then yes, you’ll hold wrong beliefs until you’re given evidence that lets you distinguish between the correct and incorrect beliefs.
Ahh, that makes sense!
The first thing that struck me was the inherently self-contradictory nature of the grue definition. For physical properties to be retroactively alterable seems to contradict fundamental principles of causality and matter.
Am I simply not understanding the topic? Or is my intuitive-conceptualization too influenced by “timeless physics”? (The notion that all moments in time can be stated to ‘exist’).
The most you get with statements about “grue-ness” is that some objects which we observed to be “green” were in fact green but after a specific time (T) all changed to another color. This does not change the fact that they were green in the past.
Science seems perfectly-well suited to handling things that change from one state to another. Radioactive decay, for example. If this is some extra-material transition that occurs… well, I just don’t see how that’s an actually available physical phenomenon. If you change the definition of the term, you are now discussing a new thing.
You are missing the point. Nothing changes color, and no definitions are changed, only meanings.
… but meanings are definitions. You can’t change one without changing the other. The terms are synonyms.
Time-based definitions just mean you use one definition before time T and another definition after time T. I am lost as to what the paradox here is.
Definitions point to meanings, but the meaning of a term can only be found by looking at the cognitive machines that use the term, and in that specific context as well.
…
definition:
A statement of the exact meaning of a word, esp. in a dictionary.
An exact statement or description of the nature, scope, or meaning of something.
meaning:
What is meant by a word, text, concept, or action
Definitions are meanings. And meanings are definitions.
A ⊃ B & B ⊃ A ⊨ A = B
I remain lost as to where the paradox is supposed to be.
The quotes you give suggest definitions are statements of meaning, not meanings.
… I am not especially aware of there being a functional difference between a “statement of meaning” and the meaning itself when we’re discussing what terms mean.
Anything that is applicable to a definition is applicable to the meaning itself. Any adjustment of the meaning adjusts the definition. Any adjustment of the definition adjusts the meaning. When you have a direct correlation with bi-directional causality, that is mutual identity.
Have you read the cluster structure of thing space? Or the exponential concept space article? I recommend them.
Yes I have read them, and they are not relevant to the topic of mutual identity between ‘definition’ and ‘meaning*’.
*: s/magic/meaning/. Thanks, Swype!
Would you similarly say that “mortal” is a term with a self-contradictory definition?
“mortal” implies that a thing is capable of transitioning from one state to another. This would be more like trying to create a term, “aled”, such that those things that have the property of being “aled” where time=-T are alive but where time=+T are dead...
And yet no events are to have occurred at time=T. This is a material contradiction. Even worse; another possible meaning would be that things which are “aled” where time=-T are alive where time=n; and, simultaneously, things which are “aled” where time=+T are dead where time=n.
And yet the same group of objects are to have the characteristic of being “aled” -- yet, again, no events are to have occurred.
This is paradoxical, certainly, but only because the definition contradicts itself. “grue” and “bleen” require that A=¬A; a false assertion. (EDIT: Apparently I need to explain that the paradox I mention here IS NOT the Goodman’s Grue Paradox.)
Contrastingly, those things which are “mortal” are defined as having a unique time=d, where time=-d they are alive; where time=+d they are dead; and where time=d they transition states.
Can you clarify what prevents a grue/bleen categorizer from saying that when a grue object changes the frequency of light that it reflects (which is something every grue object does at one time or another), that’s an event?
That’s just it—to be “grue” its color must never change from the color “grue”. “Grue” is a color. Not two colors—just one. If this confuses you—it should. It’s a self-contradicting definition. “Grue” is a violation of the Law of Identity.
I agree that its color never changes from the color grue, nor did I suggest that it did. I’m still curious about the answer to my question (which was about light frequencies, not colors), though.
Please pick one: are we discussing “grue/bleen categorizers” or are we discussing “light frequencies”? Because the topics are mutually exclusive.
I am talking about an observer who experiences certain colors in response to certain patterns of light frequencies over time. So, both.
If that’s a contradiction in terms, then I’m likely too confused to contribute usefully to further discussion.
Here’s the thing: “grue” and “bleen” are each only one color. For this to be explicable to your understanding, “grue” would have to have one and only one light frequency for time=n.
The problem of course is that this light frequency is apparently the same as “green” at time=-T, but the same as “blue” at time=+T.
The very definition of “grue” is such that A=¬A.
Should we introduce to the “grue-ites” the notion that objects can change color, then they would be incapable of maintaining their belief in “grue”. Essentially, this entire conversation is predicated upon “grue-ites” existing in a universe without light-frequencies but still possessing color. This is, of course, explicitly contradictory: colors are frequencies of light (as experienced by observers).
Three things.
1) If the definition of “grue” is such that the light reflected by a “grue” object is the same frequency at all times, and further that the observer’s eyes don’t change and more generally that nothing in the world changes, then I agree with you that “grue” as defined is a contradictory idea, for essentially the reasons you cite.
2) In the real world, “green” is not associated with one and only one light frequency. There are lots of light frequencies that would cause me to experience a sensation I’d label “green”. Indeed, I am seeing several dozen of those frequencies as I write this.
3) In the real world, there is no light frequency associated with “green” and only “green”. There are lots of situations that will cause me to experience an object reflecting a single light frequency as different colors.
For example: change the background color or design an elaborate pattern in a picture. There are some freaky things our brains can do (and can be tricked into doing).
(nods) Or just shine a red light in my eyes for a while, then turn it off.
It’s not even particularly freaky, it’s just that we’re accustomed to treating certain aspects of our perceived environment as primitive atoms of perception when in fact they are the outputs of complicated heuristics that aren’t perfectly calibrated for consistency.
Freaky as in an incredible feat on the behalf of our brain to be able to reconstruct 3d images from subtle clues like relative shading. Giving deceptive input to make it perceive color incorrectly is just a harmless side effect.
Ah. Yes.
As optimizing systems go, we leave a lot to be desired, but for self-organizing soup we’re pretty impressive.
Under naively realistic conditions this is a non-issue. If one takes a statistically relevant sampling size of humans at random, and asks them their assessment of the color, they will agree based on its light frequency as the sole understandably relevant qualifier.
Frequencies are identified as a spectrum, not as a single point. Furthermore, you can differentiate one variety from another. But an object which is one variety of green (that occupies one specific point) does not become another variety of green without undergoing a transition event.
There’s a reason why the paradox’s definition completely fails to delve into the physical technicalities of the claim.
This is why I was able to rephrase it with the “aled” concept.
Err… no? Sometimes they are identified as a spectrum and sometimes as a single point.
wedrifid is pointing to the destination of the road that I’m going down in the comment branch over here, I think.
The Wikipedia page for this problem is here: http://en.wikipedia.org/wiki/Grue_and_bleen
Nitpick: Emeralds are a bad example. An “emerald” is just green beryl—a blue instance of the same mineral is just a blue piece of beryl. They exist, but they aren’t emeralds.
Philosophy of Science textbooks mention that fact. Goodman chose a bad example and now we must all pay the price.
The original problem, as stated, is “valid”: a mind with a “grue”-like prior would make the grue prediction, while normal human minds (with a “green”-like prior, mostly as a result of our evolution around colors) would make the “green” prediction. If we want a more neutral prior, we go with “minimum message length”, and “what are colors”. Grue and green are words in a dictionary, so they do not count for math; only Turing machines do. It’s simpler to write a Turing machine which puts out “light at XXXhz, light at XXXhz” than one that takes time T into account. Therefore, the green prior is more in line with an MML-prior mind. We take MML priors as most compatible with human-like reasoning.
This seems problematic because it implies that humans would be perfectly fine with accepting grue over blue if they didn’t know about the nature of light.
Fortunately, the reason this helps is deeper than counting the number of hertz. When you want to determine the complexity of a term, you have to specify what language to use to write the term. The reason grue seems complicated to us evolved animals is because it has higher complexity in the language of our observations—the language of what neurons we feel light up when we look at the rock.
So does that mean that if an entity had a neuronal structure that intuited grue and bleen it would be justified in treating the hypothesis that way? I’d be willing to bite that bullet I think.
It means that that entity’s evolved instincts would be out-of-whack with the MML, so if that entity also got to the point where it invented Turing machines, it would see the flaw in its reasoning. This is no different than realizing that Maxwell’s equations, though they look more complicated than “anger” to a human, are actually simpler. Sometimes, the intuition is wrong. In the blue/grue case, human intuition happens to not be wrong, but a hypothetical entity is—and both humans and the entity, after understanding math and computer science, would agree that humans are wrong about anger, and hypothetical entities are wrong about grue. Why is that a problem?
Right, they would, if for weird historical reasons they also thought of “grue” and “bleen” as reasonable linguistic primitives. So the human scientists would be surprised when the next emerald turned out to be bleen rather than grue, and they’d be able to observe that the shift happened at time T, and thus observe that green is a natural property. So this isn’t really much of a problem.
That’s not completely satisfying in that one wants an induction scheme that assigns priors independent of linguistic accident. If one tries to make a hypothesis’s simplicity depend on language, then one quickly gets very complicated hypotheses being labeled as simple (e.g. “God”).
Well, it is if you use hz. However, I prefer hz’. hz’ are just like hz until time T, but then different in the appropriate way after time T.
If grue-people expect the green emeralds to spontaneously change into blue emeralds, why shouldn’t they also expect a simple green-detecting Turing machine to spontaneously change into a blue-detecting Turing machine and vice versa? Yes, a Turing machine is a mathematical construction; it does not spontaneously change. But they, using “grue” as a basic concept, would expect everything that even remotely depends on colours to change at a certain time, including physical approximations to Turing machines.
“To this I know of no reply which the grue skeptic can make. If he/she says the paragraph back to me with the proper words swapped, it is not true, because in the hypothetical where we have a table, a line, and we are calling one side right and another side left, the only way for Refts:Lefts to behave as expected as more trials are added is to move the line (if even that); otherwise the ratio of Refts to Lights will approach the reciprocal of Rights to Lefts.”
He can simply define the term “line” to imply that it flips directions at time t.
This paradox seems to be equivalent to talking about the programming language that the K-complexity of something uses. For example, in any realistic programming language, it would be easier to define MWI than the Copenhagen interpretation of quantum mechanics, since the latter involves all the laws of the former and then some, but what if you use a language that, once MWI is defined, assumes wavefunction collapse and such unless told otherwise? You can construct a language to match any given prior, and while any two such languages and priors will converge in the limit, you can’t say which is right for a finite case.
“Oh yeah? Well I’m going to go hang out in the dark while doomed. You’ll see!”
I recommend editing this post to have shorter paragraphs.
This seems hard to me, but I agree. Do you have any pointers? (edit): I tried.
There’s your error! You think that the line is in the middle of the table through the entire experiment, but actually it’s in the riddle of the table, where “riddle” means “in the middle of the table before time T and on the right side of the table afterward.” All of our experience before time T has confirmed this.
He never said where it was, the problem was to find where the line was on the table.
A better objection would be to ask whether the line chooved, where ‘chooved’ means ‘stayed in the same place before time T and moved to the opposite location afterward’.
To be honest, I don’t have a clear sense of what he’s saying. However, from a snippet like this:
it sounds like he’s trying to draw some conclusion from an assumption (“the line doesn’t/won’t move”) that ultimately rests on inductive support. Is that not the case? If so, how does that supposed support not fall victim to the new problem of induction?
It doesn’t seem to me like the problem is to justify induction, but to justify the induction on green over grue after time T.
The problem is to justify any inductively-obtained statement vulnerable to a grue-like variant. “X will remain in the same place” is one such statement. (Namely: Any evidence that X will remain in a given place is prima facie evidence that it will stay in the same place’, where place’ refers to its current location before T and some other location afterward.) Grue is just an example.
That the line will stay in the same place is not something I induce; it is a premise in the hypothetical. The line, or really the area to the right of the line on the table, represents the actual frequency with which an emerald turns out green, out of all the cases where an emerald is observed. This is certainly a non-moving line, since there is one and only one answer to that question.
But that’s question-begging. Let me put this another way. Define the function reft-distance(x) = x’s distance to the rightmost edge of the table before time T, or the distance to the leftmost edge of the table after time T. (Then “x is reft of y” is definable as reft-distance(x) < reft-distance(y). Similarly for the function light-distance(x).) Assuming the line doesn’t move is equivalent to assuming that the line’s right-distance remains constant, but that its reft-distance changes after T. But that’s not a fair assumption, the skeptic will insist: he prefers to assume the line doesn’t “anti-move,” which means its reft-distance remains constant but its right-distance changes.
If we’re simply stipulating that your assumption (that the line doesn’t move) is correct and the skeptic’s assumption (that the line doesn’t anti-move) is incorrect, that’s not very useful. We might as well just stipulate that emeralds remain green for all time or whatever.
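The symmetry in this objection can be sketched in a few lines. The coordinate convention (positions measured from the left edge of a table of width 1.0), the switch time T = 100.0, and the choice to treat "before T" as t < T are all assumptions made for illustration.

```python
TABLE_WIDTH = 1.0  # hypothetical table width
T = 100.0          # hypothetical switch time


def right_distance(x):
    """Distance from position x to the rightmost edge of the table."""
    return TABLE_WIDTH - x


def left_distance(x):
    """Distance from position x to the leftmost edge of the table."""
    return x


def reft_distance(x, t):
    """Distance to the right edge before T, to the left edge from T on."""
    return right_distance(x) if t < T else left_distance(x)


def light_distance(x, t):
    """Mirror image: distance to the left edge before T, right edge from T on."""
    return left_distance(x) if t < T else right_distance(x)


def is_reft_of(x, y, t):
    """x is reft of y when its reft-distance is smaller."""
    return reft_distance(x, t) < reft_distance(y, t)


# A line with a fixed coordinate has a constant right-distance
# but a reft-distance that jumps at T.
line = 0.25
print(right_distance(line), reft_distance(line, 50.0), reft_distance(line, 150.0))
```

A line that kept its reft-distance constant would instead have to change its right-distance at T, which is the "anti-move" described above. Nothing in the code itself marks one pair of functions as more basic than the other.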
You forgot to address this part:
The line is constant because the area to its right represents the frequency with which a certain result is observed out of the number of trials. What the skeptic would have to be assuming is that the first 98 balls just happened to fall on the first 100th of the table by chance.
Assuming that the line is constant is analogous to assuming that emeralds’ color won’t change after T, correct? The skeptic will refuse to do either of these, preferring instead to assume that the line is anti-constant and that emeralds’ anti-color won’t change after T.
No, that’s a common misunderstanding. No emerald ever has to change color for the grue hypothesis to be true.
It is analogous to assuming that there is a definite frequency of green emeralds out of emeralds ever made.
Well, O.K. “The next observed emerald is green if before T and blue otherwise” doesn’t entail any change of color. I suppose I should have said, “Analogous to assuming that the emeralds’ color (as opposed to anti-color) distribution doesn’t vary before and after T.”
I’m really not seeing that analogy. It seems more analogous to assuming there’s a single, time-independent probability of observing a green emerald. (Holding the line fixed means there’s a single, time-independent probability of landing right of the line.) Which is again an assumption the skeptic would deny, preferring instead the existence of a single, time-invariant probability of observing a grue emerald.
Correct, but my solution rests on there being a semantic method for testing greenness. This is what breaks the symmetry which the skeptic was abusing. Because the test stays the same, the meaning of green stays the same.
I don’t think I really understand what this means. Could you give more detail?
Read my conclusion over; I made some edits. If you still don’t understand, comment and I’ll explain.
I’m not sure I’ve understood that very well, either. From what I can gather, it seems like you’re arguing that 1. the meaning and physical tests for grue change over time, and consequently 2. grue is a more complicated property than green is, so we’re justified in privileging the green hypothesis. If that’s so, then I no longer see what role the reft/light example plays in your argument. You could’ve just started and finished with that.
Yeah, the reft/light argument is just what made it obvious to me; I thought it might help my readers too.
All right. Regarding the idea that the meaning of “grue” changes over time—how do you take this to be the case? What do you mean by “meaning” here? Intension, extension or what?
The common physical test of using your eyes. The results from your eyes, and from instruments which pick up the same sort of optical information as your eyes, are the test for the application of green. This is how we learn green. This definition of green is semantic. These instruments’ results are the primary meaning of green: how your brain decides whether to use the term. They are semantic because their usage must refer to the outside world.
It seems that the assumption in your hypothetical is of an unchanging process producing the random variable, about which we have partial knowledge. In the case of the ball, we know of the unmoving invisible line, the throws uniformly distributed over the table, and whatever mechanism it is that lets us know whether the ball has fallen to the left or the right of the line. However, we don’t know enough to know exactly where the balls will land. In the case of the emeralds, we know enough about the emerald construction sites to know that they are grue-blind, and that they will stay grue-blind no matter how many emeralds they produce. In both cases, we know something of the mechanism behind the random variable, and that it will not change. Is that correct?
You talk of a threat to the whole of science. How does your answer to this hypothetical respond to that threat? Do scientists ever have the knowledge assumed in your hypotheticals? How can scientists gain that knowledge in the first place without getting grued up, if they need that knowledge to stay grue-free? It reminds me of Bugs Bunny pulling himself out of a magician’s hat by holding his ears.
It is not; my assumption is of a definite frequency with which some result comes up, out of all trials.
When you realize that the reason you don’t determine the meaning of green using grue and bleen is that there is a physical test which has higher authority in defining greenhood, the threat dissolves.
By “frequency” I suppose you mean the fraction of balls dropped on the right out of all ball drops, past and future? And with emeralds… I guess you mean the fraction of green emeralds out of all emeralds that have been or will be observed?
I suppose the physical test in the ball problem is the ball landing on one side or the other of the line. In the emerald problem, what is the physical test?
Solomonoff Induction is a formalized answer to problems of inference which also applies to the grue problem. It basically just says to weigh all possible explanations that fit your data by their complexity, but it is specified mathematically. Since grue is more complex than green, it weighs green much higher until reason to believe in grue shows up.
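As a toy illustration of weighting explanations by complexity, the sketch below assigns each hypothesis a prior proportional to 2 to the power of minus its description length. The lengths of 10 and 20 bits are made-up numbers chosen for the example, and real Solomonoff induction sums over all programs and is not computable, so this shows only the weighting rule.

```python
def complexity_weights(lengths_bits):
    """Normalised prior weights proportional to 2 ** -length for each hypothesis."""
    raw = {name: 2.0 ** -bits for name, bits in lengths_bits.items()}
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}


# Hypothetical description lengths in bits; not derived from any real encoding.
weights = complexity_weights({"green": 10, "grue": 20})

# Every emerald seen before T fits both hypotheses equally well, so the
# likelihoods cancel and the posterior odds equal the prior odds.
odds_green_over_grue = weights["green"] / weights["grue"]
print(weights, odds_green_over_grue)
```

The whole result rests on the assumed lengths, which in turn depend on the encoding used to measure them.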
This is slightly off topic though, because the key is reducing the items you’re talking about to what they are made up of so that you can properly encode them in order to compare the complexity. As said here, it just takes reductionism.
Solomonoff induction involves defining complexity. Green and blue aren’t the most basic possible things, so you can’t straight up stick in grue and bleen, but you still can come up with some language where grue and bleen are just as easy to define as blue and green are in whatever we’d be likely to use. All Solomonoff induction can really do is specify that the probabilities must add up to 100%.
Let’s do actual Bayesian math on this problem. Let Green_n be “the green premises 1 through n”, and so on.
Pr( An emerald is grue | Emerald is green, it is before time T ) = ~1.
Pr( An emerald is grue | Emerald is green, it is after time T ) = ~0.
Pr( An emerald is grue | Emerald is blue, it is before time T ) = ~0.
Pr( An emerald is grue | Emerald is blue, it is after time T ) = ~1.
These are our grue axioms—the probabilistic representation of “grue iff green before time T or blue after time T”.
Pr( New Emerald is green | Green_n ) = 0.99 (this is our first sentence axiom)
Pr( New Emerald is grue | Emerald is green ) = undefined. We need to know if we are pre-T or post-T. Without the prior probability for being pre-T (from which we can derive its complement, post-T, or vice versa), this value cannot be computed.
But that is wussing out; Bayesian agents should always be able to assign some level of credence. Assume maximum ignorance about T: it is equally likely to be pre-T or post-T.
We can find Pr( New Emerald is grue | Emerald is green ) by finding Pr( New Emerald is grue | Emerald is green, pre-T ) and adding it to Pr( New Emerald is grue | Emerald is green, post-T ) then normalising.
The pre-T case: Pr( New Emerald is grue | Green_n, pre-T ) = Pr( emerald is grue | emerald is green, pre-T ) × Pr( emerald is green ). We know that Pr( emerald is green ) is 0.99. We have Pr( emerald is grue | emerald is green, pre-T ) as an axiom of ~1 above. So 0.99 × ~1 = ~0.99.
The post-T case: Pr( New Emerald is grue | Green_n, post-T ) = Pr( emerald is grue | emerald is green, post-T ) × Pr( emerald is green ). We know that Pr( emerald is green ) is 0.99. We have Pr( emerald is grue | emerald is green, post-T ) as an axiom of ~0 above. So 0.99 × ~0 = ~0.
Normalising gives us: Pr( New Emerald is grue | Emerald is green ) = ~.495.
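The two cases and their equal-weight combination can be checked with a few lines of Python. This is only a sketch of the arithmetic in this comment: it treats ~1 and ~0 as exactly 1 and 0, takes 0.5 as the maximum-ignorance prior for being pre-T, and, like the calculation above, counts only green emeralds. All names are illustrative.

```python
# Grue axioms from above, with ~1 and ~0 taken as exactly 1 and 0 (assumption).
P_GRUE_GIVEN_GREEN = {"pre-T": 1.0, "post-T": 0.0}
P_GREEN = 0.99  # Pr( New Emerald is green | Green_n )
P_REGIME = {"pre-T": 0.5, "post-T": 0.5}  # maximum ignorance about T

# Per-regime terms: Pr( grue | green, regime ) * Pr( green )
per_regime = {r: P_GRUE_GIVEN_GREEN[r] * P_GREEN for r in P_REGIME}

# Weight each regime by its prior probability and add the results.
combined = sum(P_REGIME[r] * per_regime[r] for r in P_REGIME)
print(per_regime, combined)
```

Once the regime is fixed, the mixture is unnecessary and the matching per-regime value applies directly.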
This is for the case where we don’t know if pre-T or post-T. When you say
we can ask a better question than “Pr( New Emerald is grue | Emerald is green ) ?”. We can ask Pr( An emerald is grue | Emerald is green, it is after time T ). This is an axiom from before! We now know it’s ~0, which resolves the problem: the probabilities sum to 0.99 + epsilon, which is below 1, which conserves the Kolmogorov axioms.
Whence the problem? The error creeps in when you use the pre-T case to get one .99, and then you use the complement of the post-T case to get another .99, and then add them together. If you specify pre-T or post-T, but then swap T in calculating some of the posterior probabilities, of course you can violate probability theory! You’re already violating it by varying the state of T within the calculation!
If I am not mistaken, this is my independent formulation of the formal Bayesian resolution of the Goodman Grue paradox.
Note that this question was first put forward in 1955, so that it was a purely hypothetical question until 1 January 2000, when sapphires were discovered to be grue. (Before and after images of the same gem.)
The case makes an interesting parallel to the term “black swan”, another famous philosophical thought experiment that received unexpected data.
What changed? Are you telling a joke and those are pictures of different gems? Or is one in a different kind of light, or at a different angle? I don’t get it.
Edit: Or are you talking about what Alicorn was talking about farther down?
I’m telling a joke.
One would suspect that the emerald-producing locations in our universe do not behave quite as cleanly and mathematically as you describe them. Instead, fuzziness and messiness creep in. Maybe such sites degrade over time, causing the emeralds to be slightly bluer. Maybe not.
Broad principles like “green earlier implies green now” are approximations that allow us to simplify the complexity of actual, extremely difficult Bayesian inference.
So… your Bayesian answer to the grue problem is to become a frequentist? You’re doing it wrong.
As has been pointed out to you, “grue” is a description of a perfectly consistent prior on observations. The reason that “green” is preferable is its simplicity (in terms of basic predictions of physical events) and specificity (i.e. if T is unspecified, then the “green” hypothesis makes more specific predictions than “grue”, while if it is specified, then the complexity of the number T comes into play).
I still think that these “devastating” problems have been solved in the first chapters of Jaynes’ book.
Because the first green emeralds are no evidence that the next will be green.
Let’s translate the problem differently: I write a program that shows colored dots on the screen. The first n dots are green. What is the probability that the next dot will be green? If that is your only information, you simply cannot tell. By the principle of indifference, the probability is 1/(the number of possible colors represented on the screen). I could have coded the program with a counter so that it shows only pink dots after the billionth green dot, and you would never know.
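A minimal sketch of such a program follows. The function names and the switch_after parameter are hypothetical; the default of one billion comes from the example above.

```python
def dot_colour(index, switch_after=1_000_000_000):
    """Colour of the dot with the given 0-based index.

    The first `switch_after` dots are green; every later dot is pink.
    """
    return "green" if index < switch_after else "pink"


def always_green(index):
    """A rival program with no counter at all."""
    return "green"


# With a small threshold the switch is visible...
print([dot_colour(i, switch_after=3) for i in range(5)])

# ...but with the default threshold the first n dots of the two programs
# are identical for any n an observer is likely to watch.
n = 10_000
print(all(dot_colour(i) == always_green(i) for i in range(n)))
```

The observed dots alone cannot separate the two programs; only an assumption about how the program was written can.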
If seeing n green emeralds makes you raise the probability that the next emerald is green, you are making very specific and yet hidden assumptions.
Let’s add, for example, a different piece of information: you are extracting emeralds from an urn that contains 99 green emeralds and only one blue emerald. Then the probability that the next emerald is green, after having seen n<99 emeralds, is lower, not higher.
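The urn case can be worked out exactly. The sketch below assumes draws without replacement and that all n emeralds seen so far were green; the function name is hypothetical.

```python
from fractions import Fraction


def p_next_green(n_green_seen, green=99, blue=1):
    """Probability that the next draw is green, given that n green emeralds
    have already been drawn without replacement and the blue one has not."""
    remaining_green = green - n_green_seen
    remaining_total = green + blue - n_green_seen
    return Fraction(remaining_green, remaining_total)


for n in (0, 1, 50, 98):
    print(n, p_next_green(n))
```

Each green draw removes a green emerald while the single blue one stays in the urn, so the probability falls from 99/100 at the start to 1/2 after 98 green draws.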
That said, let’s talk about real emeralds. In this case, we know that we are extracting emeralds from an urn (the Earth) that, by the time of the execution of the experiment, contains only N emeralds. We know something about the process of production of the emeralds, and we also know that the process that produces blue emeralds is extremely unlikely.
What does the green hypothesis say? It assumes that the blue emerald process didn’t happen. So it is correct to say that the next emerald will be green (with probability one, however; it doesn’t increase).
What does the grue hypothesis say? That the blue process did in fact happen, so whenever we observe a green emerald the probability of the next being blue (and so the probability of it being grue) increases (and similarly, the probability of it being green decreases accordingly).
The paradox then doesn’t happen because green and grue have very distinct implicit assumptions.
This is seriously wrong.
‘I haven’t seen a post on LW about the grue paradox, and this surprised me since I had figured that if any arguments would be raised against Bayesian LW doctrine, it would be the grue problem.’:
If of relevance, note http://lesswrong.com/lw/q8/many_worlds_one_best_guess/ .
I think I came up with a solution:
To date, the vast majority of grue-like hypotheses (hypotheses that suggest new items that have always been grue before time t will continue to be found grue after time t) have failed. Inductive logic, then, doesn’t suggest that because emeralds have been grue to date, they will continue to be grue after time t. So far, after every time t, that’s not been the case.
If it’s unclear what I mean when I say grue-like hypotheses have failed, let me word it better: if time t was 1975, then the hypothesis that emeralds found after time t will be grue was incorrect. Same for 1976. Same for 1977. Etc. An infinite, or at least incredibly large, number of grue-like hypotheses, then, have failed, so inductive logic doesn’t tell us to predict that emeralds found after time t will be grue. Inductive logic, to the contrary, tells us that once time t comes about, new emeralds found will be bleen.
Sorry for the sloppy wording, I hope you brilliant fellows will read my post for the idea within, and not for some sloppy wording to nitpick. Some of you guys are very good at that.
Let’s forget, for a moment, that the position of the invisible line reflects the long-run frequency of “right” and “left” results. (you say that it reflects the proportion of green emeralds among existing emeralds, and results of “right” are analogous to results of “green”, so.)
In the ball problem, there is an invisible line on a table. More balls falling on the right implies that the area on the right side of the line is larger, and thus that future ball drops are more likely to fall on the right side.
Or maybe it’s evidence that they’ll fall on the reft side. “The reft side of the table” refers to the right side, before time T, and refers to the left side, after time T. Slightly different from the response of the grue skeptic in your article.
It’s not exactly analogous to grue, since at time T, either the areas to the right and left of the line change, or the areas to the reft and light of the line change. Whereas, if all emeralds are grue, that doesn’t require any emeralds to change color.
Do areas to the right and left of lines stay constant, or do areas to the reft and light of lines stay constant? My own experience with lines on tables is ambiguous in this regard, as it is not yet time T.
Notice I have not mentioned movement. You’ve been describing “the area to the right and left of the line changing” as “movement”; it’s a term that treats right/left and reft/light asymmetrically and thus a potential source of confusion.
Now back to your hypothetical.
You have assumed in your hypothetical that the position of the line reflects the frequency of right and left results. That implies that if you knew the position of the line, your probability for the next result being “right” would be equal to the frequency of right results among ball drops. It also implies that the area to the right of the line stays constant. It also implies that the frequency of “right” results before time T is likely to be reflective of the frequency of “right” results after time T.
It seems to me that you have assumed that which is to be proven.
When we evaluate a term’s complexity, we must use some language to evaluate it in. If we use standard English, green is simpler, while if we use grue English, grue is simpler. But is there a unique choice of language?
Well, if we want to describe the world, the symmetry is broken by the fact that we can observe the world—our unique “language” is our observations of the natural world—which color is simpler when describing the neurons in our visual cortex, if you will.
When we describe reality in terms of the language of our
That is the fundamental point, I agree. The overall description of our world includes our description of ourselves. It is here that the simplicity advantage of blue / green resides.
What does ‘first observed’ mean? It seems like the sort of thing that someone with a passing knowledge of quantum mechanics would make up, giving a privileged status to conscious observers.
Apart from this objection, I see both in the post and in some of the comments a confusion about the meaning of ‘grue’. Take again the definition:
Notice that no object ever changes colour. A green object, first observed before time T, is still grue provided it remains green until the end of time; the definition only refers to the colour at the time of first observation, not at the current moment. To say that an object is grue at time T-1 is not to predict that it will turn blue at time T; it is a prediction that it will remain green for all time.
Edit: Never mind the rest of the comment, I made a silly mistake.
I didn’t mean to sound harsh at all btw; I wouldn’t want to discourage anyone from making mistakes publicly on LW, which is a great place to have mistakes corrected.
A successful prediction does not weaken a hypothesis.
Also, your argument works just as well for G as for g; therefore, a green emerald is evidence against emeralds being green and against emeralds being grue.
You made an arithmetic mistake. I figured you might want to try and find it yourself, and reasoned: if you do want to be told, you can just ask, but if I had assumed you wanted to be told and was wrong, I couldn’t untell you.
The assumption that P(O) is P(G) + P(g) is also incorrect; there is also the hypothesis that half the emeralds are green, for example. But either way you shouldn’t end up with P(g|O) < P(g).
It can weaken it relative to a competing hypothesis.
Actually it is unsolvable in a Bayesian framework, and the only honest answer would be to admit it.
Bayesianism gives you consistency, but it doesn’t anchor you to reality in any way. Assignment of probabilities that prefers green, and assignment of probabilities that prefers grue are both equally consistent.
Many people on lesswrong have been trying to handwave the problem away with Kolmogorov Complexity, but if you check real math, then you’ll see that for any finite amount of data it solves exactly nothing—two different computational models have finite difference in probability assignment, but this finite difference is unbounded, and for any computational model you can find another that’s arbitrarily far away from it.
No finite amount of data will cause non-negligible amount of convergence between models, since their differences are unbounded times greater than informational contents of that information.
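For reference, the standard result behind this objection is the invariance theorem for Kolmogorov complexity, sketched here in LaTeX. The symbols U and V (two universal machines), K_U and K_V (the complexities they define) and c_{UV} are notation introduced for this illustration and do not appear in the comment.

```latex
\[
  \lvert K_U(x) - K_V(x) \rvert \;\le\; c_{UV}
  \qquad \text{for all strings } x,
\]
% c_{UV} depends only on the pair (U, V), not on x.
```

The constant is finite for any fixed pair of machines, but it has no upper bound across all choices of V, which appears to be the sense of “finite but unbounded” above.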
At some point you’ll have to admit that green and grue versions are equally consistent with data and all logical a priori reasons, and it’s just your personal (or societal or whatever) preference to accept green over grue.
PS. This is all completely unrelated to the second big issue with Bayesianism that you only get consistency over infinite models by breaking Gödel’s incompleteness theorems, and every theory where you’re not allowed to say “I don’t know” without assigning a specific probability number to it shares this problem. Between these two problems I see Bayesianism as a useful tool, not as any deeper theory of reality.
“If you’re insane enough, and have unreasonable enough priors, even Bayesianism won’t save you,” is an argument against insanity and unreasonableness, not against Bayesianism.
Bayesianism only attempts to give you consistency, different grue-Bayesians would see green-Bayesian as “insane and unreasonable”, just as green-Bayesians would see grue-Bayesians.
They’re both just as consistent, and nothing about their systems of beliefs is internally different.
If you want to solve green/grue problem, Bayesianism won’t hurt your attempts but neither will it help you in any way.
Yeah, what really helped about the Bayesian analogy to the table, line, ball thing was remembering that there was a physical basis for right, but that reft did not have a physical basis in the same way. The same goes for grue.
I completely agree that if you want to understand the reason for the use of grue over the use of green in the conclusion, you need to use more than the syntactical definitions of the terms. Bayes is of course syntactical. You have to look at the semantic meanings of the terms, their test for applicability.
How does the property “it is grue until some point, then it becomes bleen” have more physical basis than the property “it’s grue all along”? What you’re saying makes no sense (...to a grue-ist).
If I wrote a program to find things that were green before time t, and things that were blue after time t, I would not save any time on the programming by making it just look for grue. Grue could not be coherently defined without committing to observers, but green could be defined (even if very complicatedly) without reference to observers, and thus we can be realists about it. I am a realist about green, and not about grue. This makes sense since grue requires observers in its definition.
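A rough sketch of that program is below. It assumes the usual Goodman-style reading of grue (green if first observed before t, blue otherwise) and a language whose primitives are green and blue; the Emerald type and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Emerald:
    colour: str            # what the physical colour test reports
    first_observed: float  # time of first observation


def is_green(e):
    """Primitive colour test; needs no reference to observation time."""
    return e.colour == "green"


def is_blue(e):
    return e.colour == "blue"


def is_grue(e, t):
    """Grue is built from the colour tests plus the observation time."""
    if e.first_observed < t:
        return is_green(e)
    return is_blue(e)


t = 2000.0
print(is_grue(Emerald("green", 1990.0), t))  # green, observed before t
print(is_grue(Emerald("green", 2010.0), t))  # green, observed after t
print(is_grue(Emerald("blue", 2010.0), t))   # blue, observed after t
```

In this language is_grue has to call the colour tests and read first_observed, so searching for grue saves no work over searching for green and blue separately.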
Meanwhile, in a parallel universe, grue-potato wrote this, and grue-taw is trying to make him see that green is just as consistent.