This is a first stab at solving Goodman’s famous grue problem. I haven’t seen a post on LW about the grue paradox, which surprised me, since I had figured that if any argument would be raised against Bayesian LW doctrine, it would be the grue problem. I haven’t looked at many proposed solutions to this paradox, beyond some of the basic ones in “The New Problem of Induction”, so I apologize in advance if my solution is wildly unoriginal. I am willing to put you through this, dear reader, because:
I wanted to see how I would fare against this still largely open, devastating, and classic problem, using only the arsenal provided by my minimal Bayesian training and my regular LW reading.
I wanted the first LW article about the grue problem to attack it with a distinctly Lesswrongian approach, without the benefit of hindsight from the solutions offered by non-LW philosophy.
And lastly, because even if this solution has been found before, if it is the right one, it is to LW’s credit that its students can solve the grue problem using only LW skills and cognitive tools.
I would also like to warn the savvy subjective Bayesian that just because I think probabilities model frequencies, and that I require frequencies out there in the world, does not mean that I am a frequentist or a realist about probability. I am a formalist with a grain of salt. In my view there are no probabilities anywhere, not even in minds; but the theorems of probability theory, when interpreted, share a fundamental contour with many important tools of the inquiring mind, including both the nature of frequency and the set of rational subjective belief systems. There is nothing more to probability than the system that produces its theorems.
Lastly, I would like to say that even if I have not succeeded here (which I think I have), there is likely something valuable to be made from the leftovers of my solution after the onslaught of penetrating critiques I expect from this community. Solving this problem is essential to LW’s methods, and our arsenal is fit to handle it. If we are going to be taken seriously in the philosophical community as a new movement, we must solve serious problems from academic philosophy, and we must do it in distinctly Lesswrongian ways.
“The first emerald ever observed was green. The second emerald ever observed was green. The third emerald ever observed was green. … etc. The nth emerald ever observed was green. (conclusion): There is a very high probability that a never before observed emerald will be green.”
That is the inference that the grue problem threatens, courtesy of Nelson Goodman. The grue problem starts by defining “grue”:
“An object is grue iff it is first observed before time T, and it is green, or it is first observed after time T, and it is blue.”
So you see that before time T, from the list of premises:
“The first emerald ever observed was green. The second emerald ever observed was green. The third emerald ever observed was green. … etc. The nth emerald ever observed was green.” (we will call these the green premises)
it follows that:
“The first emerald ever observed was grue. The second emerald ever observed was grue. The third emerald ever observed was grue. … etc. The nth emerald ever observed was grue.” (we will call these the grue premises)
The proposer of the grue problem asks at this point: “So if the green premises are evidence that the next emerald will be green, why aren’t the grue premises evidence for the next emerald being grue?” If an emerald is grue after time T, it is not green. Let’s say that the green premises bring the probability of “A new unobserved emerald is green.” to 99%. On the skeptic’s hypothesis, by symmetry, the same premises read as grue premises should also bring the probability of “A new unobserved emerald is grue.” to 99%. But after time T, this would mean that the probability of observing a green emerald is 99% and the probability of observing a non-green emerald is at least 99%. Since these two outcomes cannot happen together, the probability of their disjunction is just the sum of their individual probabilities, which is at least 198%. That contradicts the Kolmogorov axioms: no statement can have a probability greater than one.
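To make the arithmetic concrete, here is a minimal Python sketch, assuming the skeptic’s symmetric 99% figure for both predictions about an emerald first observed after time T. The function name `violates_additivity` is hypothetical, introduced only for illustration.

```python
from fractions import Fraction

def violates_additivity(probabilities):
    """Return True if probabilities assigned to mutually exclusive outcomes
    sum to more than 1, which no Kolmogorov-consistent assignment can do."""
    return sum(probabilities) > 1

# For an emerald first observed after time T, "green" and "grue" are
# mutually exclusive, so P(green or grue) = P(green) + P(grue).
p_green = Fraction(99, 100)
p_grue = Fraction(99, 100)  # the skeptic's "symmetric" projection (assumed)

total = p_green + p_grue
print(total)                                   # 99/50, i.e. 198%
print(violates_additivity([p_green, p_grue]))  # True
```

Any pair that does respect the axioms, such as 99% green and 1% grue, passes the same check.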
This threatens the whole of science, because the problem cannot be kept isolated to emeralds and color. We may think of the emeralds as trials, and green as a value of a random variable. Ultimately, every result of a scientific instrument is a random variable, with a very particular and useful distribution over its values. If we can’t justify inferring probability distributions over random variables from their previous results, we cannot justify a single bit of natural science. This, of course, says nothing about how it works in practice. We all know it works in practice. “A philosopher is someone who says, ‘I know it works in practice, I’m trying to see if it works in principle.’” (Dan Dennett)
We may look at an analogous problem. Suppose that balls are being dropped on a table, and that somewhere on the table, at a position unknown to us, an infinitely thin line is drawn perpendicular to its edge. The problem is to figure out the probability that the next ball lands right of the line, given the previous results. By symmetry, our first prediction should be a 50% chance of the ball landing right of the line. If one ball then lands right of the line, Laplace’s rule of succession tells us to assign a 2/3 chance that the next ball will also land right of it. After n trials, if every trial gives a positive result, the probability we should assign to the next trial being positive as well is $\frac{n+1}{n+2}$.
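The rule of succession fits in a few lines of Python; exact fractions keep the 1/2, 2/3, and $\frac{n+1}{n+2}$ values readable. This sketch assumes, as the rule itself does, a uniform prior over the unknown frequency (here, over where the line sits); the function name is our own.

```python
from fractions import Fraction

def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Laplace's rule of succession: with a uniform prior over the unknown
    frequency, P(next trial succeeds | s successes in n trials) = (s+1)/(n+2)."""
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return Fraction(successes + 1, trials + 2)

print(rule_of_succession(0, 0))    # 1/2, before any balls are dropped
print(rule_of_succession(1, 1))    # 2/3, after one ball lands right
print(rule_of_succession(10, 10))  # 11/12, after ten rights in a row
```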
If the line were placed so that two-thirds of the table lies to its right, and the balls land uniformly at random, we should expect the ratio of Rights to Lefts to approach 2:1. That gives a 2/3 chance of the next ball being a Right, and the fraction of Rights out of all trials approaches 2/3 ever more closely as more trials are performed.
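A quick simulation shows the frequency settling toward 2/3. It assumes that each ball lands uniformly at random across a table of width 1 and that the line sits at x = 1/3; the function name and seed are illustrative choices, not part of the argument.

```python
import random

def fraction_right(line_position: float, trials: int, seed: int = 0) -> float:
    """Drop `trials` balls uniformly on a table of width 1 and return the
    fraction that land right of the line at `line_position`."""
    rng = random.Random(seed)
    rights = sum(1 for _ in range(trials) if rng.random() > line_position)
    return rights / trials

# A line one-third of the way across leaves two-thirds of the table to its right.
for n in (10, 1_000, 100_000):
    print(n, fraction_right(1 / 3, n))
```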
Now suppose a grue skeptic approaches this situation. He might make up two terms, “reft” and “light”, defined as you would expect, but just in case:
“A ball is reft of the line iff it lands right of the line before time T, or lands left of it after time T. A ball is light of the line iff it lands left of the line before time T, or lands right of it after time T.”
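The skeptic’s definitions translate directly into predicates. In this sketch a ball’s outcome is recorded as whether it landed right of the line plus its landing time; treating a landing exactly at T as “after” is our own assumption, since the definitions leave that case open.

```python
def is_reft(lands_right: bool, landing_time: float, T: float) -> bool:
    """Reft: right of the line before time T, left of it afterwards."""
    return lands_right if landing_time < T else not lands_right

def is_light(lands_right: bool, landing_time: float, T: float) -> bool:
    """Light: left of the line before time T, right of it afterwards."""
    return not is_reft(lands_right, landing_time, T)

T = 100.0
print(is_reft(True, 50.0, T))    # True: a Right before T is a Reft
print(is_reft(True, 150.0, T))   # False: a Right after T is a Light
print(is_light(True, 150.0, T))  # True
```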
The skeptic would continue:
“Why should we treat the observation of several occurrences of Right as evidence for ‘The next ball will land on the right.’ and not as evidence for ‘The next ball will land reft of the line.’?”
For some reason, things become perfectly clear at this point for the defender of Bayesian inference, because now we have an easy-to-imagine model. Of course, if a ball landing right of the line is evidence for Right, then it cannot possibly be evidence for ~Right. After time T, to be evidence for Reft is to be evidence for ~Right, because after time T, Reft is logically equivalent to ~Right; hence it is not evidence for Reft after time T, for the same reason it is not evidence for ~Right. Before time T, of course, any evidence for Reft is evidence for Right, for analogous reasons.
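The same point in numbers: a minimal sketch, assuming we project Right (as the defender does) using the rule of succession with a uniform prior, and that every earlier ball landed right before time T. Because Reft and ~Right coincide for a ball landing after T, the evidence that pushes Right toward 99% pushes Reft toward 1%, not toward 99%. The function name is hypothetical.

```python
from fractions import Fraction

def predictions_after_T(prior_rights: int):
    """After `prior_rights` balls, all landing before T, have all landed right,
    return (P(Right), P(Reft)) for a single ball landing after time T.
    For such a ball Reft is equivalent to not-Right, so P(Reft) = 1 - P(Right)."""
    p_right = Fraction(prior_rights + 1, prior_rights + 2)  # rule of succession
    p_reft = 1 - p_right
    return p_right, p_reft

p_right, p_reft = predictions_after_T(98)
print(p_right, p_reft)  # 99/100 1/100
```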
But now the grue skeptic can say something brilliant, that stops much of what the Bayesian has proposed dead in its tracks:
“Why can’t I just repeat that paragraph back to you and swap every occurrence of ‘right’ with ‘reft’ and ‘left’ with ‘light’, and vice versa? They are perfectly symmetrical in their logical relations to one another. If we take ‘reft’ and ‘light’ as primitives, then we have to define ‘right’ and ‘left’ in terms of ‘reft’ and ‘light’ with the use of time intervals.”
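The skeptic’s symmetry claim can be checked mechanically: the rule that defines reft from right and time is exactly the rule that defines right from reft and time, so translating twice returns to the start. This is a sketch; `flip_after_T` is our own name, and counting a landing exactly at T as “after” is an assumption.

```python
def flip_after_T(label: bool, landing_time: float, T: float) -> bool:
    """The one translation rule both vocabularies share: keep the label before
    time T and negate it afterwards (a landing exactly at T counts as 'after')."""
    return label if landing_time < T else not label

# Right -> Reft:  is_reft  = flip_after_T(is_right, t, T)
# Reft  -> Right: is_right = flip_after_T(is_reft,  t, T)
T = 100.0
for is_right in (True, False):
    for t in (50.0, 150.0):
        is_reft = flip_after_T(is_right, t, T)
        assert flip_after_T(is_reft, t, T) == is_right  # round trip recovers Right
print("each vocabulary is definable from the other by the same rule")
```

Nothing in this purely syntactic check singles out either vocabulary, which is why the reply has to appeal to something beyond syntax.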
What can we possibly reply to this? Can he or she not do this with every argument we propose? Certainly, the skeptic grants that Bayes, together with the incompatibility of Right and Reft after time T, prohibits previous Rights from being evidence for both Right and Reft after time T; what he challenges is our choice of Right as the outcome they are evidence for, even though “Reft” and “Right” stand in a completely symmetrical syntactical relationship. Nothing in the definitions of reft and right distinguishes them from each other except their spelling. So is that it? No; this simply means we have to propose an argument that doesn’t rely on purely syntactical reasoning, so that if the skeptic performs the swap on our argument, the resulting argument is no longer sound.
What would happen in this scenario if it were actually set up? I know that seems like a strangely concrete question for a philosophy text, but its answer is a helpful hint. After time T, the ratio Rights:Lefts would keep behaving as expected as more trials were added, while the ratio Refts:Lights would approach the reciprocal of Rights:Lefts. The only way for this not to happen is for us to have been calling the right side of the table “reft” all along, or for the line to have moved. We can only figure out where the line is by knowing where the balls landed relative to it; anything we can figure out about where the line is from knowing which balls landed Reft and which landed Light, we can only figure out because, knowing this and the time, we can tell whether each ball landed left or right of the line.
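Here is a simulated run of the table, under stated assumptions: balls land uniformly on a table of width 1, the line stays fixed at x = 1/3, one ball is dropped per time step, and T is the thousandth step. Counting cumulatively, Rights:Lefts stays near 2 while Refts:Lights drifts toward 1/2 as post-T balls come to dominate. Names and parameters are illustrative.

```python
import random

def cumulative_ratios(line_position: float, balls: int, T: int, seed: int = 0):
    """Drop one ball per time step t = 0 .. balls-1, landing uniformly on a
    table of width 1, and return the cumulative ratios
    (Rights:Lefts, Refts:Lights) over all balls dropped."""
    rng = random.Random(seed)
    rights = lefts = refts = lights = 0
    for t in range(balls):
        lands_right = rng.random() > line_position  # the line never moves
        # Reft = right before T, left from T onward (boundary convention assumed).
        lands_reft = lands_right if t < T else not lands_right
        if lands_right:
            rights += 1
        else:
            lefts += 1
        if lands_reft:
            refts += 1
        else:
            lights += 1
    return rights / lefts, refts / lights

for n in (2_000, 20_000, 200_000):
    rl, rfl = cumulative_ratios(line_position=1 / 3, balls=n, T=1_000)
    print(f"{n:>7} balls: Rights:Lefts {rl:.3f}, Refts:Lights {rfl:.3f}")
```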
To this I know of no reply the grue skeptic can make. If he or she says the paragraph back to me with the proper words swapped, it is not true: in the hypothetical where we have a table, a line, and we call one side right and the other left, the only way for Refts:Lights to behave as expected as more trials are added is to move the line (if even that); otherwise the ratio of Refts to Lights will approach the reciprocal of Rights to Lefts.
This thin line is analogous to the frequency of emeralds that turn out green out of all the emeralds that get made. We can assume that the line will not move because that frequency has one precise value, which never changes. The line’s other important feature is that it reminds us that even if two terms are syntactically symmetrical, they may have semantic conditions for application that the syntactical model ignores, e.g., checking which side of the line the ball landed on.
In conclusion:
Every random variable has, as part of it, stored in its definition/code, a frequency distribution over its values. From the fact that some things happen sometimes and other things happen at other times, we know that the world contains random variables, even if they are never fundamental in the source code. Note that “frequency” is not used here as a state of partial knowledge; it is a fact about a set and one of its subsets.
The reason that:
“The first emerald ever observed was green. The second emerald ever observed was green. The third emerald ever observed was green. … etc. The nth emerald ever observed was green. (conclusion): There is a very high probability that a never before observed emerald will be green.”
is a valid inference, but the grue equivalent isn’t, is that grue is not a property that the emerald construction sites of our universe deal with. They are blind to the grueness of their emeralds; they only settle whether or not the next emerald will be green. It may be that the rule the emerald construction sites use to produce a green or non-green emerald changes at time T, but the frequency of a particular result out of all trials will never change; the line will not move. As long as we know which symbols we are using for which values, observing many green emeralds is evidence that the next one will be grue if it is observed before time T, while every record of an observation of a green emerald is evidence against a grue one after time T. “Grue” changes meaning from green to blue at time T; “green” keeps the same meaning, since we use the same physical test to determine green-hood as before, just as we use the same test to tell whether the ball landed right or left.
There is no reft in the universe’s source code, and there is no grue. Green is not fundamental in the source code, but green can be reduced to some particular range of quantum states; if you had the universe’s source code, you couldn’t write grue without first writing green, while writing green without knowing a thing about grue would be just as hard as writing it while knowing grue. Having a physical test, or primary condition for applicability, is what privileges green over grue after time T; to have a consistent physical test is the same as to reduce to a specifiable range of physical parameters, and the existence of such a test is what prevents the skeptic from performing his or her swaps on our arguments.
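The asymmetry appealed to here can be sketched in code: green is one fixed test on a physical parameter, while grue has to be assembled from green, blue, and a clock. The wavelength bands below are rough conventional values chosen only for illustration (an assumption, not part of the argument), and all function names are hypothetical.

```python
# Illustrative wavelength bands in nanometres (approximate, assumed here).
GREEN_NM = (495.0, 570.0)
BLUE_NM = (450.0, 495.0)

def is_green(wavelength_nm: float) -> bool:
    """A fixed physical test: the same wavelength band at every time."""
    lo, hi = GREEN_NM
    return lo <= wavelength_nm < hi

def is_blue(wavelength_nm: float) -> bool:
    """Another fixed physical test, independent of time."""
    lo, hi = BLUE_NM
    return lo <= wavelength_nm < hi

def is_grue(wavelength_nm: float, first_observed: float, T: float) -> bool:
    """Grue cannot be written without green, blue, and the time T."""
    if first_observed < T:
        return is_green(wavelength_nm)
    return is_blue(wavelength_nm)

emerald = 530.0  # a hypothetical emerald's dominant wavelength
print(is_green(emerald), is_grue(emerald, 10.0, 100.0), is_grue(emerald, 200.0, 100.0))
```

The same 530 nm emerald passes the green test whenever it is observed, but counts as grue only if first observed before T.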
Take this more as a brainstorm than as a final solution. It wasn’t originally meant as one, but it should have been. I’ll write something more organized and concise after I think about the comments more, and make some graphics I’ve designed that make my argument much clearer, even to myself. But keep those comments coming, and tell me if you want specific credit for anything you may have added to my grue toolkit in the comments.
Bayes Slays Goodman’s Grue