I think there’s a rationalist antipattern, where an area is very data-poor, and one then decides that one can come up with the correct theory for that area using reason, e.g. Occam’s razor, Bayesian updates, minimum description length, etc.
My experience is that this works surprisingly poorly in practice. Probably the reason is that people massively underestimate just how many different models can be made to fit the data one has observed.
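To see how easy this is, here’s a minimal Python sketch (the particular points and the sine-bump trick are arbitrary illustrative choices, not tied to any real domain): two models that agree exactly on the same handful of observations while making very different predictions everywhere in between.

```python
import numpy as np

# Five observed (x, y) points at integer x -- the exact values don't matter.
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([0.0, 1.0, 0.5, 1.5, 1.0])

# Model A: the unique degree-4 polynomial through the five points.
model_a = np.polynomial.Polynomial.fit(xs, ys, deg=4)

# Model B: the same polynomial plus a term that vanishes at every observed x,
# so it matches the data exactly but disagrees everywhere in between.
def model_b(x):
    return model_a(x) + 3.0 * np.sin(np.pi * x)

# Both "theories" fit the observations perfectly...
assert np.allclose(model_a(xs), ys) and np.allclose(model_b(xs), ys)

# ...yet they diverge as soon as we step off the observed points.
print(model_a(2.5), model_b(2.5))  # differ by 3.0
```

And nothing singles these two out: bolt any function that vanishes on the observed x values onto model A and you get yet another perfect fit.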
In particular, one thing that has an especially high tendency to go badly is this:
So I’m asking you, LessWrongers, to figure out how to weight credences between these theories to at least identify the most probable theory for our universe.
To illustrate how that can go bad, consider a biased coin with a 60% chance of coming up heads, and imagine you flipped it 10 times. What’s the single most likely sequence you could observe? HHHHHHHHHH. But is this a typical sequence, whose properties you can use to make mostly correct predictions? No; you’d often (not always, depending on various obvious things) be better off with something like HTHHHTHTTH as your prototypical sequence, even though it is individually less likely, because its aggregate properties are more accurate.
Though you’d be even better off integrating over all the possible sequences weighted by their probability, which in this metaphor corresponds to keeping an open mind to different possibilities while also being very skeptical about any one of them.
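For concreteness, here’s a quick numerical check of the coin point (a Python sketch, using the 60%-heads bias from the example above):

```python
from math import comb

p = 0.6   # probability of heads
n = 10    # number of flips

# The single most likely sequence is all heads...
p_all_heads = p ** n
print(f"P(HHHHHHHHHH)       = {p_all_heads:.4f}")                  # ~0.0060

# ...and any one specific 6-heads/4-tails sequence, e.g. HTHHHTHTTH,
# is individually less likely than that:
p_typical_seq = p ** 6 * (1 - p) ** 4
print(f"P(HTHHHTHTTH)       = {p_typical_seq:.4f}")                # ~0.0012

# But the aggregate outcome "6 heads out of 10" (the expected number,
# n * p = 6) is vastly more probable than "10 heads out of 10":
print(f"P(exactly 6 heads)  = {comb(n, 6) * p_typical_seq:.4f}")   # ~0.25
print(f"P(exactly 10 heads) = {comb(n, 10) * p_all_heads:.4f}")    # ~0.006
```

So the maximally likely individual sequence is a terrible prototype, while the individually less likely sequence has the right aggregate statistics.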
This is a good point, but also kind of an oversimplification of the situation in physics. Imagine Alice is trying to fit some (x, y) data points on a chart. She doesn’t know much about any kind of function other than linear functions, but she can still fit half of the points at a time pretty well. Half of the points have a large x coordinate and can be fit well by a line of positive slope. Alice calls this line “The Theory of General Relativity”. The other half have a small x coordinate and can be fit well by a line of negative slope. Alice calls this line “Quantum Field Theory”. Collecting data in the region in between is very difficult for technical reasons, so Alice doesn’t have any data there yet. But it looks like if she extended the two lines until they met, they would make a kind of “V” shape.
This is a huge problem for her, because there is no linear function that gives a good fit on both sets of data points. Whatever the true function is, it must somehow be “non-linear”, whatever that means. In order to go through both sets of points, the function must somehow “bend”. Alice can kind of imagine what a bendy function ought to look like, but when it comes time to put it into mathematics, she’s suddenly stuck. All the functions she’s ever seen can be written in the form y = mx + b, but how could there be a value of m that works for this kind of function? No real number works, and she has tried complex numbers, but none of those worked either. Now Alice is trying to think of other number systems extending the reals that might contain a good value of m.
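To make Alice’s predicament concrete, here’s a rough sketch of it in Python, assuming for illustration that the true function is something like y = |x| (the simplest “V”), with a gap in the middle where no data can be taken:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two clusters of data following a "V" (true function y = |x|), with an
# empty region in the middle where measurement is too hard.
x_left  = np.linspace(-10, -3, 20)   # small-x cluster ("Quantum Field Theory")
x_right = np.linspace(3, 10, 20)     # large-x cluster ("General Relativity")
x_all   = np.concatenate([x_left, x_right])
y_all   = np.abs(x_all) + rng.normal(0, 0.1, x_all.size)

def best_line(xs, ys):
    """Least-squares line y = m*x + b; returns (m, b, rms error)."""
    m, b = np.polyfit(xs, ys, 1)
    return m, b, np.sqrt(np.mean((m * xs + b - ys) ** 2))

# Each half on its own is fit almost perfectly by a line...
print(best_line(x_left,  y_all[:20]))   # slope ~ -1, tiny error
print(best_line(x_right, y_all[20:]))   # slope ~ +1, tiny error

# ...but no single line y = m*x + b fits both halves at once.
print(best_line(x_all, y_all))          # slope ~ 0, error of a few units
```

No choice of m rescues the single line; the failure is in the functional form, not in the parameters, and that is Alice’s situation.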
Then Bob comes along and says, “Look, you’re attempting an impossible problem. There are way too many functions that go through all the data points we have so far. In order to distinguish between them, we have no choice but to try to collect some data in the middle region, in between the two clusters of data points that we have so far. We have an oversupply of theories, and what we really need is data.”
To this Alice replies, “Yes, in some sense it’s true that the data we currently have is insufficient. But I also wouldn’t say that we have an oversupply of theories. We know that the true function must fit well to all of the data points we already have, but we can’t actually write down even one single function like that. We have various hand-wavy notions of how one might eventually be able to write down such a function and calculate its values for various x coordinates, but none of those notions have actually delivered. Our understanding of the problem just isn’t good enough yet. Relative to our current mathematical capabilities, the problem isn’t under-constrained; it’s so over-constrained that there are zero solutions, and no one is even close to finding one. More experiments are always good, but I’ll only agree that we can’t make progress without them once someone finds me two distinct candidate theories that are both internally consistent and fit all the data we’ve already taken.”
That is a fair point.