Last week, physicists gathered in Oxford to discuss string theory and the philosophy of science.
From the article:
Nowadays, as several philosophers at the workshop said, Popperian falsificationism has been supplanted by Bayesian confirmation theory, or Bayesianism...
Gross concurred, saying that, upon learning about Bayesian confirmation theory from Dawid’s book, he felt “somewhat like the Molière character who said, ‘Oh my God, I’ve been talking prose all my life!’”
That the Bayesian view is news to so many physicists is itself news to me, and it’s very unsettling news. You could say that modern theoretical physics has failed to stay in touch with other areas of science, but you could also argue that the rationalist community has failed to properly reach out to and communicate with scientists.
The character from Molière learns a fancy name (“speaking in prose”) for the way he already communicates. David Gross isn’t saying that he is unfamiliar with the Bayesian view; he’s saying that “Bayesian confirmation theory” is a fancy name for his existing epistemic practice.
The rationalist community needs to learn a little humility. Do you realize the disparity in intellectual firepower between “you guys” and theoretical physicists?
This is the overgeneralized IQ fantasy. A really smart physicist may be highly competent at, say, string theory, but know very little about French pastries or CUDA programming or—more to the point—Solomonoff induction.
As I said, here I am. Tell me how Solomonoff induction is going to change how I do my business. I am listening.
You are already a lesswronger—would you say that lesswrong has changed the way you think at all? Why do you keep coming back?
I post here, but I don’t identify as a rationalist. The two most valuable ideas (to me) that circulate here are tabooing and steelmanning (but they were not invented here).
I think I try to cultivate what you would call the “rationalist mindset” in order to do math. But I view it as a tool for certain problems only, not a part of my identity.
Solomonoff induction is uncomputable, so it’s not going to help you in any way. But Jaynes (who was a physicist) said that using Bayesian methods to analyze magnetic resonance data helped him achieve unprecedented resolution.
Quoting from his book:
In the 1987 Ph.D. thesis of G. L. Bretthorst, and more fully in Bretthorst (1988), we applied Bayesian analysis to estimation of frequencies of nonstationary sinusoidal signals, such as exponential decay in nuclear magnetic resonance (NMR) data, or chirp in oceanographic waves. We found – as was expected on theoretical grounds – an improved resolution over the previously used Fourier transform methods.
If we had claimed a 50% improvement, we would have been believed at once, and other researchers would have adopted this method eagerly. But, in fact, we found orders of magnitude improvement in resolution.
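To give a feel for what such an analysis looks like, here is a toy sketch (mine, not Bretthorst’s actual method): grid-based Bayesian frequency estimation for a single decaying sinusoid in Gaussian noise, with the decay rate and noise level assumed known, and the two amplitudes fit by least squares as a crude stand-in for marginalizing them.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
true_freq, decay, sigma = 17.3, 3.0, 0.5          # illustrative numbers only
data = np.exp(-decay * t) * np.cos(2 * np.pi * true_freq * t) + sigma * rng.normal(size=t.size)

freqs = np.linspace(10.0, 25.0, 600)
log_post = np.empty_like(freqs)
for i, f in enumerate(freqs):
    # Model: decaying cosine and sine at the trial frequency, unknown amplitudes.
    X = np.exp(-decay * t)[:, None] * np.column_stack(
        [np.cos(2 * np.pi * f * t), np.sin(2 * np.pi * f * t)])
    resid = data - X @ np.linalg.lstsq(X, data, rcond=None)[0]
    log_post[i] = -0.5 * np.sum(resid ** 2) / sigma ** 2   # flat prior over the grid

print(freqs[np.argmax(log_post)])   # posterior peaks near the true 17.3
```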
jacob_cannell above seems to think it is very important for physicists to know about Solomonoff induction.
Solomonoff induction is one of those ideas that keeps circulating here, for reasons that escape me.
If we are talking about Bayesian methods for data analysis, almost no one on LW who is breathlessly excited about Bayesian stuff actually knows what they are talking about (with 2-3 exceptions, who are stats/ML grad students or up). And when called on it, they retreat to the “Bayesian epistemology” motte.
Bayesian methods didn’t save Jaynes from being terminally confused about causality and the Bell inequalities.
I still haven’t figured out what you have against Bayesian epistemology. It’s not like this is some sort of LW invention—it’s pretty standard in a lot of philosophical and scientific circles, and I’ve seen plenty of philosophers and scientists who call themselves Bayesians.
Solomonoff induction is one of those ideas that keeps circulating here, for reasons that escape me.
My understanding is that Solomonoff induction is usually appealed to as one of the more promising candidates for a formalization of Bayesian epistemology that uses objective and specifically Occamian priors. I haven’t heard Solomonoff promoted as much outside LW, but other similar proposals do get thrown around by a lot of philosophers.
Bayesian methods didn’t save Jaynes from being terminally confused about causality and the Bell inequalities.
Of course Bayesianism isn’t a cure-all by itself, and I don’t think that’s controversial. It’s just that it seems useful in many fundamental issues of epistemology. But in any given domain outside of epistemology (such as causation or quantum mechanics), domain-relevant expertise is almost certainly more important. The question is more whether domain expertise plus Bayesianism is at all helpful, and I’d imagine it depends on the specific field. Certainly for fundamental physics it appears that Bayesianism is often viewed as at least somewhat useful (based on the conference linked by the OP and by a lot of other things I’ve seen quoted from professional physicists).
I don’t have any problem with Bayesian epistemology at all. You can have whatever epistemology you want.
What I do have a problem with is this “LW myopia” where people here think they have something important to tell to people like Ed Witten about how people like Ed Witten should be doing their business. This is basically insane, to me. This is strong evidence that the type of culture that gets produced here isn’t particularly sanity producing.
Solomonoff induction is useless to know about for anyone who has real work to do (let’s say with actual data, like physicists). What would people do with it?
In many cases I’d agree it’s pretty crazy, especially if you’re trying to go up against top scientists.
On the other hand, I’ve seen plenty of scientists and philosophers claim that their peers (or they themselves) could benefit from learning more about things like cognitive biases, statistics fallacies, philosophy of science, etc. I’ve even seen experts claim that a lot of their peers make elementary mistakes in these areas. So it’s not that crazy to think that by studying these subjects you can have some advantages over some scientists, at least in some respects.
Of course that doesn’t mean you can be sure that you have the advantage. As I said, probably in most cases domain expertise is more important.
Absolutely agree it is important for scientists to know about cognitive biases. Francis Bacon, the father of the empirical method, explicitly used cognitive biases (he called them “idols,” and even classified them) as a justification for why the method was needed.
I always said that Francis Bacon should be LW’s patron saint.
So it sounds like you’re only disagreeing with the OP in degree. You agree with the OP that a lot of scientists should be learning more about cognitive biases, better statistics, epistemology, etc., just as we are trying to do on LW. You’re just pointing out (I think) that the “informed laymen” of LW should have some humility because (a) in many cases (esp. for top scientists?) the scientists have indeed learned lots of rationality-relevant subject matter, perhaps more than most of us on LW, (b) domain expertise is usually more important than generic rationality, and (c) top scientists are very well educated and very smart.
edit: Although I should say LW “trying to learn better statistics” is too generous. There is a lot more “arguing on the internet” and a lot less “reading” happening.
jacob_cannell above seems to think it is very important for physicists to know about Solomonoff induction.
I think a more charitable reading would go like this: being smarter doesn’t necessarily mean that you know everything there is to know, nor that you are more rational than other people. Since being rational or knowing about Bayesian epistemology is important in every field of science, physicists should be motivated to learn this stuff.
I don’t think he was suggesting that French pastries are literally useful to them.
Solomonoff induction is one of those ideas that keeps circulating here, for reasons that escape me.
Well, LW was born as a forum about artificial intelligence. Solomonoff induction is like an ideal engine for generalized intelligence, which is very cool!
Bayesian methods didn’t save Jaynes from being terminally confused about causality and the Bell inequalities.
That’s unfortunate, but we cannot ask anyone, even geniuses, to transcend their time. Leonardo da Vinci held some beliefs that are ridiculous by our standards, just like Ramanujan or Einstein. With this I’m not implying that Jaynes was a genius of that caliber; I would ascribe that status more to Laplace.
On the ‘bright’ side, in our time nobody knows how to reconcile epistemic probability and quantum causality :)
As far as I am aware, Solomonoff induction describes the singularly correct way to do statistical inference in the limits of infinite compute. (It computes generalized/full Bayesian inference)
All of AI can be reduced to universal inference, so understanding how to do that optimally with infinite compute perhaps helps one think more clearly about how practical efficient inference algorithms can exploit various structural regularities to approximate the ideal using vastly less compute.
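For readers who haven’t seen it written down: the standard formulation (a sketch of textbook material, not a claim about anything above) takes a universal prefix machine U, weights each program p by 2 to the minus its length, and predicts by conditioning on the resulting mixture.

```latex
M(x) \;=\; \sum_{p \,:\, U(p)\ \text{outputs a string beginning with}\ x} 2^{-\ell(p)},
\qquad
M(x_{t+1} \mid x_{1:t}) \;=\; \frac{M(x_{1:t}\, x_{t+1})}{M(x_{1:t})}
```

The incomputability complained about elsewhere in the thread is exactly the sum over all programs.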
Because AIXI is the first complete mathematical model of a general AI and is based on Solomonoff induction. Also, a computable approximation to the Solomonoff prior has been used to teach a small AI to play video games unsupervised. So, yeah.
If you don’t consider Jaynes to be contemporary, which author do you consider to be his successor, who updated where Jaynes went wrong?
While Bretthorst is his immediate and obvious successor, unfortunately nobody that I know of has taken up the task of developing the field the way Jaynes did.
A really smart physicist may be highly competent at, say, string theory, but know very little about French pastries or CUDA programming or—more to the point—Solomonoff induction.
I am pretty sure jacob_cannell specifically brought up Solomonoff induction. I am still waiting for him to explain why I (let alone Ed Witten) should care about this idea.
Since being rational or knowing about Bayesian epistemology is important in every field of science
How do you know what is important in every field of science? Are you a scientist? Do you publish? Where is your confidence coming from, first principles?
Solomonoff induction is like an ideal engine for generalized intelligence, which is very cool!
Whether Solomonoff induction is cool or not is a matter of opinion (and “mathematical taste”), but, more to the point, the claim seems to be that it’s not only cool but vital for physicists to know about. I want to know why. It seems fully useless to me.
we cannot ask of anyone, even geniuses, to transcend their time.
Jaynes died in 1997. Bayesian networks (the correct bit of math to explain what is going on with Bell inequalities) were written up in book form in 1988, and were known about in various special case forms long before that.
Where is your confidence coming from, first principles?
Well, yes, of course. Cox’s theorem. Journals are starting to reject papers that rely on the “p<0.05” criterion. Many studies in medicine and psychology cannot be replicated. Scientists are using inferior analysis methods when better ones are available, just because they were never taught the better ones. I do say there’s a desperate need to popularize Bayesian thinking.
Jaynes died in 1997. Bayesian networks (the correct bit of math to explain what is going on with Bell inequalities) were written up in book form in 1988, and were known about in various special case forms long before that.
I wasn’t referring to that. Jaynes knew that quantum mechanics was incompatible with the epistemic view of probability, and from his writing, while never explicit, it’s clear that he was thinking about a hidden-variables model. Indisputable violations of the Bell inequalities were demonstrated only this year. Causality was published in 2001. We still don’t know how to reconcile epistemic probabilities with quantum causality. What I’m saying is that the field was in motion when Jaynes died, and there is still a great deal we don’t know about it. As I said, we cannot ask anyone not to hold crazy ideas from time to time.
Datapoint: in [biological] systematics in its broadest sense, Bayesian methods are increasingly important (molecular evolution studies,...), but I’ve never heard about pure Bayesian epistemology being in demand. Maybe because we leave it all to our mathematicians.
Part of the issue I keep harping on is that people keep confusing Bayes’ rule, Bayesian networks, Bayesian statistical inference, and Bayesian epistemology. I don’t have any issue with a thoughtful use of Bayesian statistical inference when it is appropriate—how could I?
My issue is people being confused, or people having delusions of grandeur.
jacob_cannell above seems to think it is very important for physicists to know about Solomonoff induction.
Nah—I was just using that as an example of things physicists (regardless of IQ) don’t automatically know.
Most physicists were trained to think in terms of Popperian epistemology, which is strictly inferior to (dominated by) Bayesian epistemology (if you don’t believe that, it’s not worth my time to debate). In at least some problem domains, the difference in predictive capability between the two methodologies is becoming significant.
Physicists don’t automatically update their epistemologies; it isn’t something they are used to having to update.
Most physicists were trained to think in terms of Popperian epistemology, which is strictly inferior to (dominated by) Bayesian epistemology (if you don’t believe that, it’s not worth my time to debate).
I equate “Bayesian epistemology” with a better approximation of universal inference. It’s easy to generate example environments where Bayesian agents dominate Popperian agents, while the converse is never true. Popperian agents completely fail to generalize well from small noisy datasets. When you have very limited evidence, Popperian reliance on hard logical falsifiability just fails.
This shouldn’t even really be up for debate—do you actually believe the opposite position, or are you just trolling?
Ha, but a robot programmed with Solomonoff-induction-like software will learn to make French pastries long before pastries will learn how to do Solomonoff induction!
robot programmed with Solomonoff-induction-like software will learn to make French pastries long before pastries will learn how to do Solomonoff induction!
French pastries correspond to a pretty long bit-string so you may have to wait for a very long time (and eat a lot of very bad possibly-pastries in the meantime :-P). A physicist can learn to make pastries much quicker.
It could be that the attitude/belief that theoretical physicists are far smarter than anyone else (and therefore, by implication, do not need to listen to anyone else) is part of the problem I’m outlining.
It could be, but I think theoretical physicists actually are very intelligent. Do you disagree?
edit: But let’s leave them aside, and talk about me, since I am actually here. I am not in the same league as Ed Witten, not even close. Do you (generic sense) have something sensible to communicate to me about how I go about my business?
edit: But let’s leave them aside, and talk about me, since I am actually here. I am not in the same league as Ed Witten, not even close. Do you (generic sense) have something sensible to communicate to me about how I go about my business?
When did you become a theoretical physicist?
I am not. But I do theory work, and some of it is even related to analyzing data (and I am actually here to have this conversation, whereas Ed is not). So—what do you have to teach me?
I dunno. I have a PhD in engineering. In my graduate research and in my brief life as a practicing scientist, I used rationalist skills like “search for more hypotheses” and “think exclusively about the problem for five minutes before doing anything else” and generally leveraged LW-style thinking, which I didn’t learn in school, to be more successful and productive than I probably would have been otherwise. I could probably write a lengthy article about how I perceive LW to have helped me in my life, but I know that it would seem extremely post hoc, and you could also probably say that the skills I’m using are not unique to LW. All I can say is that the core insight that formed the crux of my dissertation arose because I was using a very LW-style approach to analyzing a problem.
The thing about rationalist skills is that LW does not and cannot have a monopoly on them. In fact, the valuable function of LW (at least in the past) has been to both aggregate and sort through potentially actionable strategic directives and algorithms.
What’s interesting to me is that school doesn’t do that at all. I got through however-many years of schooling and earned a PhD without once taking a class about Science, about how to actually do it, about what the process of Science is. I absorbed some habits from advisers and mentors, that’s about it. The only place that I even know of where people talk at length about the inner operations of mind that correspond to the outer reality where one observes discoveries being made is Less Wrong.
And if you’re an entrepreneur and don’t care about science, then Less Wrong is also one of a few places where people talk at length about how to marshal your crappy human brain and coax it into working productively on tasks that you have deliberately and strategically chosen.
One problem is that I’m probably thinking of the Less Wrong of four years ago rather than the Less Wrong of today. In any case, all those old posts that I found so much value in are still there.
I feel like this is an important point that goes a long way to give one the intellectual / social humility IlyaShpitser is pointing at, and I agree completely that the value of LW as a site/community/etc. is primarily in sorting and aggregating. (It’s the people that do the creating or transferring.)
You are correct that surveys of IQ and other intelligence scores consistently show physicists having some of the highest. But mathematics, statistics, computer science, and engineering score about the same, and most studies I’ve seen find very little, if any, significant difference in intelligence scores between these fields.
‘Rationalist’ isn’t a field or specialization, it’s defined more along the lines of refining and improving rational thinking. Based on the lesswrong survey, fields like mathematics and computer science are heavily represented here. There are actually more physicists (4.3%) than philosophers (2.4%). If this is inconsistent with your perception of the community, update your prior.
From all of this it is safe to assume that the average LW’er is ‘very smart’, and that LW contains a mini-community of rationalist scientists. One data point: Me. I have a PhD in engineering and I’m a practising scientist. Maybe I should have phrased my initial comment as: “It might be better if the intersection of rationalists and scientists were larger.”
(a) It is very difficult to perceive the qualitative difference between yourself and people 1 sigma+ above “you” (for any value of “you”), but that difference is enormous.
(b) How much “science process” does this community actually understand? How many are practicing scientists, as in publish real stuff in journals?
The outside-view worry is that there might be a bit of a “twenty-something know-it-all” thing going on. You read some stuff, and liked it. That’s great! If the stuff isn’t universally adopted by very smart folks, there are probably very good reasons for that! Read more!
My argument boils down to: “no, really, very smart people are actually very smart.”
The median IQ at LessWrong is 139, the average Nobel laureate is reputed to have an IQ of 145. Presumably that means many people at LessWrong are in a position to understand the reasoning of Nobel laureates, at least.
The gap between the average Nobel laureate (in physics, say) and the average LWer is enormous. If your measure says it isn’t, it’s a crappy measure.
I calculate about 128 for the average IQ of a survey respondent who provides one, and I suspect that nonresponse means the actual average is closer to 124 or so. (Thus I agree with you that there is a significant gap between the average Nobel laureate and the average LWer.)
I think the right way to look at LW’s intellectual endowment is that it’s very similar to a top technical college, like Harvey Mudd. There are a handful of professor/postdoc/TA types running around, but as a whole the group skews very young (graph here, 40 is 90th percentile) and so even when people are extraordinarily clever they don’t necessarily have the accomplishments or the breadth for that to be obvious. (And because of how IQ distributions work, especially truncated ones with a threshold, we should expect most people to be close to the threshold.)
I agree with this. I think looking at a typical LWer as a typical undergrad at Harvey Mudd is a good model. (This is not a slur, btw, Harvey Mudd is great).
I look at the IQ results for the survey every year. A selected handful of comments:
Karma vs. multiple IQ tests: positive correlation (.45) between self-report and Raven’s for users with positive karma, negative correlation (-.11) between self-report and Raven’s for users without positive karma.
SATs are very high: 96th percentile in the general population is lower quartile here. (First place I make the Harvey Mudd comparison.)
SAT self-report vs. IQ self-report: average SAT, depending on which one you look at and how you correct it, suggests that the average LWer is somewhere between 98th and 99.5th percentile. (IQ self-report average is above 99.5th percentile, and so I call the first “very high” and the second “extremely high.”)
I’ve interacted with a handful of Nobel laureates, I’m familiar with the professors and students at two top-15 graduate physics programs, and I’ve interacted with a bunch of LWers. LW as a whole seems roughly comparable to an undergraduate physics department, active LWers roughly comparable to a graduate physics department, and there are top LWers at the level of the Nobel laureates (but aging means the ~60-year-old Nobel laureates are not a fair comparison to the ~30-year-old top LWers, and this is selecting just the math-genius types from the top LWers, not the most popular top LWers). Recall Marcello comparing Conway and Yudkowsky.
Because I hung out with some top academic people, I know what actual genius is like.
Incidentally, when I talk about people being “very smart” I don’t mean “as measured by IQ.” As I have mentioned lots of times before, I think IQ is a very poor measure of math smarts, and a very poor measure of generalized smarts at the top end. Intelligence is too heterogeneous, and too high-dimensional. But there is such a thing as being “very smart”; it’s just a multidimensional thing.
So in this case, I just don’t think there is a lot of info in the data. I much prefer looking at what people have done as a proxy for their smarts. “If you are so smart, where are all your revolutionary papers?” This also correctly adjusts for people who actually are very smart, but who bury their talents (and so their hypothetical smarts are not super interesting to talk about).
I’ve already had this discussion with someone else, about another topic: I pointed out that, statistically, lottery winners end up no happier than they were before winning. He said that he knew how to spend the money well enough to be effectively much happier. In our discussion, you have some insights that from my perspective are biased, but from your point of view are not. Unfortunately, your data rely on uncommunicable evidence, so we should just agree to disagree and call it a day.
But the situation is not as hopeless as it seems. Try to find some people at the top of their game, and hang out with them for a bit. Honestly, if you think “Mr. Average Less Wrong” and Ed Witten are playing in the same stadium, you are being a bit myopic. But this is the kind of thing more info can help with. You say you can’t use my info (and don’t want to take my word for it), but you can generate your own if you care.
the average Nobel laureate is reputed to have an IQ of 145.
Is there a reliable source for this?
[1] is one source. Its method is: “Jewish IQ is distributed like American-of-European-ancestry IQ, but a standard deviation higher. If you look at the population above a certain IQ threshold, you see a higher fraction of Jews than in the normal population. If you use the threshold of 139, you see 27% Jews, which is the fraction of Jews who are Nobel laureates. So let’s assume that Nobel laureate IQ is distributed like AOEA IQ after you cut off everyone with IQ below 139. It follows that Nobel laureates have an average IQ of 144.”
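For what it’s worth, the arithmetic of that truncation step can be reproduced directly (assuming the usual mean-100, SD-15 IQ scale, which is my assumption here, not something stated in the comment).

```python
from scipy.stats import norm

mu, sigma, threshold = 100.0, 15.0, 139.0
z = (threshold - mu) / sigma
# Mean of a normal distribution conditioned on exceeding the threshold
# (inverse Mills ratio): E[X | X > c] = mu + sigma * phi(z) / (1 - Phi(z)).
truncated_mean = mu + sigma * norm.pdf(z) / norm.sf(z)
print(round(truncated_mean, 1))   # ~143.7, i.e. roughly the quoted "average IQ of 144"
```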
I hope you’ll agree that this seems dubious.
[2] agrees that it’s dubious, and tries to calculate it a different way (still based on fraction of Jews), and gets 136. (It’s only reported by field, but it would be the same as chemistry and literature, because they’re both 27% Jews.) It gets that number by doing a bunch of multiplications which I suspect are the wrong multiplications to do. (Apparently, if IQ tests had less g loading, and if self-identified ethnicity correlated less with ancestry, then the g loading of Jewishness would go up?) But even if the calculations do what they’re supposed to, it feels like a long chain of strong assumptions and noisy data, and this method seems about equally dubious to me.
I tried to get a discussion going on this exact subject in my post this week, but there seemed to be little interest. A major weakness of the standard Bayesian inference method is that it assumes a problem only has two possible solutions. Many problems involve many possible solutions, and many times the number of possible solutions is unknown, and in many cases the correct solution hasn’t been thought of yet. In such instances, confirmation through inductive inference may not be the best way of looking at the problem.
Where did you get this from? Maintaining beliefs over an entire space of possible solutions is a strength of the Bayesian approach. Please don’t talk about Bayesian inference after reading a single thing about updating beliefs on whether a coin is fair or not. That’s just a simple tutorial example.
If I have 3 options, A, B, and C, and I’m 40% certain the best option is A, 30% certain the best option is B, and 30% certain the best option is C, would it be correct to say that I’ve confirmed option A instead of say my best evidence suggests A? This can sort of be corrected for with the standard Bayesian confirmation model, but the problem becomes larger as the number of possibilities increases to the point where you can’t get a good read on your own certainty, or to the point where the number of possibilities is unknown.
I’m arguing that Bayesian confirmation theory as a philosophy was originally conceived as a model using only two possibilities (A and ~A), and then this model was extrapolated into problems with more than two possibilities. If it had been originally conceived using more than two possibilities, it wouldn’t have made any sense to use the word confirmation. So explanations of Bayesian confirmation theory will often entail considering theories or decisions in isolation rather than as part of a group of decisions or theories.
So if there are 20 possible explanations for a problem, and there is no strong evidence suggesting any one explanation, then I will have roughly 5% certainty in each explanation. Unless I am extremely good at calibration, I can’t confirm any of them, and if I consider each explanation in isolation from the other explanations, then all of them are wrong.
It doesn’t matter whether we’re talking about hypotheses or decision-making.
Bayesian confirmation theory as a philosophy was originally conceived as a model using only two possibilities
I’m not sure whether this is true, but it’s irrelevant. Bayesian confirmation theory works just fine with any number of hypotheses.
then I can’t confirm any of them
If by “confirm” you mean “assign high probability to, without further evidence”, yes. That seems to me to be exactly what you’d want. What is the problem you see here?
If it had been originally conceived using more than two possibilities, it wouldn’t have made any sense to use the word confirmation.
You sound confused. The “confirmation” stems from
In Bayesian Confirmation Theory, it is said that evidence E confirms (or would confirm) hypothesis H (to at least some degree) just in case the prior probability of H conditional on E is greater than the prior unconditional probability of H.
So what if p(H) = 1, p(H|A) = .4, p(H|B) = .3, and p(H|C) = .3? The evidence would suggest all are wrong. But I have also determined that A, B, and C are the only possible explanations for H. Clearly there is something wrong with my measurement, but I have no method of correcting for this problem.
H is Hypothesis. You have three: HA, HB, and HC. Let’s say your prior is that they are equally probable, so the unconditional P(HA) = P(HB) = P(HC) = 0.33
Let’s also say you saw some evidence E and your posteriors are P(HA|E) = 0.4, P(HB|E) = 0.3, P(HC|E) = 0.3. This means that evidence E confirms HA because P(HA|E) > P(HA). This does not mean that you are required to believe that HA is true or bet your life’s savings on it.
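A minimal sketch of the bookkeeping Lumifer describes, with likelihoods made up so that the posteriors come out 0.4/0.3/0.3:

```python
priors = {"HA": 1/3, "HB": 1/3, "HC": 1/3}
likelihoods = {"HA": 0.4, "HB": 0.3, "HC": 0.3}        # P(E | H), illustrative values

p_evidence = sum(priors[h] * likelihoods[h] for h in priors)
posteriors = {h: priors[h] * likelihoods[h] / p_evidence for h in priors}

for h, post in posteriors.items():
    verdict = "confirmed" if post > priors[h] else "disconfirmed"
    print(h, round(post, 3), verdict)
# HA 0.4 confirmed; HB 0.3 disconfirmed; HC 0.3 disconfirmed
```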
That’s a really good explanation of part of the problem I was getting at. But that requires considering the three hypotheses as a group rather than in isolation from all other hypotheses to calculate 0.33.
But that requires considering the three hypotheses as a group rather than in isolation from all other hypotheses to calculate 0.33
No, it does not.
Let’s say you have a hypothesis HZ. You have a prior for it, say P(HZ) = 0.2 which means that you think that there is a 20% probability that HZ is true and 80% probability that something else is true. Then you see evidence E and it so happens that the posterior for HZ becomes 0.25, so P(HZ|E) = 0.25. This means that evidence E confirmed hypothesis HZ and that statement requires nothing from whatever other hypotheses HA,B,C,D,E,etc. might there be.
How would you calculate that prior of 0.2? In my original example, my prior was 1, and then you transformed it into 0.33 by dividing by the number of possible hypotheses. You wouldn’t be able to do that without taking the other two possibilities into account. As I said, the issue can be corrected for if the number of hypotheses is known, but not if the number of possibilities is unknown. However, philosophical treatments of Bayesian confirmation theory frequently don’t consider this problem. From this paper by Morey, Romeijn, and Rouder:
Overconfident Bayes is problematic because it lacks the necessary humility that accompanies the understanding that inferences are based on representations. We agree that there is a certain silliness in computing a posterior odds between model A and model B, seeing that it is in favour of model A by 1 million to one, and then declaring that model A has a 99.9999% probability of being true. But this silliness arises not from model A being false. It arises from the fact that the representation of possibilities is quite likely impoverished because there are only two models. This impoverished representation makes translating the representational statistical inferences into inferences pertaining to the real world difficult or impossible.
Priors are always for a specific hypothesis. If your prior is 1, this means you believe this hypothesis unconditionally and no evidence can make you stop believing it.
You are talking about the requirement that all mutually exclusive probabilities must sum to 1. That’s just a property of probabilities and has nothing to do with Bayes.
the issue can be corrected for if the number of hypotheses is known, but not if the number of possibilities is unknown.
Yes, it can. To your “known” hypotheses you just add one more which is “something else”.
Really, just go read. You are confused because you misunderstand the basics. Stop with the philosophy and just figure out how the math works.
I’m not arguing with the math; I’m arguing with how the philosophy is often applied. What about the case where my prior is greater than my evidence for all the choices I’ve looked at, the number of possibilities is unknown, but I still need to make a decision about the problem? As the paper I was originally referencing mentioned, what if all options are false?
What does “have to make a decision” mean when “all options are false”?
Are you thinking about the situation when you have, say, 10 alternatives with the probabilities of 10% each except for two, one at 11% and one at 9%? None of them are “true” or “false”, you don’t know that. What you probably mean is that even the best option, the 11% alternative, is more likely to be false than true. Yes, but so what? If you have to pick one, you pick the RELATIVE best and if its probability doesn’t cross the 50% threshold, well, them’s the breaks.
Yes that is exactly what I’m getting at. It doesn’t seem reasonable to say you’ve confirmed the 11% alternative. But then there’s another problem, what if you have to make this decision multiple times? Do you throw out the other alternatives and only focus on the 11%? That would lead to status quo bias. So you have to keep the other alternatives in mind, but what do you do with them? Would you then say you’ve confirmed those other alternatives? This is where the necessity of something like falsification comes into play. You’ve got to continue analyzing multiple options as new evidence comes in, but trying to analyze all the alternatives is too difficult, so you need a way to throw out certain alternatives, but you never actually confirm any of them. These problems come up all the time in day to day decision making such as deciding on what’s for dinner tonight.
It doesn’t seem reasonable to say you’ve confirmed the 11% alternative.
In the context of the Bayesian confirmation theory, it’s not you who “confirms” the hypothesis. It’s evidence which confirms some hypothesis and that happens at the prior → posterior stage. Once you’re dealing with posteriors, all the confirmation has already been done.
what if you have to make this decision multiple times?
Do you get any evidence to update your posteriors? Is there any benefit to picking different alternatives? If no and no, then sure, you repeat your decision.
That would lead to status quo bias.
No, it would not. That’s not what the status quo bias is.
You keep on using words without understanding their meaning. This is a really bad habit.
If your problem is which tests to run, then you’re in the experimental design world. Crudely speaking, you want to rank your available tests by how much information they will give you and then do those which have high expected information and discard those which have low expected information.
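A sketch of that ranking with entirely made-up numbers: each candidate “test” has a binary outcome, and its score is the expected reduction in entropy over the hypotheses (the mutual information between outcome and hypothesis).

```python
import math

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def expected_info_gain(prior, p_pos_given_h):
    """prior: P(H); p_pos_given_h: P(test comes out positive | H)."""
    p_pos = sum(prior[h] * p_pos_given_h[h] for h in prior)
    post_pos = {h: prior[h] * p_pos_given_h[h] / p_pos for h in prior}
    post_neg = {h: prior[h] * (1 - p_pos_given_h[h]) / (1 - p_pos) for h in prior}
    return entropy(prior) - (p_pos * entropy(post_pos) + (1 - p_pos) * entropy(post_neg))

prior = {"HA": 0.4, "HB": 0.3, "HC": 0.3}
test_1 = {"HA": 0.9, "HB": 0.1, "HC": 0.1}   # discriminates HA from the rest
test_2 = {"HA": 0.5, "HB": 0.5, "HC": 0.5}   # outcome independent of the hypothesis
print(expected_info_gain(prior, test_1))     # high: run this one first
print(expected_info_gain(prior, test_2))     # ~0: tells you nothing, skip it
```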
All you have to do is not simultaneously use “confirm” to mean both “increase the probability of” and “assign high probability to”.
As for throwing out unlikely possibilities to save on computation: that (or some other shortcut) is sometimes necessary, but it’s an entirely separate matter from Bayesian confirmation theory or indeed Popperian falsificationism. (Popper just says to rule things out when you’ve disproved them. In your example, you have a bunch of things near 10%, and Popper gives you no licence to throw any of them out.)
Yes, sorry. I’m drawing on multiple sources which I recognize the rest of you haven’t read, and trying to condense them into short comments, which I’m probably not the best person to do, so I recognize the problem I’m talking about may come out a bit garbled. But I think the quote from the Morey et al. paper above describes the problem best.
You see how Morey et al call the position they’re criticizing “Overconfident Bayesianism”? That’s because they’re contrasting it with another way of doing Bayesianism, about which they say “we suspect that most Bayesians adhere to a similar philosophy”. They explicitly say that what they’re advocating is a variety of Bayesian confirmation theory.
The part about deduction from the Morey et al. paper:
GS describe model testing as being outside the scope of Bayesian confirmation theory, and we agree. This should not be seen as a failure of Bayesian confirmation theory, but rather as an admission that Bayesian confirmation theory cannot describe all aspects of the data analysis cycle. It would be widely agreed that the initial generation of models is outside Bayesian confirmation theory; it should then be no surprise that subsequent generation of models is also outside its scope.
Who has been claiming that Bayesian confirmation theory is a tool for generating models?
(It can kinda-sorta be used that way if you have a separate process that generates all possible models, hence the popularity of Solomonoff induction around here. But that’s computationally intractable.)
As stated in my original comment, confirmation is only half the problem to be considered. The other half is inductive inference which is what many people mean when they refer to Bayesian inference. I’m not saying one way is clearly right and the other wrong, but that this is a difficult problem to which the standard solution may not be best.
You’d have to read the Andrew Gelman paper they’re responding to to see a criticism of confirmation.
As I said, the issue can be corrected for if the number of hypotheses is known, but not if the number of possibilities is unknown.
You don’t need to know the number, you need to know the model (which could have infinite hypotheses in it).
Your model (hypothesis set) could be specified by an infinite number of parameters, say “all possible means and variances of a Gaussian.” You can have a prior on this space, which is a density. You update the density with evidence to get a new density. This is Bayesian stats 101. Why not just go read about it? Bishop’s machine learning book is good.
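For concreteness, a minimal sketch of that “Bayesian stats 101” setup with an infinite hypothesis set: the hypotheses are all possible means of a Gaussian with known variance, the prior is a density over that mean, and the update returns another density (standard conjugate formulas; all numbers illustrative).

```python
import numpy as np

prior_mean, prior_var = 0.0, 10.0      # prior density over the unknown mean
noise_var = 1.0                        # observation variance, assumed known
data = np.array([1.2, 0.8, 1.1, 0.9])

n = data.size
post_var = 1.0 / (1.0 / prior_var + n / noise_var)
post_mean = post_var * (prior_mean / prior_var + data.sum() / noise_var)
print(post_mean, post_var)             # the updated density over the continuum of hypotheses
```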
True, but working from a model is not an inductive method, so it can’t be classified as confirmation through inductive inference which is what I’m criticizing.
??? IlyaShpitser, if I understand correctly, is talking about creating a model of a prior, collecting evidence, and then determining whether the model is true or false. That’s hypothesis testing, which is deduction, not induction.
You have a (possibly infinite) set of hypotheses. You maintain beliefs about this set. As you get more data, your beliefs change. To maintain beliefs you need a distribution/density. To do that you need a model (a model is just a set of densities you consider). You may have a flexible model and let the data decide how flexible you want to be (non-parametric Bayes stuff, I don’t know too much about it), but there’s still a model.
Suggesting for the third and final time to get off the internet argument train and go read a book about Bayesian inference.
That interesting solution is exactly what people doing Bayesian inference do. Any criticism you may have that doesn’t apply to what Ilya describes isn’t a criticism of Bayesian inference.
But that requires considering the three hypotheses as a group rather than in isolation from all other hypotheses to calculate 0.33.
Not really. A hypothesis’s prior probability comes from the total of all of your knowledge; in order to determine that P(HA)=0.33 Lumifer needed the additional facts that there were three possibilities that were all equally likely.
It works just as well if I say that my prior is P(HA)=0.5, without any exhaustive enumeration of the other possibilities. Then evidence E confirms HA if P(HA|E)>P(HA).
(One should be suspicious that my prior probability assessment is a good one if I haven’t accounted for all the probability mass, but the mechanisms still work.)
One should be suspicious that my prior probability assessment is a good one if I haven’t accounted for all the probability mass, but the mechanisms still work.
Which is one of the other problems I was getting at.
If you start with inconsistent assumptions, you get inconsistent conclusions. If you believe P(H)=1, P(A&B&C)=1, and P(H|A) etc. are all <1, then you have already made a mistake. Why are you blaming this on Bayesian confirmation theory?
The relevance is that it’s a really weird way to set up a problem. If P(H)=1 and P(H|A)=0.4 then it is necessarily the case that P(A)=0. If that’s not immediately obvious to you, you may want to come back to this topic after sleeping on it.
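Spelling out the step being gestured at, in case it helps:

```latex
P(H) = 1 \;\Rightarrow\; P(H \cap A) = P(A)
\;\Rightarrow\; P(H \mid A) = \frac{P(H \cap A)}{P(A)} = 1 \quad \text{whenever } P(A) > 0
```

so P(H|A) = 0.4 leaves P(A) = 0 as the only consistent assignment.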
A large chunk of academics would say that it is. For example, from the paper I was referencing in my post:
At some point in history, a statistician may well write down a model which he or she believes contains all the systematic influences among properly defined variables for the system of interest, with correct functional forms and distributions of noise terms. This could happen, but we have never seen it, and in social science we have never seen anything that comes close. If nothing else, our own experience suggests that however many different specifications we thought of, there are always others which did not occur to us, but cannot be immediately dismissed a priori, if only because they can be seen as alternative approximations to the ones we made. Yet the Bayesian agent is required to start with a prior distribution whose support covers all alternatives that could be considered.
A major weakness of the standard Bayesian inference method is that it assumes a problem only has two possible solutions.
This is a weird sentence to me. I learned about Bayesian inference through Jaynes’ book and surely it doesn’t portray that inference as having only two possible solutions. The other book I know about, Sivia’s, doesn’t do this either.
Last week was a gathering of physicists in Oxford to discuss string theory and the philosophy of science.
From the article:
That the Bayesian view is news to so many physicists is itself news to me, and it’s very unsettling news. You could say that modern theoretical physics has failed to be in-touch with other areas of science, but you could also make the argument that the rationalist community has failed to properly reach out and communicate with scientists.
The character from Molière learns a fancy name (“speaking in prose”) for the way he already communicates. David Gross isn’t saying that he is unfamiliar with the Bayesian view, he’s saying that “Bayesian confirmation theory” is a fancy name for his existing epistemic practice.
Rationalist community needs to learn a little humility. Do you realize the disparity in intellectual firepower between “you guys” and theoretical physicists?
This is the overgeneralized IQ fantasy. A really smart physicist may be highly competent at say string theory, but know very little about french pasteries or cuda programming or—more to the point—solomonoff induction.
As I said, here I am. Tell me how Solomonoff induction is going to change how I do my business. I am listening.
You are already a lesswronger—would you say that lesswrong has changed the way you think at all? Why do you keep coming back?
I post here, but I don’t identify as a rationalist. Two most valuable ideas (to me) that circulate here are tabooing and steelmanning (but they were not invented here).
I think I try to cultivate what you would call the “rationalist mindset” in order to do math. But I view it as a tool for certain problems only, not a part of my identity.
Do you want me to leave?
I like you being here.
That wasn’t my point. My point was, you are the best one to answer your own question.
Solomonoff induction is uncomputable, so it’s not going to help you in any way.
But Jaynes (who was a physicist) said that using Bayesian methods to analyze magnetic resonance data helped him gain an unprecedented resolution. Quoting from his book:
jacob_cannell above seems to think it is very important for physicists to know about Solomonoff induction.
Solomonoff induction is one of those ideas that keeps circulating here, for reasons that escape me.
If we are talking about Bayesian methods for data analysis, almost no one on LW who is breathlessly excited about Bayesian stuff actually knows what they are talking about (with 2-3 exceptions, who are stats/ML grad students or up). And when called on it retreat to the “Bayesian epistemology” motte.
Bayesian methods didn’t save Jaynes from being terminally confused about causality and the Bell inequalities.
I still haven’t figured out what you have against Bayesian epistemology. It’s not like this is some sort of LW invention—it’s pretty standard in a lot of philosophical and scientific circles, and I’ve seen plenty of philosophers and scientists who call themselves Bayesians.
My understanding is that Solomonoff induction is usually appealed to as one of the more promising candidates for a formalization of Bayesian epistemology that uses objective and specifically Occamian priors. I haven’t heard Solomonoff promoted as much outside LW, but other similar proposals do get thrown around by a lot of philosophers.
Of course Bayesianism isn’t a cure-all by itself, and I don’t think that’s controversial. It’s just that it seems useful in many fundamental issues of epistemology. But in any given domain outside of epistemology (such as causation or quantum mechanics), domain-relevant expertise is almost certainly more important. The question is more whether domain expertise plus Bayesianism is at all helpful, and I’d imagine it depends on the specific field. Certainly for fundamental physics it appears that Bayesianism is often viewed as at least somewhat useful (based on the conference linked by the OP and by a lot of other things I’ve seen quoted from professional physicists).
I don’t have any problem with Bayesian epistemology at all. You can have whatever epistemology you want.
What I do have a problem with is this “LW myopia” where people here think they have something important to tell to people like Ed Witten about how people like Ed Witten should be doing their business. This is basically insane, to me. This is strong evidence that the type of culture that gets produced here isn’t particularly sanity producing.
Solomonoff induction is useless to know about for anyone who has real work to do (let’s say with actual data, like physicists). What would people do with it?
In many cases I’d agree it’s pretty crazy, especially if you’re trying to go up against top scientists.
On the other hand, I’ve seen plenty of scientists and philosophers claim that their peers (or they themselves) could benefit from learning more about things like cognitive biases, statistics fallacies, philosophy of science, etc. I’ve even seen experts claim that a lot of their peers make elementary mistakes in these areas. So it’s not that crazy to think that by studying these subjects you can have some advantages over some scientists, at least in some respects.
Of course that doesn’t mean you can be sure that you have the advantage. As I said, probably in most cases domain expertise is more important.
Absolutely agree it is important for scientists to know about cognitive biases. Francis Bacon, the father of the empirical method, explicitly used cognitive biases (he called them “idols,” and even classified them) as a justification for why the method was needed.
I always said that Francis Bacon should be LW’s patron saint.
So it sounds like you’re only disagreeing with the OP in degree. You agree with the OP that a lot of scientists should be learning more about cognitive biases, better statistics, epistemology, etc., just as we are trying to do on LW. You’re just pointing out (I think) that the “informed laymen” of LW should have some humility because (a) in many cases (esp. for top scientists?) the scientists have indeed learned lots of rationality-relevant subject matter, perhaps more than most of us on LW, (b) domain expertise is usually more important than generic rationality, and (c) top scientists are very well educated and very smart.
Is that correct?
Yup!
edit: Although I should say LW “trying to learn better statistics” is too generous. There is a lot more “arguing on the internet” and a lot less “reading” happening.
I nominate Carneades, the inventor of the idea of degrees of certainty.
Harry J.E. Potter did receive Bacon’s diary as a gift from his DADA teacher, after all.
I think a more charitable read would go like this: being smarter doesn’t necessarily mean that you know everything there’s to know nor that you are more rational than other people. Since being rational or knowing about Bayesian epistemology is important in every field of science, physicists should be motivated to learn this stuff. I don’t think he was suggesting that French pastries are literally useful to them.
Well, LW was born as a forum about artificial intelligence. Solomonoff induction is like an ideal engine for generalized intelligence, which is very cool!
That’s unfortunate, but we cannot ask of anyone, even geniuses, to transcend their time. Leonardo da Vinci held some ridiculous beliefs, for our standars, just like Ramanujan or Einstein. With this I’m not implying that Jaynes was a genius of that caliber, I would ascribe that status more to Laplace. On the ‘bright’ side, in our time nobody knows how to reconcile epistemic probability and quantum causality :)
That seems to be a pretty big claim. Can you articulate why you believe it to be true?
As far as I am aware, Solomonoff induction describes the singularly correct way to do statistical inference in the limits of infinite compute. (It computes generalized/full Bayesian inference)
All of AI can be reduced to universal inference, so understanding how to do that optimally with infinite compute perhaps helps one think more clearly about how practical efficient inference algorithms can exploit various structural regularities to approximate the ideal using vastly less compute.
Because AIXI is the first complete mathematical model of a general AI and is based on Solomonoff induction.
Also, computable approximation to Solomonoff prior has been used to teach small AI to play videogames unsupervised.
So, yeah.
If you don’t consider Jaynes to be comtemporary, which author do you consider to be his successor that updated where Jaynes went wrong?
While Bretthorst is his immediate and obvious successor, unfortunately nobody that I know of has taken up the task to develop the field the way Jaynes did.
I am pretty sure jacob_connell specifically brought up Solomonoff induction. I am still waiting for him to explain why I (let alone Ed Witten) should care about this idea.
How do you know what is important in every field of science? Are you a scientist? Do you publish? Where is your confidence coming from, first principles?
Whether Solomonoff induction is cool or not is a matter of opinion (and “mathematical taste,”) but more to the point the claim seems to be it’s not only cool but vital for physicists to know about. I want to know why. It seems fully useless to me.
Jaynes died in 1997. Bayesian networks (the correct bit of math to explain what is going on with Bell inequalities) were written up in book form in 1988, and were known about in various special case forms long before that.
???
Well, yes of course. Cox’ theorem. Journals are starting to refute papers based on the “p<0.05” principle. Many studies in medicine and psychology cannot be replicated. Scientists are using inferior analysis methods when better are available just because they were not taught to.
I do say there’s a desperate need to divulge Bayesian thinking.
I wasn’t referring to that. Jaynes knew that quantum mechanics was incompatible with the epistemic view of probability, and from his writing, while never explicit, it’s clear that he was thinking about a hidden variables model.
Undisputable violation of the Bell inequalities were performed only this year. Causality was published in 2001. We still don’t know how to stitch epistemic probabilities and quantum causality.
What I’m saying is that the field was in motion when Jaynes died, and we still don’t know a large deal about it. As I said, we cannot ask anyone not to hold crazy ideas from time to time.
Datapoint: in [biological] systematics in its broadest sense, Bayesian methods are increasingly important (molecular evolution studies,...), but I’ve never heard about pure Bayesian epistemology being in demand. Maybe because we leave it all to our mathematicians.
Part of the issue I keep harping about is people keep confusing Bayes rule, Bayesian networks, Bayesian statistical inference, and Bayesian epistemology. I don’t have any issue with a thoughtful use of Bayesian statistical inference when it is appropriate—how could I?
My issue is people being confused, or people having delusions of grandeur.
Nah—I was just using that as an example of things physicists (regardless of IQ) don’t automatically know.
Most physicists were trained to think in terms of Popperian epistemology, which is strictly inferior to (dominated by) Bayesian epistemology (if you don’t believe that, it’s not worth my time to debate). In at least some problem domains, the difference in predictive capability between the two methodologies are becoming significant.
Physicists don’t automatically update their epistemologies, it isn’t something they are using to having to update.
Heh, ok. Thanks for your time!
Ok, so I lied, I’ll bite.
I equate “Bayesian epistemology” with a better approximation of universal inference. It’s easy to generate example environments where Bayesian agents dominate Popperian agents, while the converse is never true. Popperian agents completely fail to generalize well from small noisy datasets. When you have very limited evidence, popperian reliance on hard logical falsifiability just fails.
This shouldn’t even really be up for debate—do you actually believe the opposite position, or are you just trolling?
French pastries (preferably from a Japanese/Korean pastry shop) are better than Solomonoff induction—they are yummier.
Ha, but a robot programmed with a Solomonoff induction-like software will learn to do French pastries long before pastries will learn how to do Solomonoff induction!
French pastries correspond to a pretty long bit-string so you may have to wait for a very long time (and eat a lot of very bad possibly-pastries in the meantime :-P). A physicist can learn to make pastries much quicker.
It could be that the attitude/belief that theoretical physicists are far smarter than anyone else (and therefore, by implication, do not need to listen to anyone else) is part of the problem I’m outlining.
It could be, but I think theoretical physicists actually are very intelligent. Do you disagree?
edit: But let’s leave them aside, and talk about me, since I am actually here. I am not in the same league as Ed Witten, not even close. Do you (generic sense) have something sensible to communicate to me about how I go about my business?
When did you become a theoretical physicist?
I am not. But I do theory work, and some of it is even related to analyzing data (and I am actually here to have this conversation, whereas Ed is not). So—what do you have to teach me?
I dunno. I have PhD in engineering. In my graduate research and in my brief life as a practicing scientist, I used rationalist skills like “search for more hypotheses” and “think exclusively about the problem for five minutes before doing anything else” and generally leveraged LW-style thinking, that I didn’t learn in school, to be more successful and productive than I probably would have been otherwise. I could probably write a lengthy article about how I perceive LW to have helped me in my life, but I know that it would seem extremely post hoc and you could also probably say that the skills I’m using are not unique to LW. All I can say is that the core insight that formed the crux of my dissertation arose because I was using a very LW-style approach to analyzing a problem.
The thing about rationalist skills is that LW does not any cannot have a monopoly on them. In fact, the valuable function of LW (at least in the past) has been to both aggregate and sort through potentially actionable strategic directives and algorithms.
What’s interesting to me is that school doesn’t do that at all. I got through however-many years of schooling and earned a PhD without once taking a class about Science, about how to actually do it, about what the process of Science is. I absorbed some habits from advisers and mentors, that’s about it. The only place that I even know of where people talk at length about the inner operations of mind that correspond to the outer reality where one observes discoveries being made is Less Wrong.
And if you’re an entrepreneur and don’t care about science, then Less Wrong is also one of a few places where people talk at length about how to marshal your crappy human brain and coax it to working productively on tasks that you have deliberately and strategically chosen.
One problem is that I’m probably thinking of the Less Wrong of four years ago rather than the Less Wrong of today. In any case, all those old posts that I found so much value in are still there.
I feel like this is an important point that goes a long way toward giving one the intellectual/social humility IlyaShpitser is pointing at, and I agree completely that the value of LW as a site/community/etc. is primarily in sorting and aggregating. (It’s the people who do the creating or transferring.)
You are correct that surveys of IQ and other intelligence scores consistently show physicists having some of the highest. But mathematics, statistics, computer science, and engineering score comparably, and most studies I’ve seen find very little, if any, significant difference in intelligence scores among these fields.
‘Rationalist’ isn’t a field or specialization, it’s defined more along the lines of refining and improving rational thinking. Based on the lesswrong survey, fields like mathematics and computer science are heavily represented here. There are actually more physicists (4.3%) than philosophers (2.4%). If this is inconsistent with your perception of the community, update your prior.
From all of this it is safe to assume that the average LWer is ‘very smart’, and that LW contains a mini-community of rationalist scientists. One data point: me. I have a PhD in engineering and I’m a practicing scientist. Maybe I should have phrased my initial comment as: “It might be better if the intersection of rationalists and scientists were larger.”
While 4.3% of LWers are physicists, the reverse isn’t true: nowhere near that fraction of physicists are LWers.
If only smart people were automatically bias free...
Could you expand on this further? I’m not sure I understand your argument. Also, intellectual humility or social humility?
Re: your last question: yes.
(a) It is very difficult to perceive qualitative differences in people 1 sigma+ above “you” (for any value of “you”), but the difference is enormous.
(b) How much “science process” does this community actually understand? How many are practicing scientists, as in publish real stuff in journals?
The outside-view worry is that there might be a bit of a “twenty-something know-it-all” thing going on. You read some stuff, and liked it. That’s great! If the stuff isn’t universally adopted by very smart folks, there are probably very good reasons for that! Read more!
My argument boils down to: “no, really, very smart people are actually very smart.”
The median IQ at LessWrong is 139; the average Nobel laureate is reputed to have an IQ of 145. Presumably that means many people at LessWrong are in a position to understand the reasoning of Nobel laureates, at least.
The gap between the average Nobel laureate (in physics, say) and the average LWer is enormous. If your measure says it isn’t, it’s a crappy measure.
I calculate about 128 for the average IQ of a survey respondent who provides one and I suspect that nonresponse means the actual average is closer to 124 or so. (Thus I agree with you that there is a significant gap between the average Nobel laureate and the average LWer.)
I think the right way to look at LW’s intellectual endowment is that it’s very similar to a top technical college, like Harvey Mudd. There are a handful of professor/postdoc/TA types running around, but as a whole the group skews very young (graph here, 40 is 90th percentile) and so even when people are extraordinarily clever they don’t necessarily have the accomplishments or the breadth for that to be obvious. (And because of how IQ distributions work, especially truncated ones with a threshold, we should expect most people to be close to the threshold.)
I agree with this. I think looking at a typical LWer as a typical undergrad at Harvey Mudd is a good model. (This is not a slur, btw, Harvey Mudd is great).
I was confused for a moment.
What makes you so confident that your model is correct, instead of the data disproving it?
No sarcasm, it’s an honest question.
I look at the IQ results for the survey every year. A selected handful of comments:
Karma vs. multiple IQ tests: positive correlation (.45) between self-report and Raven’s for users with positive karma, negative correlation (-.11) between self-report and Raven’s for users without positive karma.
SATs are very high: 96th percentile in the general population is lower quartile here. (First place I make the Harvey Mudd comparison.)
SAT self-report vs. IQ self-report: average SAT, depending on which one you look at and how you correct it, suggests that the average LWer is somewhere between 98th and 99.5th percentile. (IQ self-report average is above 99.5th percentile, and so I call the first “very high” and the second “extremely high.”)
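(For anyone who wants to check the percentile talk: a minimal sketch of the percentile-to-IQ conversion, assuming IQ is distributed N(100, 15); the specific percentiles are just the ones quoted above.)

    # Convert a population percentile to an IQ-equivalent, assuming IQ ~ N(100, 15).
    from scipy.stats import norm

    for pct in (0.96, 0.98, 0.995):
        print(pct, round(100 + 15 * norm.ppf(pct), 1))
    # 0.96 -> ~126, 0.98 -> ~131, 0.995 -> ~139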
I’ve interacted with a handful of Nobel laureates, I’m familiar with the professors and students at two top-15 graduate physics programs, and I’ve interacted with a bunch of LWers. LW as a whole seems roughly comparable to an undergraduate physics department, active LWers roughly comparable to a graduate physics department, and there are top LWers at the level of the Nobel laureates (but aging means the ~60-year-old Nobel laureates are not a fair comparison to the ~30-year-old top LWers, and this is selecting just the math-genius types from the top LWers, not the most popular top LWers). Recall Marcello comparing Conway and Yudkowsky.
That last link is kinda cringeworthy.
I just spent the last minute or so trying to figure out what you didn’t like about my percentile comparisons. ;)
The underlying subject is often painful to discuss, so even handled well there will be things to cringe about.
I don’t know if you knew that my question was directed at IlyaShpitser and not at you… I do not doubt your data.
Because I have hung out with some top academic people, I know what actual genius is like.
Incidentally, when I talk about people being “very smart” I don’t mean “as measured by IQ.” As I have mentioned many times before, I think IQ is a very poor measure of math smarts, and a very poor measure of generalized smarts at the top end. Intelligence is too heterogeneous, and too high-dimensional. But there is such a thing as being “very smart”; it’s just a multidimensional thing.
So in this case, I just don’t think there is a lot of info in the data. I much prefer looking at what people have done as a proxy for their smarts. “If you are so smart, where are all your revolutionary papers?” This also correctly adjusts for people who actually are very smart, but who bury their talents (and so their hypothetical smarts are not super interesting to talk about).
I’ve already had this discussion with someone else, about another topic: I pointed out that statistically, lottery winners end up no happier than they were before winning. He said that he knew how to spend the winnings well enough to be effectively much happier.
In our discussion, you have some insights that from my perspective are biased, but from your point of view are not. Unfortunately, your data rely on uncommunicable evidence, so we should just agree to disagree and call it a day.
Lottery winners do end up happier.
Thanks, I updated!
Well, you don’t have to agree if you don’t want.
But the situation is not as hopeless as it seems. Try to find some people at the top of their game, and hang out with them for a bit. Honestly, if you think “Mr. Average Less Wrong” and Ed Witten are playing in the same stadium, you are being a bit myopic. But this is the kind of thing more info can help with. You say you can’t use my info (and don’t want to take my word for it), but you can generate your own if you care.
Will do! We’ll see how it pans out :D
Is there a reliable source for this?
[1] is one source. Its method is: “Jewish IQ is distributed like American-of-European-ancestry IQ, but a standard deviation higher. If you look at the population above a certain IQ threshold, you see a higher fraction of Jews than in the normal population. If you use the threshold of 139, you see 27% Jews, which is the fraction of Jews who are Nobel laureates. So let’s assume that Nobel laureate IQ is distributed like AOEA IQ after you cut off everyone with IQ below 139. It follows that Nobel laureates have an average IQ of 144.”
I hope you’ll agree that this seems dubious.
[2] agrees that it’s dubious, and tries to calculate it a different way (still based on fraction of Jews), and gets 136. (It’s only reported by field, but it would be the same as chemistry and literature, because they’re both 27% Jews.) It gets that number by doing a bunch of multiplications which I suspect are the wrong multiplications to do. (Apparently, if IQ tests had less g loading, and if self-identified ethnicity correlated less with ancestry, then the g loading of Jewishness would go up?) But even if the calculations do what they’re supposed to, it feels like a long chain of strong assumptions and noisy data, and this method seems about equally dubious to me.
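(For what it’s worth, the arithmetic in [1] is easy to check, granting its assumptions; the dubious part is the assumptions, not the truncation calculation. A minimal sketch:)

    # Mean of a normal distribution truncated below at a threshold,
    # using [1]'s assumptions: IQ ~ N(100, 15), cutoff at 139.
    from scipy.stats import norm

    mu, sigma, cutoff = 100, 15, 139
    z = (cutoff - mu) / sigma
    mean_above = mu + sigma * norm.pdf(z) / norm.sf(z)   # inverse Mills ratio
    print(round(mean_above, 1))                          # ~143.7, i.e. the "144" figure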
What gets me more is the guy who was complaining that the atomic theory is left in the same framework with 1-epsilon probability.
No, this is not a problem.
I tried to get a discussion going on this exact subject in my post this week, but there seemed to be little interest. A major weakness of the standard Bayesian inference method is that it assumes a problem only has two possible solutions. Many problems involve many possible solutions, and many times the number of possible solutions is unknown, and in many cases the correct solution hasn’t been thought of yet. In such instances, confirmation through inductive inference may not be the best way of looking at the problem.
Where did you get this from? Maintaining beliefs over an entire space of possible solutions is a strength of the Bayesian approach. Please don’t talk about Bayesian inference after reading a single thing about updating beliefs on whether a coin is fair or not. That’s just a simple tutorial example.
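To make the point concrete, here is a minimal sketch (all numbers invented) of maintaining beliefs over twenty candidate explanations at once; nothing in the machinery cares that there are more than two possibilities:

    # Bayesian updating over 20 candidate explanations (likelihoods invented).
    import numpy as np

    n = 20
    prior = np.full(n, 1.0 / n)                  # no explanation initially favoured
    rng = np.random.default_rng(0)
    likelihood = rng.uniform(0.1, 1.0, size=n)   # P(evidence | explanation_i), made up

    posterior = prior * likelihood
    posterior /= posterior.sum()                 # beliefs over all 20 explanations, summing to 1
    print(posterior.round(3))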
If I have 3 options, A, B, and C, and I’m 40% certain the best option is A, 30% certain the best option is B, and 30% certain the best option is C, would it be correct to say that I’ve confirmed option A instead of say my best evidence suggests A? This can sort of be corrected for with the standard Bayesian confirmation model, but the problem becomes larger as the number of possibilities increases to the point where you can’t get a good read on your own certainty, or to the point where the number of possibilities is unknown.
I don’t understand your question. Is this about maintaining beliefs over hypotheses or decision-making?
I’m arguing that Bayesian confirmation theory as a philosophy was originally conceived as a model using only two possibilities (A and ~A), and then this model was extrapolated into problems with more than two possibilities. If it had been originally conceived using more than two possibilities, it wouldn’t have made any sense to use the word confirmation. So explanations of Bayesian confirmation theory will often entail considering theories or decisions in isolation rather than as part of a group of decisions or theories.
So if there are 20 possible explanations for a problem, and there is no strong evidence suggesting any one explanation, then I will have 5% certainty of the average explanation. Unless I am extremely good at calibration, then I can’t confirm any of them, and if I consider each explanation in isolation from the other explanations, then all of them are wrong.
It doesn’t matter whether we’re talking about hypotheses or decision-making.
I’m not sure whether this is true, but it’s irrelevant. Bayesian confirmation theory works just fine with any number of hypotheses.
If by “confirm” you mean “assign high probability to, without further evidence”, yes. That seems to me to be exactly what you’d want. What is the problem you see here?
You sound confused. The “confirmation” stems from the standard definition: evidence E is said to confirm hypothesis H when P(H|E) > P(H). (source)
So what if p(H) = 1, p(H|A) = .4, p(H|B) = .3, and p(H|C) = .3? The evidence would suggest all are wrong. But I have also determined that A, B, and C are the only possible explanations for H. Clearly there is something wrong with my measurement, but I have no method of correcting for this problem.
H is Hypothesis. You have three: HA, HB, and HC. Let’s say your prior is that they are equally probable, so the unconditional P(HA) = P(HB) = P(HC) = 0.33
Let’s also say you saw some evidence E and your posteriors are P(HA|E) = 0.4, P(HB|E) = 0.3, P(HC|E) = 0.3. This means that evidence E confirms HA because P(HA|E) > P(HA). This does not mean that you are required to believe that HA is true or bet your life’s savings on it.
That’s a really good explanation of part of the problem I was getting at. But that requires considering the three hypotheses as a group rather than in isolation from all other hypotheses to calculate 0.33.
No, it does not.
Let’s say you have a hypothesis HZ. You have a prior for it, say P(HZ) = 0.2 which means that you think that there is a 20% probability that HZ is true and 80% probability that something else is true. Then you see evidence E and it so happens that the posterior for HZ becomes 0.25, so P(HZ|E) = 0.25. This means that evidence E confirmed hypothesis HZ and that statement requires nothing from whatever other hypotheses HA,B,C,D,E,etc. might there be.
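A minimal sketch of that, using the numbers in the comment above (prior 0.2, posterior 0.25). The likelihoods are invented to make those numbers come out; the point is that nothing about the specific alternatives is needed beyond a single catch-all P(E | not HZ):

    # One hypothesis HZ against a catch-all "something else".
    p_hz = 0.2                 # prior for HZ
    p_e_given_hz = 0.5         # invented
    p_e_given_not_hz = 0.375   # invented; any likelihoods with a 4:3 ratio give the same posterior

    p_e = p_hz * p_e_given_hz + (1 - p_hz) * p_e_given_not_hz
    posterior = p_hz * p_e_given_hz / p_e
    print(round(posterior, 3))   # 0.25 > 0.2, so E confirms HZ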
How would you calculate that prior of 0.2? In my original example, my prior was 1, and then you transformed it into 0.33 by dividing by the number of possible hypotheses. You wouldn’t be able to do that without taking the other two possibilities into account. As I said, the issue can be corrected for if the number of hypotheses is known, but not if the number of possibilities is unknown. However, philosophical treatments of Bayesian confirmation theory frequently don’t consider this problem. From this paper by Morey, Romeijn, and Rouder:
You need to read up on basic Bayesianism.
Priors are always for a specific hypothesis. If your prior is 1, this means you believe this hypothesis unconditionally and no evidence can make you stop believing it.
You are talking about the requirement that all mutually exclusive probabilities must sum to 1. That’s just a property of probabilities and has nothing to do with Bayes.
Yes, it can. To your “known” hypotheses you just add one more which is “something else”.
Really, just go read. You are confused because you misunderstand the basics. Stop with the philosophy and just figure out how the math works.
I’m not arguing with the math; I’m arguing with how the philosophy is often applied. Consider the condition where my prior is greater than my evidence for all choices I’ve looked at, the number of possibilities is unknown, but I still need to make a decision about the problem? As the paper I was originally referencing mentioned, what if all options are false?
You are not arguing, you’re just being incoherent. For example,
...that sentence does not make any sense.
Then the option “something else” is true.
But you can’t pick something else; you have to make a decision
What does “have to make a decision” mean when “all options are false”?
Are you thinking about the situation when you have, say, 10 alternatives with the probabilities of 10% each except for two, one at 11% and one at 9%? None of them are “true” or “false”, you don’t know that. What you probably mean is that even the best option, the 11% alternative, is more likely to be false than true. Yes, but so what? If you have to pick one, you pick the RELATIVE best and if its probability doesn’t cross the 50% threshold, well, them’s the breaks.
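(A trivial sketch of that “pick the relative best” step, with the numbers from the example above:)

    # Ten alternatives, none above 50%: you still just take the relative best.
    probs = {"alt_1": 0.11, "alt_2": 0.09}
    probs.update({f"alt_{i}": 0.10 for i in range(3, 11)})

    best = max(probs, key=probs.get)
    print(best, probs[best])   # alt_1 at 0.11: most likely to be right, still probably wrong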
Yes, that is exactly what I’m getting at. It doesn’t seem reasonable to say you’ve confirmed the 11% alternative. But then there’s another problem: what if you have to make this decision multiple times? Do you throw out the other alternatives and only focus on the 11%? That would lead to status quo bias. So you have to keep the other alternatives in mind, but what do you do with them? Would you then say you’ve confirmed those other alternatives? This is where the necessity of something like falsification comes into play. You’ve got to continue analyzing multiple options as new evidence comes in, but trying to analyze all the alternatives is too difficult, so you need a way to throw out certain alternatives, yet you never actually confirm any of them. These problems come up all the time in day-to-day decision making, such as deciding what’s for dinner tonight.
In the context of the Bayesian confirmation theory, it’s not you who “confirms” the hypothesis. It’s evidence which confirms some hypothesis and that happens at the prior → posterior stage. Once you’re dealing with posteriors, all the confirmation has already been done.
Do you get any evidence to update your posteriors? Is there any benefit to picking different alternatives? If no and no, then sure, you repeat your decision.
No, it would not. That’s not what the status quo bias is.
You keep on using words without understanding their meaning. This is a really bad habit.
When I say “throw out” I’m talking about halting tests, not changing the decision.
If your problem is which tests to run, then you’re in the experimental design world. Crudely speaking, you want to rank your available tests by how much information they will give you and then do those which have high expected information and discard those which have low expected information.
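A rough sketch of what “rank tests by expected information” can mean (my illustration, not something the comment above spelled out): score each candidate test by the expected drop in the entropy of your beliefs after seeing its outcome. All numbers are invented.

    # Expected information gain of candidate tests over three hypotheses.
    import numpy as np

    def entropy(p):
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    prior = np.array([0.4, 0.3, 0.3])            # current beliefs over three hypotheses

    # tests[name][outcome, hypothesis] = P(outcome | hypothesis), invented numbers
    tests = {
        "test_A": np.array([[0.9, 0.2, 0.2],
                            [0.1, 0.8, 0.8]]),
        "test_B": np.array([[0.5, 0.5, 0.4],
                            [0.5, 0.5, 0.6]]),
    }

    for name, lik in tests.items():
        p_outcome = lik @ prior                              # P(outcome)
        posteriors = (lik * prior) / p_outcome[:, None]      # P(hypothesis | outcome)
        expected_entropy = sum(p * entropy(post) for p, post in zip(p_outcome, posteriors))
        print(name, "expected info gain:", round(entropy(prior) - expected_entropy, 3))
    # test_A is far more informative than test_B, so it should be run first.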
True.
All you have to do is not simultaneously use “confirm” to mean both “increase the probability of” and “assign high probability to”.
As for throwing out unlikely possibilities to save on computation: that (or some other shortcut) is sometimes necessary, but it’s an entirely separate matter from Bayesian confirmation theory or indeed Popperian falsificationism. (Popper just says to rule things out when you’ve disproved them. In your example, you have a bunch of things near 10%, and Popper gives you no licence to throw any of them out.)
Yes, sorry. I’m drawing on multiple sources which I recognize the rest of you haven’t read, and trying to translate them into short comments, which I’m probably not the best person to do, so the problem I’m talking about may come out a bit garbled. I think the quote from the Morey et al. paper above describes the problem best.
You see how Morey et al call the position they’re criticizing “Overconfident Bayesianism”? That’s because they’re contrasting it with another way of doing Bayesianism, about which they say “we suspect that most Bayesians adhere to a similar philosophy”. They explicitly say that what they’re advocating is a variety of Bayesian confirmation theory.
The part about deduction from the Morey et al. paper:
Who has been claiming that Bayesian confirmation theory is a tool for generating models?
(It can kinda-sorta be used that way if you have a separate process that generates all possible models, hence the popularity of Solomonoff induction around here. But that’s computationally intractable.)
As stated in my original comment, confirmation is only half the problem to be considered. The other half is inductive inference which is what many people mean when they refer to Bayesian inference. I’m not saying one way is clearly right and the other wrong, but that this is a difficult problem to which the standard solution may not be best.
You’d have to read the Andrew Gelman paper they’re responding to in order to see a criticism of confirmation.
You don’t need to know the number, you need to know the model (which could have infinite hypotheses in it).
Your model (hypothesis set) can contain infinitely many hypotheses, indexed by continuous parameters, say “all possible means and variances of a Gaussian.” You can have a prior on this space, which is a density. You update the density with evidence to get a new density. This is Bayesian stats 101. Why not just go read about it? Bishop’s machine learning book is good.
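A minimal sketch of what that looks like, using a grid approximation so the density is visible as an ordinary array (conjugate updates or MCMC are what you’d actually use in practice; the data here are invented):

    # Belief density over a continuous hypothesis space: every (mean, sd) pair
    # of a Gaussian is a hypothesis. Grid approximation for readability.
    import numpy as np

    rng = np.random.default_rng(1)
    data = rng.normal(loc=2.0, scale=1.5, size=10)    # invented observations

    means = np.linspace(-5, 5, 201)
    sds = np.linspace(0.2, 4, 100)
    M, S = np.meshgrid(means, sds)

    log_prior = np.zeros_like(M)                      # flat prior over the grid
    log_lik = sum(-0.5 * ((x - M) / S) ** 2 - np.log(S) for x in data)
    log_post = log_prior + log_lik
    post = np.exp(log_post - log_post.max())
    post /= post.sum()                                # posterior density over the whole grid

    i, j = np.unravel_index(post.argmax(), post.shape)
    print("posterior mode:", round(M[i, j], 2), round(S[i, j], 2))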
True, but working from a model is not an inductive method, so it can’t be classified as confirmation through inductive inference which is what I’m criticizing.
You are severely confused about the basics. Please unconfuse yourself before getting to the criticism stage.
??? IlyaShpitser, if I understand correctly, is talking about creating a model of a prior, collecting evidence, and then determining whether the model is true or false. That’s hypothesis testing, which is deduction, not induction.
You don’t understand.
You have a (possibly infinite) set of hypotheses. You maintain beliefs about this set. As you get more data, your beliefs change. To maintain beliefs you need a distribution/density. To do that you need a model (a model is just a set of densities you consider). You may have a flexible model and let the data decide how flexible you want to be (non-parametric Bayes stuff, I don’t know too much about it), but there’s still a model.
Suggesting for the third and final time to get off the internet argument train and go read a book about Bayesian inference.
Oh, sorry I misunderstood your argument. That’s an interesting solution.
That interesting solution is exactly what people doing Bayesian inference do. Any criticism you may have that doesn’t apply to what Ilya describes isn’t a criticism of Bayesian inference.
As much as I hate to do it, I am going to have to agree with Lumifer, you sound confused. Go read Bishop.
Not really. A hypothesis’s prior probability comes from the total of all of your knowledge; in order to determine that P(HA)=0.33 Lumifer needed the additional facts that there were three possibilities that were all equally likely.
It works just as well if I say that my prior is P(HA)=0.5, without any exhaustive enumeration of the other possibilities. Then evidence E confirms HA if P(HA|E)>P(HA).
(One should be suspicious that my prior probability assessment is a good one if I haven’t accounted for all the probability mass, but the mechanisms still work.)
Which is one of the other problems I was getting at.
If you start with inconsistent assumptions, you get inconsistent conclusions. If you believe P(H)=1, P(A or B or C)=1, and P(H|A) etc. are all <1, then you have already made a mistake. Why are you blaming this on Bayesian confirmation theory?
You are confused. If p(H) = 1, then p(H, A) = p(A) for any A, so p(H | A) = 1 whenever p(A) > 0.
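(Spelled out, since this is the step people trip over:

    p(H) = 1 \implies p(\lnot H) = 0 \implies p(\lnot H \cap A) = 0,
    \text{so for any } A \text{ with } p(A) > 0:
    p(H \mid A) = \frac{p(H \cap A)}{p(A)} = \frac{p(A) - p(\lnot H \cap A)}{p(A)} = 1.

A value like 0.4 for p(H|A) is therefore impossible unless p(A) = 0.)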
Wait, how would you get P(H) = 1?
Fine. p(H) = 0.5, p(H|A) = 0.2, p(H|B) = 0.15, p(H|C) = 0.15. It’s not really relevant to the problem.
The relevance is that it’s a really weird way to set up a problem. If P(H)=1 and P(H|A)=0.4 then it is necessarily the case that P(A)=0. If that’s not immediately obvious to you, you may want to come back to this topic after sleeping on it.
Fair enough.
\sum_i p(H|i) need not add up to p(H) (or indeed to 1).
No, it doesn’t.
Edit—I’m agreeing with you. Sorry if that wasn’t clear.
This is not true at all.
A large chunk of academics would say that it is. For example, from the paper I was referencing in my post:
That doesn’t at all say Bayesian reasoning assumes only two possibilities. It says Bayesian reasoning assumes you know what all the possibilities are.
True, but how often do you see an explanation of Bayesian reasoning in philosophy that uses more than two possibilities?
. . .
This is a weird sentence to me. I learned about Bayesian inference through Jaynes’ book and surely it doesn’t portray that inference as having only two possible solutions.
The other book I know about, Sivia’s, doesn’t do this either.
You’re referring to how it is described in statistics textbooks. I’m talking about confirmation theory as a philosophy.