My rule has to do with paradigm shifts—yes, I do believe in them. I’ve been through a few myself. It is useful if you want to be the first on your block to know that the shift has taken place. I formulated the rule in 1974. I was visiting the Stanford Linear Accelerator Center (SLAC) for a few weeks to give a couple of seminars on particle physics. The subject was QCD. It doesn’t matter what this stands for. The point is that it was a new theory of sub-nuclear particles and it was absolutely clear that it was the right theory. There was no critical experiment but the place was littered with smoking guns. Anyway, at the end of my first lecture I took a poll of the audience. “What probability would you assign to the proposition ‘QCD is the right theory of hadrons’?” My socks were knocked off by the answers. They ranged from .01 percent to 5 percent. As I said, by this time it was a clear no-brainer. The answer should have been close to 100 percent.
The next day I gave my second seminar and took another poll. “What are you working on?” was the question. Answers: QCD, QCD, QCD, QCD, QCD, … Everyone was working on QCD. That’s when I learned to ask “What are you doing?” instead of “What do you think?”
I saw exactly the same phenomenon more recently when I was working on black holes. This time it was after a string theory seminar, I think in Santa Barbara. I asked the audience to vote whether they agreed with me and Gerard ’t Hooft or if they thought Hawking’s ideas were correct. This time I got a 50-50 response. By this time I knew what was going on so I wasn’t so surprised. Anyway I later asked if anyone was working on Hawking’s theory of information loss. Not a single hand went up. Don’t ask what they think. Ask what they do.
Not necessarily a great metric; working on the second-most-probable theory can be the best rational decision if the expected value of working on the most probable theory is lower due to greater cost or lower reward.
This is why many scientists are terrible philosophers of science. Not all of them, of course; Einstein was one remarkable exception. But it seems like many scientists have views of science (e.g. astonishingly naive versions of Popperianism) which completely fail to fit their own practice.
Yes. When chatting with scientists I have to intentionally remind myself that my prior should be on them being Popperian rather than Bayesian. When I forget to do this, I am momentarily surprised when I first hear them say something straightforwardly anti-Bayesian.
I see. I doubt that it is as simple as naive Popperianism, however. Scientists routinely construct and screen hypotheses based on multiple factors, and they are quite good at it, compared to the general population. However, as you pointed out, many do not use or even have the language to express their rejection in a Bayesian way, as “I have estimated the probability of this hypothesis being true, and it is too low to care.” I suspect that they instinctively map intelligence explosion into the Pascal mugging reference class, together with perpetual motion, cold fusion and religion, but verbalize it in the standard Popperian language instead. After all, that is how they would explain why they don’t pay attention to (someone else’s) religion: there is no way to falsify it. I suspect that any further discussion tends to reveal a more sensible approach.
Yeah. The problem is that most scientists seem to still be taught from textbooks that use a Popperian paradigm, or at least Popperian language, and they aren’t necessarily taught probability theory very thoroughly, they’re used to publishing papers that use p-value science even though they kinda know it’s wrong, etc.
So maybe if we had an extended discussion about philosophy of science, they’d retract their Popperian statements and reformulate them to say something kinda related but less wrong. Maybe they’re just sloppy with their philosophy of science when talking about subjects they don’t put much credence in.
This does make it difficult to measure the degree to which, as Eliezer puts it, “the world is mad.” Maybe the world looks mad when you take scientists’ dinner party statements at face value, but looks less mad when you watch them try to solve problems they care about. On the other hand, even when looking at work they seem to care about, it often doesn’t look like scientists know the basics of philosophy of science. Then again, maybe it’s just an incentives problem. E.g. maybe the scientist’s field basically requires you to publish with p-values, even if the scientists themselves are secretly Bayesians.
The problem is that most scientists seem to still be taught from textbooks that use a Popperian paradigm, or at least Popperian language
I’m willing to bet most scientists aren’t taught these things formally at all. I never was. You pick it up out of the cultural zeitgeist, and you develop a cultural jargon. And then sometimes people who HAVE formally studied philosophy of science try to map that jargon back to formal concepts, and I’m not sure the mapping is that accurate.
they’re used to publishing papers that use p-value science even though they kinda know it’s wrong, etc.
I think ‘wrong’ is too strong here. It’s good for some things, bad for others. Look at particle-accelerator experiments: frequentist statistics are the obvious choice because the collider essentially runs the same experiment 600 million times every second, and p-values work well to separate signal from a null hypothesis of ‘just background’.
If there was a genuine philosophy of science illumination it would be clear that, despite the shortcomings of the logical empiricist setting in which Popper found himself, there is much more of value in a sophisticated Popperian methodological falsificationism than in Bayesianism. If scientists were interested in the most probable hypotheses, they would stay as close to the data as possible. But in fact they want interesting, informative, risky theories and genuine explanations. This goes against the Bayesian probabilist ideal. Moreover, you cannot falsify with Bayes’ theorem, so you’d have to start out with an exhaustive set of hypotheses that could account for the data (already silly), and then you’d never get rid of them—they could only be probabilistically disconfirmed.
Strictly speaking, one can’t falsify with any method outside of deductive logic—even your own Severity Principle only claims to warrant hypotheses, not falsify their negations. Bayesian statistical analysis is just the same in this regard.
A Bayesian analysis doesn’t need to start with an exhaustive set of hypotheses to justify discarding some of them. Suppose we have a set of mutually exclusive but not exhaustive hypotheses. The posterior probability of a hypothesis under the assumption that the set is exhaustive is an upper bound for its posterior probability in an analysis with an expanded set of hypotheses. A more complete set can only make a hypothesis less likely, so if its posterior probability is already so low that it would have a negligible effect on subsequent calculations, it can safely be discarded.
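For concreteness, here is a minimal numerical sketch of that bound; the hypothesis names, priors, and likelihoods are all invented for illustration.

```python
# Toy check (invented numbers): the posterior computed as if {H1, H2, H3} were exhaustive
# upper-bounds the posterior of H1 once an extra hypothesis H4 is admitted.
priors = {"H1": 0.5, "H2": 0.3, "H3": 0.2}           # treated as exhaustive
likelihoods = {"H1": 0.10, "H2": 0.40, "H3": 0.05}   # P(data | H)

def posterior(h, priors, likelihoods):
    z = sum(priors[k] * likelihoods[k] for k in priors)
    return priors[h] * likelihoods[h] / z

p_bound = posterior("H1", priors, likelihoods)

# Expanded set: same relative priors on H1..H3, with 0.2 of the prior mass moved to H4.
priors_exp = {"H1": 0.4, "H2": 0.24, "H3": 0.16, "H4": 0.2}
likelihoods_exp = dict(likelihoods, H4=0.30)
p_expanded = posterior("H1", priors_exp, likelihoods_exp)

print(p_bound, p_expanded)      # ~0.278 vs ~0.196
assert p_expanded <= p_bound    # the "exhaustive" posterior is an upper bound
```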
But in fact they want interesting, informative, risky theories and genuine explanations. This goes against the Bayesian probabilist ideal.
I’m a Bayesian probabilist, and it doesn’t go against my ideal. I think you’re attacking philosophical subjective Bayesianism, but I don’t think that’s the kind of Bayesianism to which lukeprog is referring.
For what it’s worth, I understand well the arguments in favor of Bayes, yet I don’t think that scientific results should be published in a Bayesian manner. This is not to say that I don’t think that frequentist statistics is frequently and grossly mis-used by many scientists, but I don’t think Bayes is the solution to this. In fact, many of the problems with how statistics is used, such as implicitly performing many multiple comparisons without controlling for this, would be just as large of problems with Bayesian statistics.
Either the evidence is strong enough to overwhelm any reasonable prior, in which case frequentist statistics will detect the result just fine; or else the evidence is not so strong, in which case you are reduced to arguing about priors, which seems bad if the goal is to create a societal construct that reliably uncovers useful new truths.
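A toy calculation of the first branch of that dichotomy, with an invented likelihood ratio: strong evidence leaves almost nothing for the priors to fight over.

```python
# With a likelihood ratio of a million, priors ranging from 0.5 down to 0.001
# all end up above 0.999: the data swamps any reasonable prior.
def posterior(prior, likelihood_ratio):
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)

for prior in (0.5, 0.01, 0.001):
    print(prior, posterior(prior, 1e6))
```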
The p-value is “the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.” It is often misinterpreted, e.g. by 68 out of 70 academic psychologists studied by Oakes (1986, pp. 79-82).
The Bayes factor differs in many ways from a P value. First, the Bayes factor is not a probability itself but a ratio of probabilities, and it can vary from zero to infinity. It requires two hypotheses, making it clear that for evidence to be against the null hypothesis, it must be for some alternative. Second, the Bayes factor depends on the probability of the observed data alone, not including unobserved “long run” results that are part of the P value calculation. Thus, factors unrelated to the data that affect the P value, such as why an experiment was stopped, do not affect the Bayes factor...
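To make the contrast concrete, here is a small sketch with invented data (14 heads in 20 coin flips); the particular hypotheses and numbers are mine, not the quoted article’s.

```python
# p-value vs. Bayes factor for the same (made-up) data: 14 heads in 20 flips.
from scipy.stats import binom, binomtest

heads, n = 14, 20

# p-value: probability of a result at least this extreme, assuming the null (fair coin).
p_value = binomtest(heads, n, 0.5).pvalue

# Bayes factor: likelihood of the observed data under H1 (bias 0.7) vs. H0 (fair coin).
# The binomial coefficient cancels in the ratio, so the stopping rule
# ("flip 20 times" vs. "flip until 14 heads") does not change it, unlike the p-value.
bayes_factor = binom.pmf(heads, n, 0.7) / binom.pmf(heads, n, 0.5)

print(p_value, bayes_factor)   # ~0.115 and ~5.2
```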
I wasn’t saying it was the same; my point is that reporting the data on which one can update in a Bayesian manner is the norm. (As is updating: e.g. if the null hypothesis is really plausible, at p<0.05 nobody’s really going to believe you anyway.)
With regards to the Bayes factor: the issue is that there is a whole continuum of alternative hypotheses. There is no single factor among them that you can report which could be used for combining evidence in favour of quantitatively different “most supported” alternative hypotheses. The case of the null hypothesis (vs all possible other hypotheses) is special in that regard, and so that is what a number is reported for.
With regards to the case of the ratio between evidence for two point hypotheses, as discussed in the article you link: the Neyman-Pearson lemma is quite old.
With regards to the cause of experiment termination, you have to account somewhere for the fact that termination of the experiment has the potential to cherry-pick and thus bias the resulting data (if that is what he’s talking about, because it’s not clear to me what his point is, and it seems to me that he misunderstood the issue).
Furthermore, the relevant mathematics probably originates from particle physics, where it serves a different role: a threshold on the p-value is there to quantify the worst-case likelihood that your experimental apparatus will be sending people on a wild goose chase. It has more to do with the value of the experiment than with probabilities, given that priors for hypotheses in physics would require a well-defined hypothesis space (which is absent). And given that work on the production of stronger evidence is a more effective way to spend your time there than any debating of the priors. And given that the p-value related issues can in any case be utterly dwarfed by systematic errors and problems with the experimental set-up, something the probability of which changes after publication as other physicists do or do not point towards potential problems in the set-up.
A side note: there’s a value-of-information issue here. I know that if I were to discuss Christian theology with you (not atheism, but the fine points of the life of Jesus, that sort of thing, which I never really had time or inclination to look into), the expected value of information to you would be quite low, because most of the time that I spent practising mathematics and such, you spent on the former. It would be especially the case if you entered some sort of very popular contest in any way requiring theological knowledge, and scored 10th of all time on a metric that someone else saw fit to choose in advance. The same goes for discussions of mathematics, but the other way around. This is also the case for any experts you are talking to. They’re rather rational people; that’s how they got to have impressive accomplishments, and a lot of practical rationality is about ignoring low expected value pursuits. The Einsteins and Fermis of this world do not get to accomplish so much on so many different occasions without great innate abilities for that kind of thing. They also hold teaching positions, and it is more productive for them to correct misconceptions in eager students who are up to speed on the fundamental knowledge.
With regards to the Bayes factor: the issue is that there is a whole continuum of alternative hypotheses. There is no single factor among them that you can report which could be used for combining evidence in favour of quantitatively different “most supported” alternative hypotheses. The case of the null hypothesis (vs all possible other hypotheses) is special in that regard, and so that is what a number is reported for.
Mmm. I’ve read a lot of dumb papers where they show that their model beats a totally stupid model, rather than that their model beats the best model in the literature. In algorithm design fields, you generally need to publish a use case where your implementation of your new algorithm beats your implementation of the best other algorithms for that problem in the field (which is still gameable, because you implement both algorithms, but harder).
Thinking about the academic controversy I learned about most recently, it seems like if authors had to say “this evidence is n:1 support for our hypothesis over the hypothesis proposed in X” instead of “the evidence is n:1 support for our hypothesis over there being nothing going on” they would have a much harder time writing papers that don’t advance the literature, and you might see more scientists being convinced of other hypotheses because they have to implement them personally.
In physics a new theory has to be supported over the other theories, for example. What you’re talking about would have to be something that happens in sciences that primarily find weak effects in the noise and confounders anyway, i.e. psychology, sociology, and the like.
I think you need to specifically mention what fields you are talking about, because not everyone knows that issues differ between fields.
With regards to the malemployment debate you link, there’s a possibility that many of the college graduates have not actually learned anything that they could utilize in the first place, and consequently there exists nothing worth describing as ‘malemployment’. Is that the alternate model you are thinking of?
What you’re talking about would have to be something that happens in sciences that primarily find weak effects in the noise and confounders anyway, i.e. psychology, sociology, and the like.
Most of the examples I can think of come from those fields. There are a few papers in harder sciences which people in the field don’t take seriously because they don’t address the other prominent theories, but which people outside of the field think look serious because they’re not aware that the paper ignores other theories.
With regards to the malemployment debate you link, there’s a possibility that many of the college graduates have not actually learned anything that they could utilize in the first place, and consequently there exists nothing worth describing as ‘malemployment’. Is that the alternate model you are thinking of?
I was thinking mostly that it looked like the two authors were talking past one another. Group A says “hey, there’s heterogeneity in wages which is predicted by malemployment” whereas Group B says “but average wages are high, so there can’t be malemployment,” which ignores the heterogeneity. I do think that a signalling model of education (students have different levels of talent, and more talented students tend to go for more education, but education has little direct effect on talent) explains the heterogeneity and the wage differentials, and it would be nice to see both groups address that as well.
Once again, which education? Clearly, a training course for, say, a truck driver, is not signalling, but exactly what it says on the can: a training course for driving trucks. A language course, likewise. The same goes for mathematics, hard sciences, and engineering disciplines. Which may perhaps be likened to the necessity of training for a Formula 1 driver, irrespective of the level of innate talent (within the human range of ability).
Now, if that were within the realm of actual science, something like this “signalling model of education” would be immediately invalidated by the truck-driving example. No excuses. One can mend it into a “signalling model of some components of education in soft sciences”. But even there, there’s a big problem for the “signalling” model: a PhD in those fields in particular is a poorer indicator of ability, innate and learned, than in technical fields (lower average IQs, etc.), and signals very little.
edit: by the way, the innate ‘talent’ is not in any way exclusive of importance of learning; some recent research indicates that highly intelligent individuals retain neuroplasticity for longer time, which lets them acquire more skills. Which would by the way explain why child prodigies fairly often become very mediocre adults, especially whenever lack of learning is involved.
Clearly, a training course for, say, a truck driver, is not signalling, but exactly what it says on the can
If there was a glut of trained truck drivers on the market and someone needed to recruit new crane operators, they could choose to recruit only truck drivers because having passed the truck driving course would signal that you can learn to operate heavy machinery reliably, even if nothing you learned in the truck driving course was of any value in operating cranes.
OSHA rules would still require that the crane operator passes the crane related training.
The term ‘signalling’ seems to have heavily drifted and mutated online to near meaninglessness.
If someone attends a truck driving course with the intention of driving trucks—or a math course with the intention of a: learning math and b: improving their thinking skills—that’s not signalling behaviour.
And conversely, if someone wants to demonstrate some innate or pre-existing quality (such as mathematical ability), they participate in a relevant contest, and this is signalling.
Now, there may well be a lot of people who start in an educated family and then sort of drift through life conforming to parental wishes, and end up obtaining, say, a physics PhD. And then they go into economics or something similar where they do not utilize their training in much of any way. One could deduce about these people that they are more innately intelligent than average, wealthier than average, etc. etc., and that they learned some thinking skills. The former two things are much more reliably signalled with an IQ test and a statement from the IRS.
The term ‘signalling’ seems to have heavily drifted and mutated online to near meaninglessness.
I think it’s more that the concept entered LessWrong via Robin Hanson’s expansion of the concept into his “Homo Hypocritus” theory. For examples, see every post on Overcoming Bias with a title of the form “X is not about Y”. This theory sees all communicative acts as signalling, that is to say, undertaken with the purpose of persuading someone that one possesses some desirable characteristic. To pass a mathematics test is just as much a signal of mathematical ability as to hang out with mathematicians and adopt their jargon.
There is something that distinguishes actual performance from other signals of ability: unforgeability. By doing something that only a mathematician could do, one sends a more effective signal—that is, one more likely to be believed—that one can do mathematics.
This is a radical re-understanding of communication. On this view, not one honest thing has ever been said, not one honest thing done, by anyone, ever. “Honesty” is not a part of how brains physically work. Whether we tell a truth or tell a lie, truth-telling is never part of our purpose; it is no more than a means to the end of persuading people of our qualities. It is to be selected as a means only so far as it may happen to be more effective than the alternatives in a given situation. The concept of honesty as a virtue is merely part of such signalling.
The purpose of signalling desirable qualities is to acquire status, the universal currency of social interaction. Status is what determines your access to reproduction. Those who signal status best get to reproduce most. This is what our brains have evolved to do, for as long as they have been able to do this at all. Every creature bigger than an earthworm does it.
Furthermore, all acts are communicative acts, and therefore all acts are signalling. Everything we do in front of someone else is signalling. Even in solitude we are signalling to ourselves, for we can more effectively utter false signals if we believe them. Every thought that goes through your head is self-signalling, including whatever thoughts you have while reading this. It’s all signalling.
...which means that describing an act as “signalling” is basically meaningless, insofar as it fails to ascribe to that act a property that distinguishes it from other acts. It’s like describing my lunch as “material”. True, yes, but uninteresting except as a launching point to distinguish among expensive and cheap signals, forgeable and unforgeable signals, purely external signals and self-signalling, etc.
That said, in most contexts when a behavior is described as “signalling” without further qualification I generally understand the speaker to be referring more specifically to cheap signalling which is reliably treated as though it were a more expensive signal. Hanging out with mathematicians and adopting their jargon without really understanding it usually falls in this category; completing a PhD program in mathematics usually doesn’t (though I could construct contrived exceptions in both cases).
...which means that describing an act as “signalling” is basically meaningless, insofar as it fails to ascribe to that act a property that distinguishes it from other acts.
That a proposition has the form “every X is a Y” does not make it uninteresting. For example: All matter is made of atoms. All humans are descended from apes. God made everything. Every prime number is a sum of three squares. Everyone requires oxygen to live. True or false, these are all meaningful, interesting statements. “All acts are acts of signalling” is similarly so.
That said, in most contexts when a behavior is described as “signalling” without further qualification I generally understand the speaker to be referring more specifically to cheap signalling which is reliably treated as though it were a more expensive signal.
Yes, this subtext is present whenever the concept of signalling is introduced (another example of an “all X is Y” which is nevertheless a meaningful observation).
“Signalling”, in this context, means “acts undertaken primarily for the purpose of gaining status by persuading others of one’s status-worthy qualities”. As such, “All communication is signalling” is not a tautology, but an empirical claim.
acts undertaken primarily for the purpose of gaining status by persuading others of one’s status-worthy qualities
Many of those acts can be undertaken without having any such qualities, though.
I think Hanson’s ideas are far more applicable to Hanson’s own personal behaviour than to the world in general.
In particular, what he’s trying to do with his “Signalling theory” is not to tell us anything about human behaviour, but instead to try to imply that his neglect of actual training would be consistent with him having some immense innate abilities but not trying hard.
Meanwhile out there in the real world, if you specifically want to get a job that requires you to speak Chinese, you are going to have to attend a course in Chinese, to actually learn Chinese. Unless you are actually native Chinese in which case you won’t have to attend that course. Which applies to most disciplines, with perhaps the other disciplines, for which the skill may not even exist, monkey-style imitating the rest.
Meanwhile out there in the real world, if you specifically want to get a job that requires you to speak Chinese, you are going to have to attend a course in Chinese, to actually learn Chinese. Unless you are actually native Chinese in which case you won’t have to attend that course.
Though depending on the situation I might still find that it’s useful to attend the course, so I can get certification as having gone through the course, which in the real world might be of more value than speaking Chinese without that certification.
And these sorts of certification-based (as opposed to skill-based) considerations apply to most disciplines as well.
And, of course, the fact that I’m applying for this job, which requires Chinese, is itself a choice I’m making, and we can ask why I’m making that choice, and to what extent my motives are status-seeking vs. truth-seeking vs. improvements-in-the-world-seeking vs. something else.
Conversely, if I am entirely uninterested in certification and I really am solely interested in learning Chinese for the intrinsic value of learning Chinese, I might find it’s more useful not to attend a course, but instead study Chinese on my own (e.g. via online materials and spending my afternoons playing Mahjong in Chinatown).
If you already speak Chinese, you’d just need to pass an exam, no course attached, and if you are a native speaker, you’d be correctly presumed to speak it better than someone who spent many years on a course, lived in China, etc.
Many of those acts can be undertaken without having any such qualities, though.
I agree. I’m not defending Hanson’s theory, just saying what it is. Perhaps in more starkly extreme terms than he might, but I have never seen him put any limits on the concept. This, I am suggesting, is the origin of the broad application of the concept on LessWrong.
Meanwhile out there in the real world, if you specifically want to get a job that requires you to speak Chinese, you are going to have to attend a course in Chinese, to actually learn Chinese.
Quite so. But you are thinking like an engineer—that is, you are thinking in terms of actually getting things done. This is the right way to think, but it is not the way of the Hansonian fundamentalist (an imaginary figure that appears in my head when I contemplate signalling theory, and should not be confused with Robin Hanson himself).
The Hansonian fundamentalist would respond that it’s still all signalling. The only thing that he aims at getting done is the acquisition of status for himself. All else is means. The role that the actual ability to speak Chinese plays is that of an unforgeable signal, a concept which replaces that of truth, as far as what goes on inside our heads is concerned. Tarski’s definition of truth stands, but the Litany of Tarski does not. It is replaced by, “If X is true, I desire whatever attitude to X will maximise my status; if X is false, I desire whatever attitude to X will maximise my status. Let me not become attached to anything but status.”
If the job really cannot be done without good spoken Chinese, then to keep that job, you will need that ability. But if in the particular situation you correctly judged that you could get by with English and the help of a Chinese secretary, busk your way through the training course, and pull strings to keep your job if you run into difficulties, then that would be Homo Hypocritus’ choice. Homo Hypocritus does whatever will work best to convince his boss of his worthy qualities, with what lawyers call reckless disregard for the truth. Truth is never a consideration, except as a contingent means to status.
ETA:
I think Hanson’s ideas are far more applicable to Hanson’s own personal behaviour than to the world in general.
In particular, what he’s trying to do with his “Signalling theory” is not to tell us anything about human behaviour, but instead to try to imply that his neglect of actual training would be consistent with him having some immense innate abilities but not trying hard.
He does have tenure at a reputable American university, which I think is not a prize handed out cheaply. OTOH, I am reminded of a cartoon whose caption is “Mad? Of course I’m mad! But I have tenure!”
If the job really cannot be done without good spoken Chinese, then to keep that job, you will need that ability. But if in the particular situation you correctly judged that you could get by with English and the help of a Chinese secretary, busk your way through the training course, and pull strings to keep your job if you run into difficulties, then that would be Homo Hypocritus’ choice. Homo Hypocritus does whatever will work best to convince his boss of his worthy qualities, with what lawyers call reckless disregard for the truth. Truth is never a consideration, except as a contingent means to status.
At that point we aren’t really talking about signalling innate qualities, we’re talking of forgeries and pretending. Those only work at all because there are people who are not pretending.
A fly that looks like a wasp is only scary because there are wasps with venom that actually works. And those wasps have venom so potent because they actually use it to defend the hives. They don’t merely have venom to be worthy of having bright colours. Venom works directly, not through the bright colour.
One could of course forge the signals and then convince themselves that they are honestly signalling the ability to forge signals… but at the end of the day, this fly that looks like a wasp, it is just a regular fly, and it only gets an advantage from us not being fully certain that it is a regular fly. And the flies that look like wasps are not even close to displacing the other flies - there’s an upper limit on those.
He does have tenure at a reputable American university, which I think is not a prize handed out cheaply. OTOH, I am reminded of a cartoon whose caption is “Mad? Of course I’m mad! But I have tenure!”
Well, tenure is an example of status… and in his current field there may not be as many equivalents of “venom actually working” as in other fields so it looks like it is all about colours.
You can say that whether it’s signaling is determined by the motivations of the person taking the course, or the motivations of the people offering the course, or the motivations of employers hiring graduates of the course. And you can define motivation as the conscious reasons people have in their minds, or as the answer to the question of whether the person would still have taken the course if it was otherwise identical but provided no signaling benefit. And there can be multiple motivations, so you can say that something is signaling if signaling is one of the motivations, or that it’s signaling only if signaling is the only motivation.
If you make the right selections from the previous, you can argue for almost anything that it’s not signaling, or that it is for that matter.
if someone wants to demonstrate some innate or pre-existing quality (such as mathematical ability), they participate in a relevant contest and this is signalling.
If I wanted to defend competitions from accusations of signaling like you defended education, I could easily come up with lots of arguments: people doing them to challenge themselves, experience teamwork, test their limits and meet like-minded people; and the fact that lots of people participate in competitions even though they know they don’t have a serious chance of coming out on top, etc.
OSHA rules would still require that the crane operator passes the crane related training.
(Sure, but I meant that only truck drivers would be accepted into the crane operator training in the first place, because they would be more likely to pass it and perform well afterward.)
And conversely, if someone wants to demonstrate some innate or pre-existing quality (such as mathematical ability), they participate in a relevant contest, and this is signalling.
Given the way the term is actually used, I wouldn’t call that “signalling” because “signalling” normally refers to demonstrating that you have some trait by doing something other than performing the trait itself (if it’s capable of being performed). You can signal your wealth by buying expensive jewels, but you can’t signal your ability to buy expensive jewels by buying expensive jewels. And taking a math test to let people know that you’re good at math is not signalling, but going to a mathematicians’ club to let people know that you’re good at math may be signalling.
Given the way the term is actually used, I wouldn’t call that “signalling” because “signalling” normally refers to demonstrating that you have some trait by doing something other than performing the trait itself
This seems to be the meaning common on these boards, yes.
And taking a math test to let people know that you’re good at math is not signalling, but going to a mathematicians’ club to let people know that you’re good at math may be signalling.
Going to a mathematicians’ club (and the like) is something that you can do if you aren’t any good at math, though. And it only works as a “signal” of being good at math because most people go to that club for other reasons (reasons that would be dependent on being good at math).
Signalling was supposed to be about credibly conveying information to another party whenever there is a motivation for you to lie.
It seems that instead signalling is used to refer to behaviours portrayed in “Flowers for Charlie” episode of “It’s always sunny in Philadelphia”.
Generally, the signalling model of education refers to the wage premium paid to holders of associates, bachelors, masters, and doctoral degrees, often averaged across all majors. (There might be research into signalling with regards to vocational degrees, but I think most people that look into that are more interested in licensing / scarcity effects.)
Well, in the hard science majors, there’s considerable training, which is necessary for a large fraction of occupations. Granted, a physics PhD who became an economist may have been signalling, but it is far from the norm. What is the norm is that the vast majority of individuals employed as physics PhDs would be unable to perform some parts of their work if they hadn’t undergone relevant training, just as you wouldn’t have been able to speak a foreign language or drive a car without training.
Approximately it means “I have a financial or prestige incentive to find a relationship and I work in a field that doesn’t take its science seriously”.
Or, for instance in the case of particle physics, it means the probability you are just looking at background. You are painting with an overly broad brush. Sure, p-values are overused, but there are situations where the p-value IS the right thing to look at.
Or, for instance in the case of particle physics, it means the probability you are just looking at background.
No, it’s the probability that you’d see a result that extreme (or more extreme) conditioned on just looking at background. Frequentists can’t evaluate unconditional probabilities, and ‘probability that I see noise given that I see X’ (if that’s what you had in mind) is quite different from ‘probability that I see X given that I see noise’.
(Incidentally, the fact that this kind of conflation is so common is one of the strongest arguments against defaulting to p-values.)
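For concreteness, a toy calculation (all the numbers below are invented, including the prior) of how far apart the two conditionals can be:

```python
# P(result at least this extreme | noise) vs. P(noise | result at least this extreme).
p_signal = 0.001                 # assumed prior that a real effect is present
p_noise = 1 - p_signal
p_extreme_given_noise = 3e-7     # roughly a 5-sigma tail probability
p_extreme_given_signal = 0.5     # assumed chance a real effect produces such an excess

p_noise_given_extreme = (p_extreme_given_noise * p_noise) / (
    p_extreme_given_noise * p_noise + p_extreme_given_signal * p_signal
)
print(p_extreme_given_noise)     # 3e-7: what the p-value-style statement refers to
print(p_noise_given_extreme)     # ~6e-4: a different, prior-dependent quantity
```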
Keep in mind that he and other physicists do not generally consider “probability that it is noise, given an observation X” to even be a statement about the world (it’s a statement about one’s personal beliefs, after all, one’s confidence in the engineering of an experimental apparatus, and so on and so forth), so they are perhaps conflating much less than it would appear under very literal reading. This is why I like the idea of using the word “plausibility” to describe beliefs, and “probability” to describe things such as the probability of an event rigorously calculated using a specific model.
edit: note by the way that physicists can consider a very strong result—e.g. those superluminal neutrinos—extremely implausible on the basis of a prior, and correctly conclude that there is most likely a problem with their machinery, on the basis of the ratio between the likelihood of seeing that via noise and the likelihood of seeing it via a hardware fault. How’s that even possible without actually performing Bayesian inference?
edit2: also note that there is a fundamental difference, as with plausibilities you will have to be careful to avoid vicious cycles in the collective reasoning. Plausibility, as needed for combining it with other plausibilities, is not just a real number; it is a real number with an attached description of how exactly it was arrived at, so that evidence is not double-counted. The number itself is of little use for communication, for this reason.
Keep in mind that he and other physicists do not generally consider “probability that it is noise, given an observation X” to even be a statement about the world (it’s a statement about one’s personal beliefs, after all, one’s confidence in the engineering of an experimental apparatus, and so on and so forth)
It’s about the probability that there is an effect which will cause this deviation from background to become more and more supported by additional data rather than simply regress to the mean (or with your wording, the other way around). That seems fairly based-in-the-world to me.
The actual reality either has this effect, or it does not. You can quantify your uncertainty with a number, that would require you to assign some a-priori probability, which you’ll have to choose arbitrarily.
You can contrast this to a die roll, which scrambles the initial phase space, mapping (approximately, but very closely) 1⁄6 of any physically small region of it to each number on the die, the 1⁄6 being an objective property of how symmetrical dice bounce.
They are specific to your idiosyncratic choice of prior; I am not interested in hearing them (in the context of science), unlike the statements about the world.
That knowledge is subjective doesn’t mean that such statements are not about the world. Furthermore, such statements can (and sometimes do) have arguments for the priors...
By this standard, any ‘statement about the world’ ignores all of the uncertainty that actually applies. Science doesn’t require you to sweep your ignorance under the rug.
Or, for instance in the case of particle physics, it means the probability you are just looking at background.
Well, technically, the probability that you will end up with a result given that you are just looking at background. I.e. the probability that after the experiment you will end up looking at background thinking it is not background*, assuming it is all background.
*if it is used as a threshold for such thinking
It’s really awkward to describe that in English, though, and I just assume that this is what you mean (while Bayesianists assume that you are conflating the two).
Or, for instance in the case of particle physics, it means the probability you are just looking at background. You are painting with an overly broad brush. Sure, p-values are overused, but there are situations where the p-value IS the right thing to look at.
Note that the ‘brush’ I am using is essentially painting the picture “0.05 is for sissies”, not a rejection of p-values (which I may do elsewhere but with less contempt). The physics reference was to illustrate the contrast of standards between fields and why physics papers can be trusted more than medical papers.
With the thresholds from physics, we’d still be figuring out if penicillin really, actually kills certain bacteria (somewhat hyperbolic, 5 sigma ~ 1 in 3.5 million).
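(For reference, the 1-in-3.5-million figure is just the one-sided Gaussian tail at five standard deviations:)

```python
from scipy.stats import norm

p_5sigma = norm.sf(5)            # one-sided tail beyond 5 sigma
print(p_5sigma, 1 / p_5sigma)    # ~2.9e-7, i.e. roughly 1 in 3.5 million
```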
0.05 is a practical tradeoff, for supposed Bayesians, it is still much too strict, not too lax.
I for one think that 0.05 is way too lax (other than for the purposes of seeing whether it is worth it to conduct a bigger study, and other such value-of-information related uses), and 0.05 results require a rather carefully constructed meta-study to interpret correctly. Because a selection factor of 20 is well within the range attainable by dodgy practices that are almost impossible to prevent, and even in the absence of the dodgy practices, there is selection due to you being more likely to hear of something interesting.
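A rough simulation of where a selection factor of that order can come from; the setup (20 independent null comparisons per “paper”, reporting only the smallest p-value) is an assumption made up for illustration.

```python
# If each "paper" runs 20 independent comparisons on pure noise and reports only the
# best one, p < 0.05 is reached in roughly 1 - 0.95**20, i.e. about 64% of papers.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_papers, n_tests, n = 5_000, 20, 30
hits = 0
for _ in range(n_papers):
    a = rng.normal(size=(n_tests, n))   # both groups drawn from the same distribution
    b = rng.normal(size=(n_tests, n))
    pvals = ttest_ind(a, b, axis=1).pvalue
    hits += pvals.min() < 0.05
print(hits / n_papers)                  # ~0.64
```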
I can only imagine considering it too strict if I were unaware of those issues or their importance (Bayesianism or not).
This goes much more so for weaker forms of information, such as “Here’s a plausible looking speculation I came up with”. To get anywhere with that kind of stuff one would need to somehow account for the preference towards specific lines of speculation.
edit: plus, effective cures in medicine are the ones supported by very, very strong evidence, on par with particle physics (e.g. the same penicillin killing bacteria; you have really big sample sizes when you are dealing with bacteria). The weak stuff is things like antidepressants for which we don’t know whether they lower or raise the risk of suicide, and for which we are uncertain whether the effect is an artefact of using in any way whatsoever a depression score that includes weight loss and insomnia as symptoms when testing a drug that causes weight gain and sleepiness.
I think it is mostly because priors for finding a strongly effective drug are very low, so when large p-values are involved, you can only find low effect, near-placebo drugs.
edit2: Other issue is that many studies are plagued by at least some un-blinding that can modulate the placebo effect. So, I think a threshold on the strength of the effect (not just p-value) is also necessary—things that are within the potential systematic error margin from the placebo effect may mostly be a result of systematic error.
edit3: By the way, note that for a study of the same size, a stronger effect will result in a much lower p-value, and so a higher standard on p-values does not interfere much with the detection of strong effects. When you are testing an antibiotic… well, the chance probability of one bacterium dying in some short timespan may be 0.1, and with the antibiotic at a fairly high concentration, 99.99999… . Needless to say, a dozen bacteria put you far beyond the standards from particle physics, and a whole poisoned petri dish makes the point moot, with all the remaining uncertainty coming from the possibility of having killed the bacteria in some other way.
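A back-of-the-envelope check of that dozen-bacteria claim, using the comment’s own illustrative numbers:

```python
from scipy.stats import norm

p_chance_death = 0.1          # assumed chance a bacterium dies in the interval with no antibiotic
n_bacteria = 12
p_all_die_by_chance = p_chance_death ** n_bacteria   # 1e-12 under the null

five_sigma = norm.sf(5)       # ~2.9e-7
print(p_all_die_by_chance, p_all_die_by_chance < five_sigma)   # 1e-12, True
```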
It probably is too lax. I’d settle for 0.01, but 0.005 or 0.001 would be better for most applications (i.e. where you can get it). We have the whole range of numbers between 1 in 25 and 1 in 3.5 million to choose from, and I’d like to see an actual argument before concluding that the number we picked mostly from historical accident was actually right all along.
Still, a big part of the problem is the ‘p-value’ itself, not the number coming after it. Apart from the statistical issues, it’s far too often mistaken for something else, as RobbBB has pointed out elsewhere in this thread.
0.05 is a practical tradeoff, for supposed Bayesians, it is still much too strict, not too lax.
No, it isn’t. In an environment where the incentive to find a positive result is huge and there are all sorts of flexibilities in which particular results to report and which studies to abandon entirely, 0.05 leaves far too many false positives. It really does begin to look like this. I don’t advocate using the standards from physics, but p=0.01 would be preferable.
Mind you, there is no particularly good reason why there is an arbitrary p value to equate with ‘significance’ anyhow.
Well, I would find it really awkward for a Bayesian to condone a modus operandi such as “The p-value of 0.15 indicates it is much more likely that there is a correlation than that the result is due to chance, however for all intents and purposes the scientific community will treat the correlation as non-existent, since we’re not sufficiently certain of it (even though it likely exists)”.
Similar to having a choice of two roads to go down, one of which leads into the forbidden forest. Then saying “while I have decent evidence which way goes where, because I’m not yet really certain, I’ll just toss a coin.” How many false choices would you make in life, using an approach like that? Neglecting your duty to update, so to speak. A p-value of 0.15 is important evidence. A p-value of 0.05 is even more important evidence. It should not be disregarded, regardless of the perverse incentives in publishing and the false binary choice (if (p<=0.05) correlation=true, else correlation=false). However, for the medical community, a p-value of 0.15 might as well be 0.45, for practical purposes. Not published = not published.
This is especially pertinent given that many important chance discoveries may only barely reach significance initially, not because their effect size is so small, but because in medicine sample sizes often are, with the accompanying low power for discovering new effects. When you’re just a grad student with samples from e.g. 10 patients (no economic incentive yet, not yet a large trial), unless you’ve found magical ambrosia, p-values may tend to be “insignificant”, even for potentially significant breakthrough drugs.
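A hedged power calculation illustrating that point (the effect size and the choice of test are my own assumptions, for illustration): with 10 patients per arm, even a genuinely large effect is missed almost half the time at p < 0.05.

```python
# Power of a two-sample, two-sided t-test with n = 10 per arm and Cohen's d = 1.0.
import numpy as np
from scipy.stats import t as t_dist, nct

n, d, alpha = 10, 1.0, 0.05
df = 2 * n - 2
ncp = d * np.sqrt(n / 2)                     # noncentrality parameter
t_crit = t_dist.ppf(1 - alpha / 2, df)
power = nct.sf(t_crit, df, ncp) + nct.cdf(-t_crit, df, ncp)
print(power)                                 # ~0.56: nearly half of real, large effects come out "insignificant"
```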
Better to check out a few false candidates too many than to falsely dismiss important new discoveries. Falsely claiming a promising new substance to have no significant effect due to p-value shenanigans is much worse than not having tested it in the first place, since the “this avenue was fruitless” conclusion can steer research in the wrong direction (information spreads around somewhat even when unpublished, “group abc had no luck with testing substances xyz”).
IOW, I’m more concerned with false negatives (may never get discovered as such, lost chance) than with false positives (get discovered later on—in larger follow-up trials—as being false positives). A sliding p-value scale may make sense, with initial screening tests having a lax barrier signifying a “should be investigated further”, with a stricter standard for the follow-up investigations.
Well, I would find it really awkward for a Bayesian to condone a modus operandi such as “The p-value of 0.15 indicates it is much more likely that there is a correlation than that the result is due to chance, however for all intents and purposes the scientific community will treat the correlation as non-existent, since we’re not sufficiently certain of it (even though it likely exists)”.
And this is a really, really great reason not to identify yourself as “Bayesian”. You end up not using effective methods when you can’t derive them from Bayes’ theorem. (Which is to be expected absent very serious training in deriving things.)
Better to check out a few false candidates too many than to falsely dismiss important new discoveries
Where do you think the funds for testing false candidates are going to come from? If you are checking too many false candidates, you are dismissing important new discoveries. You are also robbing time away from any exploration into the unexplored space.
edit: also, I think you overestimate the extent to which promising avenues of research are “closed” by a failure to confirm. It is understood that a failure can result from a multitude of causes. Keep in mind also that with a strong effect, you get a quadratically better p-value for the same sample size. You are at much less risk of dismissing strong results.
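A small sketch of that scaling, using a one-sample z approximation and an invented sample size: the z-statistic grows linearly with effect size, so the exponent of the p-value shrinks roughly quadratically.

```python
# For a fixed sample size, z grows linearly with effect size d, and the Gaussian
# tail shrinks like exp(-z**2 / 2), so -log(p) grows roughly quadratically in d.
import numpy as np
from scipy.stats import norm

n = 50
for d in (0.2, 0.4, 0.8):            # illustrative effect sizes
    z = d * np.sqrt(n)
    p = 2 * norm.sf(z)
    print(d, round(z, 2), p)         # p: ~0.16, ~0.005, ~1.5e-8
```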
Well, I would find it really awkward for a Bayesian to condone a modus operandi such as “The p-value of 0.15 indicates it is much more likely that there is a correlation than that the result is due to chance, however for all intents and purposes the scientific community will treat the correlation as non-existent, since we’re not sufficiently certain of it (even though it likely exists)”.
The way statistically significant scientific studies are currently used is not like this. The meaning conveyed and the practical effect of official people declaring statistically significant findings is not a simple declaration of the Bayesian evidence implied by the particular statistical test returning less than 0.05. Because of this, I have no qualms with saying that I would prefer lower values than p<0.05 to be used in the place where that standard is currently used. No rejection of Bayesian epistemology is implied.
No, the multiple comparisons problem, like optional stopping and other selection effects that alter error probabilities, is a much greater problem in Bayesian statistics, because Bayesians regard error probabilities, and the sampling distributions on which they are based, as irrelevant to inference once the data are in hand. That is a consequence of the likelihood principle (which follows from inference by Bayes’ theorem).
I find it interesting that this blog takes a great interest in human biases, but guess what methodology is relied upon to provide evidence of those biases? Frequentist methods.
Unfortunately, we find ourselves in a world where the world’s policy-makers don’t just profess that AGI safety isn’t a pressing issue, they also aren’t taking any action on AGI safety. Even generally sharp people like Bryan Caplan give disappointingly lame reasons for not caring. :(
Why won’t you update towards the possibility that they’re right and you’re wrong?
This model should rise up much sooner than some very low-prior, complex model where you’re a better truth-finder about this topic but not about any topic where truth-finding can be tested reliably*, and they’re better truth-finders about topics where truth-finding can be tested (which is what happens when they do their work), but not about this particular topic.
(*because if you expect that, then you should end up actually trying to do at least something that can be checked because it’s the only indicator that you might possibly be right about the matters that can’t be checked in any way)
Why are the updates always in one direction only? When they disagree, the reasons are “lame” according to yourself, which makes you more sure everyone’s wrong. When they agree, they agree and that makes you more sure you are right.
This model should rise up much sooner than some very low-prior, complex model where you’re a better truth-finder about this topic...
It’s not so much that I’m a better truth finder, it’s that I’ve had the privilege of thinking through the issues as a core component of my full time job for the past two years, and people like Caplan only raise points that have been accounted for in my model for a long time. Also, I think the most productive way to resolve these debates is not to argue the meta-level issues about social epistemology, but to have the object-level debates about the facts at issue. So if Caplan replies to Carl’s comment and my own, then we can continue the object-level debate, otherwise… the ball’s in his court.
Why are the updates always in one direction only? When they disagree, the reasons are “lame” according to yourself, which makes you more sure everyone’s wrong. When they agree, they agree and that makes you more sure you are right.
This doesn’t appear to be accurate. E.g. Carl & Paul changed my mind about the probability of hard takeoff. And when have I said that some public figure agreeing with me made me more sure I’m right? See also my comments here.
If I mention a public figure agreeing with me, it’s generally not because this plays a significant role in my own estimates, it’s because other people think there’s a stronger correlation between social status and correctness than I do.
It’s not so much that I’m a better truth finder, it’s that I’ve had the privilege of thinking through the issues as a core component of my full time job for the past two years, and people like Caplan only raise points that have been accounted for in my model for a long time.
Yes, but why did Caplan not see fit to think about the issue for a significant time, while you did?
There’s also the AI researchers who have had the privilege of thinking about relevant subjects for a very long time, education, and accomplishments which verify that their thinking adds up over time—and who are largely the actual source for the opinions held by the policy makers.
By the way, note that the usual method of rejection of wrong ideas, is not even coming up with wrong ideas in the first place, and general non-engagement of wrong ideas. This is because the space of wrong ideas is much larger than the space of correct ideas.
What I expect to see in the counter-factual world where the AI risk is a big problem, is that the proponents of the AI risk in that hypothetical world have far more impressive and far more relevant accomplishments and credentials.
but to have the object-level debates about the facts at issue.
The first problem with highly speculative topics is that great many arguments exist in favour of either opinion on a speculative topic. The second problem is that each such argument relies on a huge number of implicit or explicit assumptions that are likely to be violated due to their origin as random guesses. The third problem is that there is no expectation that the available arguments would be a representative sample of the arguments in general.
This doesn’t appear to be accurate. E.g. Carl & Paul changed my mind about the probability of hard takeoff.
Hmm, I was under the impression that you weren’t a big supporter of the hard takeoff to begin with.
If I mention a public figure agreeing with me, it’s generally not because this plays a significant role in my own estimates, it’s because other people think there’s a stronger correlation between social status and correctness than I do.
Well, your confidence should be increased by the agreement; there’s nothing wrong with that. The problem is when it is not balanced by the expected decrease by disagreement.
What I expect to see in the counter-factual world where the AI risk is a big problem, is that the proponents of the AI risk in that hypothetical world have far more impressive and far more relevant accomplishments and credentials.
There are a great many differences in our world model, and I can’t talk through them all with you.
Maybe we could just make some predictions? E.g. do you expect Stephen Hawking to hook up with FHI/CSER, or not? I think… oops, we can’t use that one: he just did. (Note that this has negligible impact on my own estimates, despite him being perhaps the most famous and prestigious scientist in the world.)
Okay, well… If somebody takes a decent survey of mainstream AI people (not AGI people) about AGI timelines, do you expect the median estimate to be earlier or later than 2100? (Just kidding; I have inside information about some forthcoming surveys of this type… the median is significantly sooner than 2100.)
Okay, so… do you expect more or fewer prestigious scientists to take AI risk seriously 10 years from now? Do you expect Scott Aaronson and Peter Norvig, within 25 years, to change their minds about AI timelines, and concede that AI is fairly likely within 100 years (from now) rather than thinking that it’s probably centuries or millennia away? Or maybe you can think of other predictions to make. Though coming up with crisp predictions is time-consuming.
Well, I too expect some form of something that we would call “AI”, before 2100. I can even buy into some form of accelerating progress, albeit the progress would be accelerating before the “AI” due to the tools using relevant technologies, and would not have that sharp of a break. I even do agree that there is a certain level of risk involved in all the future progress including progress of the software.
I have a sense you misunderstood me. I picture this parallel world where legitimate, rational inferences about the AI risk exist, where this risk is worth working on in 2013 and stands out among the other risks, and where any other pre-requisites for making MIRI worthwhile hold. And in this imaginary world, I expect massively larger support than “Stephen Hawking hooked up with FHI” or whatever you are outlining here.
You do frequently lament that the AI risk is underfunded, under-supported, and there’s under-awareness about it. In the hypothetical world, this is not the case and you can only lament that the rational spending should be 2 billions rather than 1 billion.
edit: and of course, my true rejection is that I do not actually see rational inferences leading there. The imaginary world stuff is just a side-note to explain how non-experts generally look at it.
edit2: and I have nothing against FHI’s existence and their work. I don’t think they are very useful, or address any actual safety issues which may arise, though, but with them I am fairly certain they aren’t doing any harm either (Or at least, the possible harm would be very small). Promoting the idea that AI is possible within 100 years, however, is something that increases funding for AI all across the board.
I have a sense you misunderstood me. I picture a parallel world where legitimate, rational inferences about AI risk exist, where this risk is worth working on in 2013 and stands out among the other risks, and where any other prerequisites for making MIRI worthwhile hold. And in this imaginary world, I expect massively larger support than “Stephen Hawking hooked up with FHI” or whatever you are outlining here.
Right, this just goes back to the same disagreement in our models I was trying to address earlier by making predictions. Let me try something else, then. Here are some relevant parts of my model:
1. I expect most highly credentialed people to not be EAs in the first place.
2. I expect most highly credentialed people to not be familiar with the arguments for caring about the far future.
3. I expect most highly credentialed people to be mostly just aware of risks they happen to have heard about (e.g. climate change, asteroids, nuclear war), rather than attempting a systematic review of risks (e.g. by reading the GCR volume).
4. I expect most highly credentialed people to respond fairly well when actuarial risk is easily calculated (e.g. asteroid risk), and not-so-well when it’s more difficult to calculate (e.g. many insurance companies went bankrupt after 9/11).
5. I expect most highly credentialed people to have spent little time on explicit calibration training.
6. I expect most highly credentialed people to not systematically practice debiasing like some people practice piano.
7. I expect most highly credentialed people to know very little about AI, and very little about AI risk.
8. I expect that in general, even those highly credentialed people who intuitively think AI risk is a big deal will not even contact the people who think about AI risk for a living in order to ask about their views and their reasons for them, due to basic VoI failure.
9. I expect most highly credentialed people to have fairly reasonable views within their own field, but to often have crazy views “outside the laboratory.”
10. I expect most highly credentialed people to not have a good understanding of Bayesian epistemology.
11. I expect most highly credentialed people to continue working on, and caring about, whatever their career has been up to that point, rather than suddenly switching career paths on the basis of new information and an EV calculation.
12. I expect most highly credentialed people to not understand lots of pieces of “black swan epistemology” like this one and this one.
The question should not be about “highly credentialed” people alone, but about how they fare compared to people with very low credentials.
In particular, on your list, I expect people with fairly low credentials to fare much worse, especially at identification of the important issues as well as on rational thinking. Those combine multiplicatively, making it exceedingly unlikely—despite the greater numbers of the credential-less masses—that people who lead the work on an important issue would have low credentials.
I expect most highly credentialed people to not be EAs in the first place.
What’s EA? Effective altruism? If it’s an existential risk, it kills everyone, selfishness suffices just fine.
e.g. many insurance companies went bankrupt after 9/11
Ohh, come on. That is in no way a demonstration that insurance companies in general follow faulty strategies, and especially is not a demonstration that you could do better.
I expect most highly credentialed people to not systematically practice debiasing like some people practice piano.
In particular, on your list, I expect people with fairly low credentials to fare much worse
No doubt! I wasn’t comparing highly credentialed people to low-credentialed people in general. I was comparing highly credentialed people to Bostrom, Yudkowsky, Shulman, etc.
But why exactly would you expect conventional researchers in AI and related technologies (also including provable software, as used in the aerospace industry, and a bunch of other topics), with credentials and/or accomplishments in said fields, to fare worse on that list’s score?
Furthermore, with regard to rationality, risks of mistakes, and such… very little has been done that can be checked for correctness in a clear-cut way; most of it is of such a nature that even when it is wrong, it is not possible to conclusively demonstrate that it is wrong. As for the few things that can be checked… look, when you write an article like this, discussing the irrationality of Enrico Fermi, there is a substantial risk of appearing highly arrogant (and irrational) if you get the technical details wrong. It is a miniature version of the AI risk problem: you need to understand the subject, and if you don’t, there are negative consequences. It is much, much easier not to goof up on things like that than on the direction of AI.
As you guys are researching actual AI technologies, the issue is that one should be able to deem your effort less of a risk. A mere “we are trying to avoid risk and we think they don’t” won’t do. The cost of a particularly bad friendly AI goof-up is a sadistic AI (to borrow the term from Omohundro). A sadistic AI can probably run far more tortured minds than a friendly AI can run minds, by a very large factor, so the risk of a goof-up must be quite a lot lower than anyone has demonstrated.
BTW, I went back and numbered the items in my list so they’re easier to refer to.
But why exactly would you expect conventional researchers in AI and related technologies… with credentials and/or accomplishments in said fields, to fare worse on that list’s score?
Because very few people in general, including credentialed AI people, satisfy (1), (2), (3), (5), (6), (7)†, (8), (10), and (12), but Bostrom, Yudkowsky and Shulman rather uncontroversially do satisfy those items. I also expect B/Y/S to outperform most credentialed experts on (4), (9), and (11), but I understand that’s a subjective judgment call and it would take a long time for me to communicate my reasons.
† The AI risk part of 7, anyway. Obviously, AI people specifically know a lot about AI.
Edit: Also, I’ll briefly mention that I haven’t downvoted any of your comments in this conversation.
Because very few people in general, including credentialed AI people, satisfy (1), (2), (3), (5), (6), (7), (8), (10), and (12)
Ok, let’s go over your list, for the AI people.
1 I expect most highly credentialed people to not be EAs in the first place.
If EA is effective altruism, that’s not relevant because one doesn’t have to be an altruist to care about existential risks.
2 I expect most highly credentialed people to not be familiar with the arguments for caring about the far future.
I expect them to be able to come up with that independently if it is a good idea.
3 I expect most highly credentialed people to be mostly just aware of risks they happen to have heard about (e.g. climate change, asteroids, nuclear war), rather than attempting a systematic review of risks (e.g. by reading the GCR volume).
I expect intelligent people to be able to foresee risks, especially when prompted by the cultural baggage (modern variations on the theme of the Golem).
4 I expect most highly credentialed people to respond fairly well when actuarial risk is easily calculated (e.g. asteroid risk), and not-so-well when it’s more difficult to calculate (e.g. many insurance companies went bankrupt after 9/11).
Well, that ought to imply some generally better ability to evaluate hard to calculate probabilities, which would imply that you guys should be able to make quite a bit of money.
5 I expect most highly credentialed people to have spent little time on explicit calibration training.
The question is how well they are calibrated, not how much time they spent. You guys see miscalibration in famous people everywhere, even in Enrico Fermi.
6 I expect most highly credentialed people to not systematically practice debiasing like some people practice piano.
Once again, what matters is how unbiased they are, not how much time they spent on one very specific way of acquiring that ability. I expect most accomplished people to have encountered far more feedback on being right or wrong through their education and experience.
7 I expect most highly credentialed people to know very little about AI, and very little about AI risk.
Doesn’t apply to people in AI related professions.
8 I expect that in general, even those highly credentialed people who intuitively think AI risk is a big deal will not even contact the people who think about AI risk for a living in order to ask about their views and their reasons for them, due to basic VoI failure.
The way to raise VoI is prior history of thinking about something else for a living, with impressive results.
9 I expect most highly credentialed people to have fairly reasonable views within their own field, but to often have crazy views “outside the laboratory.”
Well, less credentialed people are just like this, except they don’t have a laboratory inside of which they are sane; that’s usually why they are less credentialed in the first place.
10 I expect most highly credentialed people to not have a good understanding of Bayesian epistemology.
Of your three, I only weakly expect Bostrom to have learned the fundamentals necessary for actually applying Bayes’ theorem correctly in somewhat non-straightforward cases.
Yes, the basic formula is simple, but the derivations are subtle and complex for non-independent evidence, cases involving loops in the graph, and all those other things…
It’s like arguing that you are better equipped for a job at Weta Digital than any employee there because you know quantum electrodynamics (the fundamentals of light propagation), and they’re using geometrical optics.
I expect many AI researchers to understand the relevant mathematics a lot, lot better than the 3 on your list.
And I expect credentialed people in general to have a good understanding of the variety of derivative tricks that are used to obtain effective results under uncertainty when Bayes’ theorem cannot be effectively applied.
11 I expect most highly credentialed people to continue working on, and caring about, whatever their career has been up to that point, rather than suddenly switching career paths on the basis of new information and an EV calculation.
Yeah, well, and I expect non-credentialed people to have too much to lose from backing out of it in the event that the studies return a negative.
12 I expect most highly credentialed people to not understand lots of pieces of “black swan epistemology” like this one and this one.
You lose me here.
I would make a different list, anyway. Here’s my list:
1. Relevant expertise, as measured by educational credentials and/or accomplishments. Expertise is required for correctly recognizing risks (e.g. an astronomer is better equipped to recognize risks from outer space, a physicist to recognize faults in a nuclear power plant design, et cetera).
2. Proven ability to make correct inferences (largely required for 1).
3. Self-preservation (most of us have it).
Lack of 1 is an automatic disqualifier in my list. It doesn’t matter how much you are into things that you think are important for identifying, say, faults in a nuclear power plant design. If you are not an engineer, a physicist, or the like, you aren’t going to qualify for that job via some list you make yourself, which conveniently omits (1).
I disagree with many of your points, but I don’t have time to reply to all that, so to avoid being logically rude I’ll at least reply to what seems to be your central point, about “relevant expertise as measured by educational credentials and/or accomplishments.”
Who has educational credentials and/or accomplishments relevant to future AGI designs or long-term tech forecasting? Also, do you particularly disagree with what I wrote in AGI Impact Experts and Friendly AI Experts?
Also, in general, I’ll just remind everyone reading this that I don’t think these meta-level debates about proper social epistemology are as productive as object-level debates about strategically relevant facts (e.g. facts relevant to the theses in this post). Argument screens off authority, and all that.
Edit: Also, my view of Holden Karnofsky might be illustrative. I take Holden Karnofsky more seriously than almost anyone on the cost-effectiveness of global health interventions, despite the fact that he has 0 relevant degrees, 0 papers published in relevant journals, 0 awards for global health work, etc. Degrees and papers and so on are only proxy variables for what we really care about, and are easily screened off by more relevant variables, both for the case of Karnofsky on global health and for the case of Bostrom, Yudkowsky, Shulman, etc. on AI risk.
For Karnofsky, and to some extent Bostrom, yes. Shulman is debatable. Yudkowsky tried to get screened (tried to write a programming language, for example; wrote a lot of articles on various topics, many of them wrong; tried to write technical papers (TDT), really badly), and failed to pass the screening by a very big margin. His entirely irrational arguments about his 10% counterfactual impact are also part of that failure. Omohundro passed with flying colours (his PhD is almost entirely irrelevant at that point, as it is screened off by his accomplishments in AI).
I’ll just remind everyone reading this that I don’t think these meta-level debates about proper social epistemology are as productive as object-level debates about strategically relevant facts....
Exactly. All of this is wasted effort once either FAI or UFAI is developed.
Who has educational credentials and/or accomplishments relevant to future AGI designs or long-term tech forecasting?
There are more relevant accomplishments, there are less relevant accomplishments, and there are lacks of accomplishment.
Also, in general, I’ll just remind everyone reading this that I don’t think these meta-level debates about proper social epistemology are as productive as object-level debates about strategically relevant facts
I agree that a discussion of strategically relevant facts would be much more productive. I don’t see facts here. I see many speculations. I see a lot of making things up to fit the conclusion.
If I were to tell you that I could, for example, win a very high-stakes programming contest (with a difficult, open problem that has many potential solutions which can be ranked in terms of quality), a discussion of my approach to the contest problem between you and me would be almost useless for your or my prediction of victory (provided that basic standards of competence are met), irrespective of whether my idea is good. Prior track record, on the other hand, would be a good predictor. This is how it is for a very well-defined problem. It is not going to be better for a less well-understood problem.
If EA is effective altruism, that’s not relevant because one doesn’t have to be an altruist to care about existential risks.
‘EA’ here refers to the traits a specific community seems to exemplify (though those traits may occur outside the community). So more may be suggested than the words ‘effective’ and ‘altruism’ contain.
In terms of the terms, I think ‘altruism’ here is supposed to be an inclination to behave a certain way, not an other-privileging taste or ideology. Think ‘reciprocal altruism’. You can be an egoist who’s an EA, provided your selfish calculation has led you to the conclusion that you should devote yourself to efficiently funneling money to the world’s poorest, efficiently reducing existential risks, etc. I’m guessing by ‘EA’ Luke has in mind a set of habits of looking at existential risks that ‘Effective Altruists’ tend to exemplify, e.g., quantifying uncertainty, quantifying benefit, strongly attending to quantitative differences, trying strongly to correct for a specific set of biases (absurdity bias, status quo bias, optimism biases, availability biases), relying heavily on published evidence, scrutinizing the methodology and interpretation of published evidence....
I expect them to be able to come up with that independently if it is a good idea.
My own experience is that I independently came up with a lot of arguments from the Sequences, but didn’t take them sufficiently seriously, push them hard enough, or examine them in enough detail. There seems to be a big gap between coming up with an abstract argument for something while you’re humming in the shower, and actually living your life in a way that’s consistent with your believing the argument is sound.
My own experience is that I independently came up with a lot of arguments from the Sequences, but didn’t take them sufficiently seriously, push them hard enough, or examine them in enough detail.
But we are speaking of credentialed people. They’re fairly driven.
Furthermore, general non-acceptance of an idea is evidence that the idea is not good. You can’t seriously list general non-acceptance of your ideas by the relevant experts as the reason why you are superior to those experts, because the same non-acceptance lowers the probability that those ideas are correct, in proportion to how much it raises how exceptional you are for holding those views. (The biggest problem with “Bayesianism” is unbalanced/selective updates.)
First off, if one can support existential risk for non-Pascal’s-wager-type reasons, then the enormous utility of the future should not be relevant. If it is actually a requirement, then I don’t think there’s anything to discuss here.
Secondly, the most common norm of morality (assuming we ignore things like Sharia), as specified in the laws of progressive countries, or as an extrapolation of legal progress in less progressive ones, is to value future people (we disapprove of smoking while pregnant), but not to value the counterfactual creation of future people (we allow abortion, especially when the child would be disadvantaged and not have a fair chance). Rather than inferring the prevailing morality from the law and discussing that, various bad ideas are invented and discussed to make the argument appear stronger than it really is.
It is not that I am not exposed to this worldview. I am. It is that when choosing between A (hurt someone, but a large number of happy people will be created) and B (not hurt someone, but a large number of happy people will not be created), with the deliberate choice having the causal impact on the hurting and the creation, A is both illegal and immoral.
general non-acceptance of an idea is evidence that the idea is not good. You can’t seriously list general non-acceptance of your ideas by the relevant experts as the reason why you are superior to those experts, because the same non-acceptance lowers the probability that those ideas are correct, in proportion to how much it raises how exceptional you are for holding those views.
When I hear that Joe has a new argument against a belief of mine, then my confidence in my belief lowers a bit, and my confidence in Joe’s competence also lowers a bit. If I then go on to actually evaluate the argument in detail and discover that it’s an extraordinarily poor one, this should generally increase my confidence to higher than it was before I heard that Joe had an argument, and it should further lower my confidence in Joe’s competence.
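A minimal numeric sketch of this two-stage update in Python; all of the conditional probabilities below are invented purely to illustrate the direction of the two updates, not drawn from anything in the discussion:

```python
# Minimal sketch of the two-stage update described above.
# All probabilities are invented purely for illustration.

p_b = 0.90                       # prior confidence in my belief B

# Stage 1: I learn only that Joe has *some* argument against B.
p_arg_given_b = 0.50             # P(Joe has an argument | B true)
p_arg_given_not_b = 0.80         # P(Joe has an argument | B false)

p_b_after_hearing = (p_arg_given_b * p_b) / (
    p_arg_given_b * p_b + p_arg_given_not_b * (1 - p_b))
# ~0.85: confidence drops a bit.

# Stage 2: I read the argument and find it extraordinarily poor.
# A poor argument is more likely to be Joe's best if B is in fact true.
p_poor_given_b = 0.70            # P(argument is poor | B true, Joe argued)
p_poor_given_not_b = 0.20        # P(argument is poor | B false, Joe argued)

p_b_after_reading = (p_poor_given_b * p_b_after_hearing) / (
    p_poor_given_b * p_b_after_hearing
    + p_poor_given_not_b * (1 - p_b_after_hearing))
# ~0.95: confidence ends up higher than the original 0.90.

print(p_b_after_hearing, p_b_after_reading)
```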
I’ve spent enough time looking at the specific arguments for and against many of these propositions to have the contents of those arguments overwhelm my expertise priors in both directions, such that I just don’t see a whole lot of value in discussing anything but the arguments themselves, when my goal (and yours) is to figure out the level of merit of the arguments.
if one can support existential risk for non-Pascal’s-wager-type reasons, then the enormous utility of the future should not be relevant.
It sounds like you’re committing the Pascal’s Wager Fallacy Fallacy. If you aren’t, then I’m not understanding your point. Large future utilities should count more than small future utilities, and multiplying by low probabilities is fine if the probabilities aren’t vanishingly low.
Choosing between A: hurt someone, but a large number of happy people get created, and B: not hurt someone, but a large number of happy people do not get created, A is both illegal and immoral.
I think there’s a quantitative tradeoff between the happiness of currently existent people and the happiness of possibly-created people. A strict rule ‘Counterfactual People Have Absolutely No Value’ leads to absurd conclusions, e.g., it’s not worthwhile to create an infinite number of infinitely happy and well-off people if the cost is that your shoulder itches for a few seconds. It’s at least a little worthwhile to create people with awesome lives, even if they should get weighted less than currently existent people.
I’ve spent enough time looking at the specific arguments for and against many of these propositions to have the contents of those arguments overwhelm my expertise priors in both directions, such that I just don’t see a whole lot of value in discussing anything but the arguments themselves, when my goal (and yours) is to figure out the level of merit of the arguments.
You don’t want the outcome to be biased by the availability of the arguments, right? Really, I think you do not account for the fact that the available arguments are merely samples from the space of possible arguments (which make different speculative assumptions, in a very large space of possible speculations). Picked non-uniformly, too, since arguments for one side may be more available, or their creation may maximize the personal present-day utility of more agents. Individual samples can’t be particularly informative in such a situation.
It’s at least a little worthwhile to create people with awesome lives, even if they should get weighted less than currently existent people.
The issue is that the number of people you can speculate you affect grows much faster than the prior for the speculation decreases. Constant factors do not help with that, they just push the problem a little further.
A strict rule ‘Counterfactual People Have Absolutely No Value’ leads to absurd conclusions, e.g., it’s not worthwhile to create an infinite number of infinitely happy and well-off people if the cost is that your shoulder itches for a few seconds.
I don’t see that as problematic. Ponder the alternative for a moment: you may be OK with a shoulder itch, but are you OK with 10,000 years of the absolute worst torture imaginable, for the sake of the creation of 3^^^3 or 3^^^^^3 or however many really happy people? What about your death vs. their creation?
edit: also you might have the value of those people to yourself (as potential mates and whatnot) leaking in.
It sounds like you’re committing the Pascal’s Wager Fallacy Fallacy. If you aren’t, then I’m not understanding your point. Large future utilities should count more than small future utilities, and multiplying by low probabilities is fine if the probabilities aren’t vanishingly low.
If the probabilities aren’t vanishingly low, you reach basically the same conclusions without requiring extremely large utilities. 7 billion people dying is quite a lot, too. If you see extremely large utilities on the list of requirements for caring about the issue, when you already have at least 7 billion lives at stake, then it is a Pascal’s wager.
Actually, I don’t see vanishingly small probabilities as problematic; I see small probabilities where the bulk of the probability mass is unaccounted for as problematic. E.g. responding to a low risk from a specific asteroid is fine, because its alternative positions in space are accounted for (and you have assurance you won’t put it on an even worse trajectory).
Furthermore, general non-acceptance of an idea is evidence that the idea is not good. You can’t seriously list general non-acceptance of your ideas by the relevant experts as the reason why you are superior to those experts, because the same non-acceptance lowers the probability that those ideas are correct, in proportion to how much it raises how exceptional you are for holding those views. (The biggest problem with “Bayesianism” is unbalanced/selective updates.)
Updating on someone else’s decision to accept or reject a position should depend on their reason for their position. Information cascades is relevant.
Yes, of course. But also keep in mind that wrong positions are often rejected by the mechanism that generates positions, rather than the mechanism that checks the generated positions.
I did see Eelco Hoogendoorn’s, and it is absolutely spot on.
I’m hardly a fan of Caplan, but he has some Bayesianism right:
Based on how things like this asymptote or fail altogether, he has a low prior for foom.
He has a low expectation of being able to identify in advance (without work equivalent to the creation of the AI) the exact mechanisms by which it is going to asymptote or fail, irrespective of whether it does or does not asymptote or fail, so not knowing such mechanisms does not bother him a whole lot.
Even assuming he is correct, he expects plenty of possible arguments against this position (which are reliant on speculations), as well as expects to see some arguers, because the space of speculative arguments is huge. So such arguments are not going to move him anywhere.
People don’t do that explicitly any more than someone who’s playing football is doing Newtonian mechanics explicitly. Bayes theorem is no less fundamental than the laws of motion of the football.
Likewise for things like non-testability: nobody is doing anything explicitly; it is just the case that, due to something you guys call “conservation of expected evidence”, when there is no possibility of evidence against a proposition, a possibility of evidence in favour of the proposition would violate Bayes’ theorem.
when there is no possibility of evidence against a proposition, a possibility of evidence in favour of the proposition would violate Bayes’ theorem.
I’m not sure how you could have such a situation, given that absence of expected evidence is evidence of the absence. Do you have an example?
Well, the probabilities wouldn’t be literally zero. What I mean is that the lack of a possibility of strong evidence against something, with only a possibility of very weak evidence against it (via absence of evidence), implies that strong evidence in favour of it must be highly unlikely. Worse, such evidence just gets lost among the more probable “evidence that looks strong but is not”.
Absence of evidence isn’t necessarily a weak kind of evidence.
If I tell you there’s a dragon sitting on my head, and you don’t see a dragon sitting on my head, then you can be fairly sure there’s not a dragon on my head.
On the other hand, if I tell you I’ve buried a coin somewhere in my magical 1cm-deep garden—and you dig a random hole and don’t find it—not finding the coin isn’t strong evidence that I’ve not buried one. However, there’s so much potential weak evidence against it. If you’ve dug up all but a 1cm square of my garden—the coin’s either in that 1cm or I’m telling porkies—and what are the odds that, digging randomly, you wouldn’t have come across it by then? You can be fairly sure, even before digging up that square, that I’m fibbing.
Was what you meant analogous to one of those scenarios?
Yes, like the latter scenario. Note that the expected utility of digging is low when the evidence against from one dig is low.
edit: Also, in the former case, not seeing a dragon sitting on your head is very strong evidence against there being a dragon. Unless you invoke un-testable invisible dragons which may be transparent to X-rays, let dust pass through them unaffected, and so on. In which case, I should have a very low likelihood of being convinced that there is a dragon on your head, if I know that the evidence against it would be very weak.
edit2: Russell’s teapot in the Kuiper belt is a better example still. When there can be only very weak evidence against it, the probability of encountering or discovering strong evidence in favour of it must also be low, making it not worthwhile to try to come up with evidence that there is a teapot in the Kuiper belt (due to the low probability of success), even when the prior probability for the teapot is not very low.
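A minimal sketch of this conservation-of-expected-evidence point, with invented numbers; the only structural fact used is that the prior must equal the probability-weighted average of the possible posteriors:

```python
# Minimal sketch: if the only possible evidence against H is weak,
# then strong evidence for H must be improbable. Numbers are invented.

p_h = 0.30               # prior for H (e.g. a teapot in the Kuiper belt)
p_h_given_miss = 0.29    # a failed search barely lowers the probability
p_h_given_find = 0.95    # a successful search would be strong evidence

# Conservation of expected evidence:
#   p_h = p_h_given_find * p_find + p_h_given_miss * (1 - p_find)
# Solving for how often a search can be expected to succeed:
p_find = (p_h - p_h_given_miss) / (p_h_given_find - p_h_given_miss)

print(p_find)            # ~0.015: strong confirmation has to be rare,
                         # so the expected value of searching is low.
```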
Then, to extend the analogy: Imagine that digging has potentially negative utility as well as positive. I claim to have buried both a large number of nukes and a magical wand in the garden.
In order to motivate you to dig, you probably want some evidence of magical wands. In this context that would probably be recursively improving systems where, occasionally, local variations rapidly acquire super-dominance over their contemporaries when they reach some critical value. Evolution probably qualifies there—other bipedal frames with fingers aren’t particularly dominant over other creatures in the same way that we are, but at some point we got smart enough to make weapons (note that I’m not saying that was what intelligence was for though) and from then on, by comparison to all other macroscopic land-dwelling forms of life, we may as well have been god.
And since then that initial edge in dominance has only ever allowed us to become more dominant. Creatures afraid of wild animals are not able to create societies with guns and nuclear weapons—you’d never have the stability for long enough.
In order to motivate you not to dig, you probably want some evidence of nukes. In this context, recursive—I’m not sure “improving” is the right word here—systems with a feedback state, that create large amounts of negative value. Well, to a certain extent that’s a matter of perspective—from the perspective of extinct species the ascendancy of humanity would probably not be anything to cheer about, if they were in a position to appreciate it. But I suspect it can at least stand on its own that failure cascades tend to be easier to make than success cascades. One little thing goes wrong on your rocket and then the situation multiplies; a small error in alignment rapidly becomes a bigger one; or the timer on your Patriot battery loses a fraction of a second, and over time your perception of where the missiles are is off significantly. It’s only with significant effort that we create systems where errors don’t multiply.
(This is analogous to altering your expected value of information—like if earlier you’d said you didn’t want to dig and I’d said, ‘well there’s a million bucks there’ instead—you’d probably want some evidence that I had a million bucks, but given such evidence the information you’d gain from digging would be worth more.)
This seems to be fairly closely analogous to Eliezer’s claims about AI, at least if I’ve understood them correctly: that we have to hit an extremely small target, and it’s more likely that we’re going to blow ourselves to itty-bitty pieces/cover the universe in paperclips if we’re just fooling around hoping to hit on it by chance.
If you believe that such is the case, then the only people you’re going to want looking for that magic wand—if you let anyone do it at all—are specialists with particle detectors—indeed if your garden is in the middle of a city you’ll probably make it illegal for kids to play around anywhere near the potential bomb site.
Now, we may argue over quite how strongly we have to believe in the possible existence of magitech nukes to justify the cost of fencing off the garden—personally I think the statement:
if you take a thorough look at actually existing creatures, it’s not clear that smarter creatures have any tendency to increase their intelligence.
constrains what you’ll accept as potential evidence pretty dramatically—we’re talking about systems in general, not just individual people, and recursively improving systems with high asymptotes relative to their contemporaries have happened before.
It’s not clear to me that the second claim he makes is even particularly meaningful:
In the real-world, self-reinforcing processes eventually asymptote. So even if smarter creatures were able to repeatedly increase their own intelligence, we should expect the incremental increases to get smaller and smaller over time, not skyrocket to infinity.
Sure, I think that they probably won’t go to infinity—but I don’t see any reason to suspect that they won’t converge on a much higher value than our own native ability. Pretty much all of our systems do, from calculators to cars.
We can even argue over how you separate the claims that something is going to foom from the false claims of that kind (I’d suggest, initially, just seeing how many claims that something was going to foom have actually been made within the domain of technological artefacts; it may be that the baseline credibility is higher than we think). But that’s a body of research that Caplan, as far as I’m aware, hasn’t forwarded. It’s not clear to me that it’s a body of research with the same order of difficulty as creating an actual AI, either. And, in its absence, it’s not clear to me that answering, in effect, “I’ll believe it when I see the mushroom cloud” is a particularly rational response.
I was mostly referring to the general lack of interest in the discussion of unfalsifiable propositions by the scientific community. The issue is that unfalsifiable propositions are also the ones for which it is unlikely that, in the discussion, you will be presented with evidence in their favour.
The space of propositions is the garden I am speaking of. And digging up false propositions is not harmless.
With regard to that argument of yours, I think you vastly underestimate the size of the high-dimensional space of possible software, and how distant within this space the little islands of software that actually does something interesting are, as distant from each other as Boltzmann minds are within our universe (albeit, of course, depending on the basis, possible software is better clustered).
Those spatial analogies are a great fallacy generator, a machine for getting quantities off by mind-bogglingly huge factors. In your mental image, you have someone create those nukes and put them in the sand for hapless individuals to find. In reality, that’s not how you find a nuke. You venture into this enormous space of possible designs, as vast as the distance from here to the closest exact replica of The Gadget that spontaneously formed from a supernova by the random movement of uranium atoms. When you have to look in a space this big, you don’t find that replica of The Gadget without knowing what you’re looking for quite well.
With regard to listing biases to help arguments: given that I have no expectation that one could not handwave up a fairly plausible bias working in the direction of any specific argument, the direct evidential value of listing biases in such a manner, on the proposition, is zero (or an epsilon). You could just as well have argued that individuals who are not afraid of cave bears get killed by cave bears; there’s too much “give” in your argument for it to have any evidential value. I can freely ignore it without having to bother coming up with a balancing bias (as people like Caplan rightfully do, without really bothering to outline why).
-Leonard Susskind, Susskind’s Rule of Thumb
Not necessarily a great metric; working on the second-most-probable theory can be the best rational decision if the expected value of working on the most probable theory is lower due to greater cost or lower reward.
This is why many scientists are terrible philosophers of science. Not all of them, of course; Einstein was one remarkable exception. But it seems like many scientists have views of science (e.g. astonishingly naive versions of Popperianism) which completely fail to fit their own practice.
Yes. When chatting with scientists I have to intentionally remind myself that my prior should be on them being Popperian rather than Bayesian. When I forget to do this, I am momentarily surprised when I first hear them say something straightforwardly anti-Bayesian.
Examples?
Statements like “I reject the intelligence explosion hypothesis because it’s not falsifiable.”
I see. I doubt that it is as simple as naive Popperianism, however. Scientists routinely construct and screen hypotheses based on multiple factors, and they are quite good at it, compared to the general population. However, as you pointed out, many do not use or even have the language to express their rejection in a Bayesian way, as “I have estimated the probability of this hypothesis being true, and it is too low to care.” I suspect that they instinctively map intelligence explosion into the Pascal mugging reference class, together with perpetual motion, cold fusion and religion, but verbalize it in the standard Popperian language instead. After all, that is how they would explain why they don’t pay attention to (someone else’s) religion: there is no way to falsify it. I suspect that any further discussion tends to reveal a more sensible approach.
Yeah. The problem is that most scientists seem to still be taught from textbooks that use a Popperian paradigm, or at least Popperian language, and they aren’t necessarily taught probability theory very thoroughly, they’re used to publishing papers that use p-value science even though they kinda know it’s wrong, etc.
So maybe if we had an extended discussion about philosophy of science, they’d retract their Popperian statements and reformulate them to say something kinda related but less wrong. Maybe they’re just sloppy with their philosophy of science when talking about subjects they don’t put much credence in.
This does make it difficult to measure the degree to which, as Eliezer puts it, “the world is mad.” Maybe the world looks mad when you take scientists’ dinner party statements at face value, but looks less mad when you watch them try to solve problems they care about. On the other hand, even when looking at work they seem to care about, it often doesn’t look like scientists know the basics of philosophy of science. Then again, maybe it’s just an incentives problem. E.g. maybe the scientist’s field basically requires you to publish with p-values, even if the scientists themselves are secretly Bayesians.
I’m willing to bet most scientists aren’t taught these things formally at all. I never was. You pick it up out of the cultural zeitgeist, and you develop a cultural jargon. And then sometimes people who HAVE formally studied philosophy of science try to map that jargon back to formal concepts, and I’m not sure the mapping is that accurate.
I think ‘wrong’ is too strong here. It’s good for some things, bad for others. Look at particle-accelerator experiments: frequentist statistics are the obvious choice because the collider essentially runs the same experiment 600 million times every second, and p-values work well to separate signal from a null hypothesis of ‘just background’.
If there were genuine philosophy-of-science illumination, it would be clear that, despite the shortcomings of the logical empiricist setting in which Popper found himself, there is much more of value in a sophisticated Popperian methodological falsificationism than in Bayesianism. If scientists were interested in the most probable hypotheses, they would stay as close to the data as possible. But in fact they want interesting, informative, risky theories and genuine explanations. This goes against the Bayesian probabilist ideal. Moreover, you cannot falsify with Bayes’ theorem, so you’d have to start out with an exhaustive set of hypotheses that could account for the data (already silly), and then you’d never get rid of them—they could only be probabilistically disconfirmed.
Strictly speaking, one can’t falsify with any method outside of deductive logic—even your own Severity Principle only claims to warrant hypotheses, not falsify their negations. Bayesian statistical analysis is just the same in this regard.
A Bayesian analysis doesn’t need to start with an exhaustive set of hypotheses to justify discarding some of them. Suppose we have a set of mutually exclusive but not exhaustive hypotheses. The posterior probability of a hypothesis under the assumption that the set is exhaustive is an upper bound for its posterior probability in an analysis with an expanded set of hypotheses. A more complete set can only make a hypothesis less likely, so if its posterior probability is already so low that it would have a negligible effect on subsequent calculations, it can safely be discarded.
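A minimal numeric sketch of that upper-bound argument, with invented priors and likelihoods:

```python
# Minimal sketch: the posterior computed as if the hypothesis set were
# exhaustive is an upper bound on the posterior in any expanded set.
# Priors and likelihoods are invented for illustration.

priors = {"H1": 0.5, "H2": 0.3, "H3": 0.2}          # mutually exclusive
likelihoods = {"H1": 0.001, "H2": 0.4, "H3": 0.3}   # P(data | H)

z_closed = sum(priors[h] * likelihoods[h] for h in priors)
p_h1_closed = priors["H1"] * likelihoods["H1"] / z_closed   # ~0.0028

# Expand the set: give a new hypothesis H4 prior mass 0.1 and rescale
# the old priors so that everything still sums to one.
prior_h4, like_h4 = 0.1, 0.2
scale = 1 - prior_h4
z_open = scale * z_closed + prior_h4 * like_h4
p_h1_open = scale * priors["H1"] * likelihoods["H1"] / z_open  # ~0.0025

assert p_h1_open <= p_h1_closed   # expanding the set only lowers it
print(p_h1_closed, p_h1_open)
```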
I’m a Bayesian probabilist, and it doesn’t go against my ideal. I think you’re attacking philosophical subjective Bayesianism, but I don’t think that’s the kind of Bayesianism to which lukeprog is referring.
For what it’s worth, I understand the arguments in favor of Bayes well, yet I don’t think that scientific results should be published in a Bayesian manner. This is not to say that I don’t think frequentist statistics is frequently and grossly misused by many scientists, but I don’t think Bayes is the solution to this. In fact, many of the problems with how statistics is used, such as implicitly performing many multiple comparisons without controlling for this, would be just as much of a problem with Bayesian statistics.
Either the evidence is strong enough to overwhelm any reasonable prior, in which case frequentist statistics will detect the result just fine; or else the evidence is not so strong, in which case you are reduced to arguing about priors, which seems bad if the goal is to create a societal construct that reliably uncovers useful new truths.
But why not share likelihood ratios instead of posteriors, and then choose whether or not you also want to argue very much (in your scientific paper) about the priors?
What do you think “p<0.05” means?
The p-value is “the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.” It is often misinterpreted, e.g. by 68 out of 70 academic psychologists studied by Oakes (1986, pp. 79-82).
The p-value is not the same as the Bayes factor.
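A minimal sketch of the distinction, using an invented toy experiment (60 heads in 100 coin flips) and a single point alternative p = 0.6; a real analysis would average the likelihood over a prior on the alternative rather than picking one point:

```python
# Toy illustration (invented data): 60 heads in 100 flips of a coin.
from math import comb

n, k = 100, 60

def binom_pmf(successes, trials, p):
    return comb(trials, successes) * p**successes * (1 - p)**(trials - successes)

# One-sided p-value under the null p = 0.5: probability of a result
# at least as extreme as the one observed, assuming the null is true.
p_value = sum(binom_pmf(i, n, 0.5) for i in range(k, n + 1))   # ~0.028

# Bayes factor for the point alternative p = 0.6 against the null p = 0.5:
# the ratio of the likelihoods of the data actually observed.
bayes_factor = binom_pmf(k, n, 0.6) / binom_pmf(k, n, 0.5)     # ~7.5

print(p_value, bayes_factor)
```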
I wasn’t saying it was the same; my point is that reporting the data on which one can update in a Bayesian manner is the norm. (As is updating: e.g. if the null hypothesis is really plausible, at p<0.05 nobody’s really going to believe you anyway.)
With regard to the Bayes factor: the issue is that there is a whole continuum of alternative hypotheses. There is no single factor among those that you can report which could be used for combining evidence in favour of quantitatively different alternative “most supported” hypotheses. The case of the null hypothesis (vs. all possible other hypotheses) is special in that regard, and so that is what a number is reported for.
With regard to the ratio between the evidence for two point hypotheses, as discussed in the article you link: the Neyman-Pearson lemma is quite old.
With regard to the cause of experiment termination, you have to account somewhere for the fact that termination of the experiment has the potential to cherry-pick and thus bias the resulting data (if that is what he’s talking about, because it’s not clear to me what his point is, and it seems to me that he misunderstood the issue).
Furthermore, the relevant mathematics probably originates from particle physics, where it serves a different role: a threshold on the p-value is there to quantify the worst-case likelihood that your experimental apparatus will send people on a wild goose chase. It has more to do with the value of the experiment than with probabilities, given that priors for hypotheses in physics would require a well-defined hypothesis space (which is absent), given that work on producing stronger evidence is a more effective way to spend your time there than any debating of the priors, and given that the p-value-related issues can in any case be utterly dwarfed by systematic errors and problems with the experimental set-up, the probability of which changes after publication as other physicists do or do not point out potential problems in the set-up.
A side note: there’s a value-of-information issue here. I know that if I were to discuss Christian theology with you (not atheism, but the fine points of the life of Jesus, that sort of thing, which I never really had the time or inclination to look into), the expected value of information to you would be quite low, because most of the time that I spent practising mathematics and the like, you spent on the former. That would especially be the case if you had entered some sort of very popular contest requiring theological knowledge in any way, and scored #10 of all time on a metric that someone else saw fit to choose in advance. The same goes for discussions of mathematics, but the other way around. This is also the case for any experts you are talking to. They’re rather rational people; that’s how they got to have impressive accomplishments, and a lot of practical rationality is about ignoring low-expected-value pursuits. The Einsteins and Fermis of this world do not get to accomplish so much on so many different occasions without great innate abilities for that kind of thing. They also hold teaching positions, and it is more productive for them to correct misconceptions in eager students who are up to speed on the fundamental knowledge.
(With “#10” I’m alluding to this result of mine.)
Mmm. I’ve read a lot of dumb papers where they show that their model beats a totally stupid model, rather than that their model beats the best model in the literature. In algorithm design fields, you generally need to publish a use case where your implementation of your new algorithm beats your implementation of the best other algorithms for that problem in the field (which is still gameable, because you implement both algorithms, but harder).
Thinking about the academic controversy I learned about most recently, it seems like if authors had to say “this evidence is n:1 support for our hypothesis over the hypothesis proposed in X” instead of “the evidence is n:1 support for our hypothesis over there being nothing going on” they would have a much harder time writing papers that don’t advance the literature, and you might see more scientists being convinced of other hypotheses because they have to implement them personally.
In physics, a new theory has to be supported over the other theories, for example. What you’re talking about would have to be something that happens in sciences that primarily find weak effects amid noise and confounders anyway, i.e. psychology, sociology, and the like.
I think you need to specifically mention what fields you are talking about, because not everyone knows that issues differ between fields.
With regard to the malemployment debate you link, there’s a possibility that many of the college graduates have not actually learned anything they could utilize in the first place, and consequently there exists nothing worth describing as “malemployment”. Is that the alternative model you are thinking of?
Most of the examples I can think of come from those fields. There are a few papers in harder sciences which people in the field don’t take seriously because they don’t address the other prominent theories, but which people outside of the field think look serious because they’re not aware that the paper ignores other theories.
I was thinking mostly that it looked like the two authors were talking past one another. Group A says “hey, there’s heterogeneity in wages which is predicted by malemployment” whereas Group B says “but average wages are high, so there can’t be malemployment,” which ignores the heterogeneity. I do think that a signalling model of education (students have different levels of talent, and more talented students tend to go for more education, but education has little direct effect on talent) explains the heterogeneity and the wage differentials, and it would be nice to see both groups address that as well.
Once again, which education? Clearly, a training course for, say, a truck driver is not signalling, but exactly what it says on the tin: a training course for driving trucks. A language course, likewise. The same goes for mathematics, hard sciences, and engineering disciplines, which may perhaps be likened to the necessity of training for a Formula 1 driver, irrespective of the level of innate talent (within the human range of ability).
Now, if this were within the realm of actual science, something like this “signalling model of education” would be immediately invalidated by the truck-driving example. No excuses. One can mend it into a “signalling model of some components of education in the soft sciences”. But there is a big problem for the “signalling” model there: a PhD in those fields in particular is a poorer indicator of ability, innate and learned, than in technical fields (lower average IQs, etc.), and signals very little.
edit: by the way, innate “talent” does not in any way exclude the importance of learning; some recent research indicates that highly intelligent individuals retain neuroplasticity for a longer time, which lets them acquire more skills. Which would, by the way, explain why child prodigies fairly often become very mediocre adults, especially when a lack of learning is involved.
If there was a glut of trained truck drivers on the market and someone needed to recruit new crane operators, they could choose to recruit only truck drivers because having passed the truck driving course would signal that you can learn to operate heavy machinery reliably, even if nothing you learned in the truck driving course was of any value in operating cranes.
OSHA rules would still require that the crane operator pass the crane-related training.
The term “signalling” seems to have heavily drifted and mutated online into near meaninglessness.
If someone attends a truck driving course with the intention of driving trucks—or a math course with the intention of a: learning math and b: improving their thinking skills—that’s not signalling behaviour.
And conversely, if someone wants to demonstrate some innate or pre-existing quality (such as mathematical ability), they participate in a relevant contest, and this is signalling.
Now, there may well be a lot of people who start in an educated family and then sort of drift through life conforming to parental wishes, and end up obtaining, say, a physics PhD. And then they go into economics or something similar where they do not utilize their training in much of any way. One could deduce about these people that they are more innately intelligent than average, wealthier than average, etc., and that they learned some thinking skills. The former two things are much more reliably signalled with an IQ test and a statement from the IRS.
I think it’s more that the concept entered LessWrong via Robin Hanson’s expansion of the concept into his “Homo Hypocritus” theory. For examples, see every post on Overcoming Bias with a title of the form “X is not about Y”. This theory sees all communicative acts as signalling, that is to say, undertaken with the purpose of persuading someone that one possesses some desirable characteristic. To pass a mathematics test is just as much a signal of mathematical ability as to hang out with mathematicians and adopt their jargon.
There is something that distinguishes actual performance from other signals of ability: unforgeability. By doing something that only a mathematician could do, one sends a more effective signal—that is, one more likely to be believed—that one can do mathematics.
This is a radical re-understanding of communication. On this view, not one honest thing has ever been said, not one honest thing done, by anyone, ever. “Honesty” is not a part of how brains physically work. Whether we tell a truth or tell a lie, truth-telling is never part of our purpose, but no more than a means to the end of persuading people of our qualities. It is to be selected as a means only so far as it may happen to be more effective than the alternatives in a given situation. The concept of honesty as a virtue is merely part of such signalling.
The purpose of signalling desirable qualities is to acquire status, the universal currency of social interaction. Status is what determines your access to reproduction. Those who signal status best get to reproduce most. This is what our brains have evolved to do, for as long as they have been able to do this at all. Every creature bigger than an earthworm does it.
Furthermore, all acts are communicative acts, and therefore all acts are signalling. Everything we do in front of someone else is signalling. Even in solitude we are signalling to ourselves, for we can more effectively utter false signals if we believe them. Every thought that goes through your head is self-signalling, including whatever thoughts you have while reading this. It’s all signalling.
Such, at least, is the theory.
...which means that describing an act as “signalling” is basically meaningless, insofar as it fails to ascribe to that act a property that distinguishes it from other acts. It’s like describing my lunch as “material”. True, yes, but uninteresting except as a launching point to distinguish among expensive and cheap signals, forgeable and unforgeable signals, purely external signals and self-signalling, etc.
That said, in most contexts when a behavior is described as “signalling” without further qualification I generally understand the speaker to be referring more specifically to cheap signalling which is reliably treated as though it were a more expensive signal. Hanging out with mathematicians and adopting their jargon without really understanding it usually falls in this category; completing a PhD program in mathematics usually doesn’t (though I could construct contrived exceptions in both cases).
That a proposition has the form “every X is a Y” does not make it uninteresting. For example: All matter is made of atoms. All humans are descended from apes. God made everything. Every prime number is a sum of three squares. Everyone requires oxygen to live. True or false, these are all meaningful, interesting statements. “All acts are acts of signalling” is similarly so.
Yes, this subtext is present whenever the concept of signalling is introduced (another example of an “all X is Y” which is nevertheless a meaningful observation).
Not really comparable to matter being made of atoms, though, as “signalling” only establishes a tautology (like all communication is communication).
“Signalling”, in this context, means “acts undertaken primarily for the purpose of gaining status by persuading others of one’s status-worthy qualities”. As such, “All communication is signalling” is not a tautology, but an empirical claim.
Many of those acts can be undertaken without having any such qualities, though.
I think Hanson’s ideas are far more applicable to Hanson’s own personal behaviour than to the world in general.
In particular, what he’s trying to do with his “signalling theory” is not to tell us anything about human behaviour, but to imply that his neglect of the necessity of actual training is fine, which would be consistent with his having some immense innate abilities but not trying hard.
Meanwhile, out there in the real world, if you specifically want to get a job that requires you to speak Chinese, you are going to have to attend a course in Chinese to actually learn Chinese. Unless you are actually a native Chinese speaker, in which case you won’t have to attend that course. Which applies to most disciplines, with perhaps the disciplines for which the skill may not even exist monkey-style imitating the rest.
Though depending on the situation I might still find that it’s useful to attend the course, so I can get certification as having gone through the course, which in the real world might be of more value than speaking Chinese without that certification.
And these sorts of certification-based (as opposed to skill-based) considerations apply to most disciplines as well.
And, of course, the fact that I’m applying for this job, which requires Chinese, is itself a choice I’m making, and we can ask why I’m making that choice, and to what extent my motives are status-seeking vs. truth-seeking vs. improvements-in-the-world-seeking vs. something else.
Conversely, if I am entirely uninterested in certification and I really am solely interested in learning Chinese for the intrinsic value of learning Chinese, I might find it’s more useful not to attend a course, but instead study Chinese on my own (e.g. via online materials and spending my afternoons playing Mahjong in Chinatown).
If you already speak Chinese, you’d just need to pass an exam, no course attached, and if you are a native speaker, you’d be correctly presumed to speak it better than someone who spent many years on a course, lived in China, etc.
I agree. I’m not defending Hanson’s theory, just saying what it is. Perhaps in more starkly extreme terms than he might, but I have never seen him put any limits on the concept. This, I am suggesting, is the origin of the broad application of the concept on LessWrong.
Quite so. But you are thinking like an engineer—that is, you are thinking in terms of actually getting things done. This is the right way to think, but it is not the way of the Hansonian fundamentalist (an imaginary figure that appears in my head when I contemplate signalling theory, and should not be confused with Robin Hanson himself).
The Hansonian fundamentalist would respond that it’s still all signalling. The only thing that he aims at getting done is the acquisition of status for himself. All else is means. The role that the actual ability to speak Chinese plays is that of an unforgeable signal, a concept which replaces that of truth, as far as what goes on inside our heads is concerned. Tarski’s definition of truth stands, but the Litany of Tarski does not. It is replaced by, “If X is true, I desire whatever attitude to X will maximise my status; if X is false, I desire whatever attitude to X will maximise my status. Let me not become attached to anything but status.”
If the job really cannot be done without good spoken Chinese, then to keep that job, you will need that ability. But if in the particular situation you correctly judged that you could get by with English and the help of a Chinese secretary, busk your way through the training course, and pull strings to keep your job if you run into difficulties, then that would be Homo Hypocritus’ choice. Homo Hypocritus does whatever will work best to convince his boss of his worthy qualities, with what lawyers call reckless disregard for the truth. Truth is never a consideration, except as a contingent means to status.
ETA:
He does have tenure at a reputable American university, which I think is not a prize handed out cheaply. OTOH, I am reminded of a cartoon whose caption is “Mad? Of course I’m mad! But I have tenure!”
At that point we aren’t really talking about signalling innate qualities, we’re talking of forgeries and pretending. Those only work at all because there are people who are not pretending.
A fly that looks like a wasp is only scary because there are wasps with venom that actually works. And those wasps have venom so potent because they actually use it to defend their nests. They don’t merely have venom to be worthy of having bright colours. Venom works directly, not through the bright colour.
One could of course forge the signals and then convince themselves that they are honestly signalling the ability to forge signals… but at the end of the day, this fly that looks like a wasp, it is just a regular fly, and it only gets an advantage from us not being fully certain that it is a regular fly. And the flies that look like wasps are not even close to displacing the other flies - there’s an upper limit on those.
Well, tenure is an example of status… and in his current field there may not be as many equivalents of “venom actually working” as in other fields so it looks like it is all about colours.
Yup, that’s true.
You can say that whether it’s signaling is determined by the motivations of the person taking the course, or the motivations of the people offering the course, or the motivations of employers hiring graduates of the course. And you can define motivation as the conscious reasons people have in their minds, or as the answer to the question of whether the person would still have taken the course if it was otherwise identical but provided no signaling benefit. And there can be multiple motivations, so you can say that something is signaling if signaling is one of the motivations, or that it’s signaling only if signaling is the only motivation.
If you make the right selections from the previous, you can argue for almost anything that it’s not signaling, or that it is for that matter.
If I wanted to defend competitions from accusations of signaling the way you defended education, I could easily come up with lots of arguments: people doing them to challenge themselves, to experience teamwork, to test their limits and to meet like-minded people. And the fact that lots of people participate in competitions even though they know they don’t have a serious chance of coming out on top, etc.
(Sure, but I meant that only truck drivers would be accepted into the crane operator training in the first place, because they would be more likely to pass it and perform well afterward.)
Given the way the term is actually used, I wouldn’t call that “signalling”, because “signalling” normally refers to demonstrating that you have some trait by doing something other than performing the trait itself (if it’s capable of being performed). You can signal your wealth by buying expensive jewels, but you can’t signal your ability to buy expensive jewels by buying expensive jewels. And taking a math test to let people know that you’re good at math is not signalling, but going to a mathematicians’ club to let people know that you’re good at math may be signalling.
This seems to be the meaning common on these boards, yes.
Going to a mathematicians’ club (and the like) is something that you can do even if you aren’t any good at math, though. And it only works as a “signal” of being good at math because most people go to that club for other reasons (reasons that do depend on being good at math).
Signalling was supposed to be about credibly conveying information to another party whenever there is a motivation for you to lie.
It seems that instead, “signalling” is used to refer to behaviours like those portrayed in the “Flowers for Charlie” episode of “It’s Always Sunny in Philadelphia”.
Generally, the signalling model of education refers to the wage premium paid to holders of associates, bachelors, masters, and doctoral degrees, often averaged across all majors. (There might be research into signalling with regards to vocational degrees, but I think most people that look into that are more interested in licensing / scarcity effects.)
Well, in the hard science majors there’s considerable training, which is necessary for a large fraction of occupations. Granted, a physics PhD who became an economist may have been signalling, but that is far from the norm. What is the norm is that the vast majority of individuals employed as physics PhDs would be unable to perform some parts of their work if they hadn’t undergone the relevant training, just as you wouldn’t have been able to speak a foreign language or drive a car without training.
(Your point is well taken but...)
Approximately it means “I have a financial or prestige incentive to find a relationship and I work in a field that doesn’t take its science seriously”.
Or, for instance in the case of particle physics, it means the probability you are just looking at background. You are painting with an overly broad brush. Sure, p-values are overused, but there are situations where the p-value IS the right thing to look at.
No, it’s the probability that you’d see a result that extreme (or more extreme) conditioned on just looking at background. Frequentists can’t evaluate unconditional probabilities, and ‘probability that I see noise given that I see X’ (if that’s what you had in mind) is quite different from ‘probability that I see X given that I see noise’.
(Incidentally, the fact that this kind of conflation is so common is one of the strongest arguments against defaulting to p-values.)
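To make the distinction concrete, here is a minimal simulation sketch (my own illustration, not either commenter’s; the 10% base rate of real effects and the effect size are made-up assumptions) showing how far apart the two conditional probabilities can be:

```python
# Toy illustration: P(result this extreme | noise) vs P(noise | result this extreme).
import numpy as np

rng = np.random.default_rng(0)
n_experiments = 100_000
prior_real_effect = 0.1        # assumption: 10% of tested hypotheses are real
effect_size = 1.0              # assumption: shift of the z-score when the effect is real
threshold = 1.96               # |z| > 1.96 corresponds to two-sided p < 0.05

real = rng.random(n_experiments) < prior_real_effect
z = rng.normal(0, 1, n_experiments) + effect_size * real   # noise, plus a shift if real

significant = np.abs(z) > threshold
p_extreme_given_noise = np.mean(np.abs(z[~real]) > threshold)   # the p-value notion
p_noise_given_extreme = np.mean(~real[significant])             # the conflated quantity

print(f"P(|z| > 1.96 | noise) ~ {p_extreme_given_noise:.3f}")   # about 0.05
print(f"P(noise | |z| > 1.96) ~ {p_noise_given_extreme:.3f}")   # far larger here
```

With a low base rate of true effects and low power, most “significant” results in this toy setup are still noise, even though the false-positive rate of the test is the advertised 5%.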
Keep in mind that he and other physicists do not generally consider “probability that it is noise, given an observation X” to even be a statement about the world (it’s a statement about one’s personal beliefs, after all, one’s confidence in the engineering of an experimental apparatus, and so on and so forth), so they are perhaps conflating much less than it would appear under very literal reading. This is why I like the idea of using the word “plausibility” to describe beliefs, and “probability” to describe things such as the probability of an event rigorously calculated using a specific model.
edit: note by the way that physicists can consider a very strong result—e.g. those superluminal neutrinos—extremely implausible on the basis of a prior—and correctly conclude that there is most likely a problem with their machinery, on the basis of ratio between the likelihood of seeing that via noise to likelihood of seeing that via hardware fault. How’s that even possible without actually performing Bayesian inference?
edit2: also note that there is a fundamental difference, in that with plausibilities you have to be careful to avoid vicious cycles in the collective reasoning. Plausibility, as needed for combining it with other plausibilities, is not a bare real number; it is a real number with an attached description of how exactly it was arrived at, so that evidence is not double-counted. The number itself is of little use for communication for this reason.
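For what it’s worth, here is a toy sketch (my own illustration of the idea, not anything specified above) of a “number with an attached description”: a plausibility that carries the identifiers of the evidence behind it, so that a combination step can refuse to double-count shared evidence.

```python
# Sketch: a plausibility that remembers which evidence produced it.
from dataclasses import dataclass

@dataclass(frozen=True)
class Plausibility:
    log_odds: float            # posterior log-odds for the claim
    evidence_ids: frozenset    # identifiers of the evidence already folded in

def combine(a: Plausibility, b: Plausibility, prior_log_odds: float) -> Plausibility:
    """Combine two posteriors that start from the same prior and use
    conditionally independent evidence; refuse if evidence overlaps."""
    shared = a.evidence_ids & b.evidence_ids
    if shared:
        raise ValueError(f"evidence {set(shared)} would be double-counted")
    return Plausibility(
        log_odds=a.log_odds + b.log_odds - prior_log_odds,  # stack both updates on one prior
        evidence_ids=a.evidence_ids | b.evidence_ids,
    )
```

Passing only the bare number around loses exactly the bookkeeping that makes the combination step safe, which seems to be the point being made.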
It’s about the probability that there is an effect which will cause this deviation from background to become more and more supported by additional data rather than simply regress to the mean (or with your wording, the other way around). That seems fairly based-in-the-world to me.
The actual reality either has this effect, or it does not. You can quantify your uncertainty with a number, but that would require you to assign some a priori probability, which you’ll have to choose arbitrarily.
You can contrast this with a die roll, which scrambles the initial phase space, mapping (approximately, but very closely) 1⁄6 of any physically small region of it to each number on the die, the 1⁄6 being an objective property of how symmetrical dice bounce.
Such statements are about the world, in a framework of probability.
They are specific to your idiosyncratic choice of prior, I am not interested in hearing them (in the context of science), unlike the statements about the world.
That knowledge is subjective doesn’t mean that such statements are not about the world. Furthermore, such statements can (and sometimes do) have arguments for the priors...
By this standard, any ‘statement about the world’ ignores all of the uncertainty that actually applies. Science doesn’t require you to sweep your ignorance under the rug.
Well, technically, the probability that you will end up with a result given that you are just looking at background. I.e. the probability that after the experiment you will end up looking at background thinking it is not background*, assuming it is all background.
if it is used for a threshold for such thinking
It’s really awkward to describe that in English, though, and I just assume that this is what you mean (while Bayesianists assume that you are conflating the two).
Note that the ‘brush’ I am using is essentially painting the picture “0.05 is for sissies”, not a rejection of p-values (which I may do elsewhere but with less contempt). The physics reference was to illustrate the contrast of standards between fields and why physics papers can be trusted more than medical papers.
That’s what multiple testing correction is for.
With the thresholds from physics, we’d still be figuring out if penicillin really, actually kills certain bacteria (somewhat hyperbolic, 5 sigma ~ 1 in 3.5 million).
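For reference, the sigma thresholds convert to tail probabilities as follows (a quick check, assuming SciPy is available); the one-sided tail at 5 sigma is indeed roughly 1 in 3.5 million.

```python
# One-sided normal tail areas for common sigma thresholds.
from scipy.stats import norm

for sigma in (2, 3, 5):
    p = norm.sf(sigma)   # survival function: P(Z > sigma)
    print(f"{sigma} sigma: p ~ {p:.2e} (about 1 in {1 / p:,.0f})")
```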
0.05 is a practical tradeoff; for supposed Bayesians it is still much too strict, not too lax.
I for one think that 0.05 is way too lax (other than for the purposes of seeing whether it is worth it to conduct a bigger study, and other such value-of-information related uses), and 0.05 results require a rather carefully constructed meta-study to interpret correctly. Because a selection factor of 20 is well within the range attainable by dodgy practices that are almost impossible to prevent, and, even in the absence of dodgy practices, by selection due to you being more likely to hear of something interesting.
I can only imagine considering it too strict if I were unaware of those issues or their importance (Bayesianism or not).
This goes much more so for weaker forms of information, such as “Here’s a plausible looking speculation I came up with”. To get anywhere with that kind of stuff one would need to somehow account for the preference towards specific lines of speculation.
edit: plus, effective cures in medicine are the ones supported by very, very strong evidence, on par with particle physics (e.g. the same penicillin killing bacteria; you have really big sample sizes when you are dealing with bacteria). The weak stuff is things like antidepressants, for which we don’t know whether they lower or raise the risk of suicide, and for which we are uncertain whether the effect is an artefact of using a depression score that counts weight loss and insomnia as symptoms when testing a drug that causes weight gain and sleepiness.
I think it is mostly because priors for finding a strongly effective drug are very low, so when large p-values are involved, you can only be finding low-effect, near-placebo drugs.
edit2: Other issue is that many studies are plagued by at least some un-blinding that can modulate the placebo effect. So, I think a threshold on the strength of the effect (not just p-value) is also necessary—things that are within the potential systematic error margin from the placebo effect may mostly be a result of systematic error.
edit3: By the way, note that for a study of the same size, a stronger effect will result in a much lower p-value, and so a higher standard on p-values does not interfere much with the detection of strong effects. When you are testing an antibiotic… well, the chance probability of one bacterium dying in some short timespan may be 0.1, and with the antibiotic at a fairly high concentration, 99.99999… . Needless to say, a dozen bacteria put you far beyond the standards from particle physics, and a whole poisoned petri dish makes the point moot, with all the remaining uncertainty coming from the possibility of killing the bacteria in some other way.
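A rough numerical version of that petri-dish point, using the made-up numbers above (0.1 chance of a bacterium dying in the window under the null) and SciPy’s exact binomial test (assuming a recent SciPy that has `binomtest`):

```python
# If all 12 of 12 bacteria die when the null predicts each dies with probability 0.1,
# the one-sided p-value is 0.1**12 = 1e-12, far beyond the ~2.9e-7 of a 5-sigma cut.
from scipy.stats import binomtest

result = binomtest(k=12, n=12, p=0.1, alternative="greater")
print(result.pvalue)
```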
It probably is too lax. I’d settle for 0.01, but 0.005 or 0.001 would be better for most applications (i.e. where you can get it). We have the whole range of numbers between 1 in 25 and 1 in 3.5 million to choose from, and I’d like to see an actual argument before concluding that the number we picked mostly by historical accident was actually right all along. Still, a big part of the problem is the ‘p-value’ itself, not the number coming after it. Apart from the statistical issues, it’s far too often mistaken for something else, as RobbBB has pointed out elsewhere in this thread.
No, it isn’t. In an environment where the incentive to find a positive result is huge and there are all sorts of flexibilities in which particular results to report and which studies to abandon entirely, 0.05 leaves far too many false positives. It really does begin to look like this. I don’t advocate using the standards from physics, but p=0.01 would be preferable.
Mind you, there is no particularly good reason why there is an arbitrary p value to equate with ‘significance’ anyhow.
Well, I would find it really awkward for a Bayesian to condone a modus operandi such as “The p-value of 0.15 indicates it is much more likely that there is a correlation than that the result is due to chance, however for all intents and purposes the scientific community will treat the correlation as non-existent, since we’re not sufficiently certain of it (even though it likely exists)”.
It’s similar to having a choice of two roads, one of which leads into the forbidden forest, and then saying, “While I have decent evidence about which way goes where, because I’m not yet really certain, I’ll just toss a coin.” How many false choices would you make in life using an approach like that? Neglecting your duty to update, so to speak. A p-value of 0.15 is important evidence. A p-value of 0.05 is even more important evidence. It should not be disregarded, regardless of the perverse incentives in publishing and the false binary choice (if (p<=0.05) correlation=true, else correlation=false). However, for the medical community, a p-value of 0.15 might as well be 0.45, for practical purposes. Not published = not published.
This is especially pertinent given that many important chance discoveries may only barely reach significance initially, not because their effect size is so small, but because in medicine sample sizes often are, with the accompanying low power of discovering new effects. When you’re just a grad student with samples from e.g. 10 patients (no economic incentive yet, not yet a large trial), unless you’ve found magical ambrosia, p-values may tend to be “insignificant”, even for potentially significant breakthrough drugs.
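To put a rough number on that small-sample point (the 10 patients per arm and the 0.8-standard-deviation true effect below are illustrative assumptions, not anything from the discussion itself), a quick simulation of the power of a two-sample t-test:

```python
# With 10 patients per arm, even a large (0.8 SD) true effect reaches p < 0.05
# in well under half of simulated studies.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
n_per_arm, effect, trials = 10, 0.8, 20_000
hits = 0
for _ in range(trials):
    control = rng.normal(0.0, 1.0, n_per_arm)
    treated = rng.normal(effect, 1.0, n_per_arm)
    if ttest_ind(treated, control).pvalue < 0.05:
        hits += 1
print(f"power ~ {hits / trials:.2f}")   # roughly 0.4
```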
Better to check out a few false candidates too many than to falsely dismiss important new discoveries. Falsely claiming a promising new substance to have no significant effect due to p-value shenanigans is much worse than not having tested it in the first place, since the “this avenue was fruitless” conclusion can steer research in the wrong direction (information spreads around somewhat even when unpublished, “group abc had no luck with testing substances xyz”).
IOW, I’m more concerned with false negatives (may never get discovered as such, lost chance) than with false positives (get discovered later on—in larger follow-up trials—as being false positives). A sliding p-value scale may make sense, with initial screening tests having a lax barrier signifying a “should be investigated further”, with a stricter standard for the follow-up investigations.
And this is a really, really great reason not to identify yourself as “Bayesian”. You end up not using effective methods when you can’t derive them from Bayes theorem. (Which is to be expected absent very serious training in deriving things).
Where do you think the funds for testing false candidates are going to come from? If you are checking too many false candidates, you are dismissing important new discoveries. You are also robbing time away from any exploration into the unexplored space.
edit: also I think you overestimate the extent to which promising avenues of research are “closed” by a failure to confirm. It is understood that a failure can result from a multitude of causes. Keep in mind also that with a strong effect, you have a quadratically better p-value for the same sample size. You are at much less risk of dismissing strong results.
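A quick sanity check of that claim (the sample size and effect sizes are arbitrary assumptions): at fixed n the expected z statistic grows linearly with the effect, so the exponent of the p-value improves roughly quadratically.

```python
# Expected one-sided p-value of a one-sample z-test at fixed sample size.
import numpy as np
from scipy.stats import norm

n = 100                          # assumed sample size
for effect in (0.2, 0.4, 0.8):   # effect in units of one standard deviation
    z = effect * np.sqrt(n)      # expected z-score
    print(f"effect={effect}: z={z:.1f}, p ~ {norm.sf(z):.1e}")
```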
The way statistically significant scientific studies are currently used is not like this. The meaning conveyed and the practical effect of official people declaring statistically significant findings is not a simple declaration of the Bayesian evidence implied by the particular statistical test returning less than 0.05. Because of this, I have no qualms with saying that I would prefer lower values than p<0.05 to be used in the place where that standard is currently used. No rejection of Bayesian epistemology is implied.
No: the multiple comparisons problem, like optional stopping and other selection effects that alter error probabilities, is a much greater problem in Bayesian statistics, because Bayesians regard error probabilities and the sampling distributions on which they are based as irrelevant to inference once the data are in hand. That is a consequence of the likelihood principle (which follows from inference by Bayes theorem). I find it interesting that this blog takes a great interest in human biases, but guess what methodology is relied upon to provide evidence of those biases? Frequentist methods.
Deborah, what do you think of jsteinhardt’s Beyond Bayesians and Frequentists?
Great quote.
Unfortunately, we find ourselves in a world where the world’s policy-makers don’t just profess that AGI safety isn’t a pressing issue, they also aren’t taking any action on AGI safety. Even generally sharp people like Bryan Caplan give disappointingly lame reasons for not caring. :(
Why won’t you update towards the possibility that they’re right and you’re wrong?
This model should rise to the top much sooner than some very-low-prior, complex model in which you’re a better truth-finder about this topic but not about any topic where truth-finding can be tested reliably*, while they’re better truth-finders about topics where truth-finding can be tested (which is what happens when they do their work), but not about this particular topic.
(*because if you expect that, then you should end up actually trying to do at least something that can be checked because it’s the only indicator that you might possibly be right about the matters that can’t be checked in any way)
Why are the updates always in one direction only? When they disagree, the reasons are “lame” according to yourself, which makes you more sure everyone’s wrong. When they agree, they agree and that makes you more sure you are right.
It’s not so much that I’m a better truth finder, it’s that I’ve had the privilege of thinking through the issues as a core component of my full time job for the past two years, and people like Caplan only raise points that have been accounted for in my model for a long time. Also, I think the most productive way to resolve these debates is not to argue the meta-level issues about social epistemology, but to have the object-level debates about the facts at issue. So if Caplan replies to Carl’s comment and my own, then we can continue the object-level debate, otherwise… the ball’s in his court.
This doesn’t appear to be accurate. E.g. Carl & Paul changed my mind about the probability of hard takeoff. And when have I said that some public figure agreeing with me made me more sure I’m right? See also my comments here.
If I mention a public figure agreeing with me, it’s generally not because this plays a significant role in my own estimates, it’s because other people think there’s a stronger correlation between social status and correctness than I do.
Yes, but why did Caplan not see fit to think about the issue for a significant time, while you did?
There’s also the AI researchers who have had the privilege of thinking about relevant subjects for a very long time, education, and accomplishments which verify that their thinking adds up over time—and who are largely the actual source for the opinions held by the policy makers.
By the way, note that the usual method of rejection of wrong ideas, is not even coming up with wrong ideas in the first place, and general non-engagement of wrong ideas. This is because the space of wrong ideas is much larger than the space of correct ideas.
What I expect to see in the counter-factual world where the AI risk is a big problem, is that the proponents of the AI risk in that hypothetical world have far more impressive and far more relevant accomplishments and credentials.
The first problem with highly speculative topics is that great many arguments exist in favour of either opinion on a speculative topic. The second problem is that each such argument relies on a huge number of implicit or explicit assumptions that are likely to be violated due to their origin as random guesses. The third problem is that there is no expectation that the available arguments would be a representative sample of the arguments in general.
Hmm, I was under the impression that you weren’t a big supporter of the hard takeoff to begin with.
Well, your confidence should be increased by the agreement; there’s nothing wrong with that. The problem is when it is not balanced by the expected decrease by disagreement.
There are a great many differences in our world model, and I can’t talk through them all with you.
Maybe we could just make some predictions? E.g. do you expect Stephen Hawking to hook up with FHI/CSER, or not? I think… oops, we can’t use that one: he just did. (Note that this has negligible impact on my own estimates, despite him being perhaps the most famous and prestigious scientist in the world.)
Okay, well… If somebody takes a decent survey of mainstream AI people (not AGI people) about AGI timelines, do you expect the median estimate to be earlier or later than 2100? (Just kidding; I have inside information about some forthcoming surveys of this type… the median is significantly sooner than 2100.)
Okay, so… do you expect more or fewer prestigious scientists to take AI risk seriously 10 years from now? Do you expect Scott Aaronson and Peter Norvig, within 25 years, to change their minds about AI timelines, and concede that AI is fairly likely within 100 years (from now) rather than thinking that it’s probably centuries or millennia away? Or maybe you can think of other predictions to make. Though coming up with crisp predictions is time-consuming.
Well, I too expect some form of something that we would call “AI” before 2100. I can even buy into some form of accelerating progress, albeit progress that would be accelerating before the “AI” arrives, due to tools using the relevant technologies, and that would not have that sharp a break. I even do agree that there is a certain level of risk involved in all future progress, including progress of the software.
I have a sense you misunderstood me. I picture this parallel world where legitimate, rational inferences about the AI risk exist, where this risk is worth working on in 2013 and stands out among the other risks, and where any other pre-requisites for making MIRI worthwhile hold. And in this imaginary world, I expect massively larger support than “Stephen Hawking hooked up with FHI” or whatever you are outlining here.
You do frequently lament that the AI risk is underfunded, under-supported, and that there’s under-awareness of it. In the hypothetical world, this is not the case, and you can only lament that the rational spending should be 2 billion rather than 1 billion.
edit: and of course, my true rejection is that I do not actually see rational inferences leading there. The imaginary world stuff is just a side-note to explain how non-experts generally look at it.
edit2: and I have nothing against FHI’s existence and their work. I don’t think they are very useful, or address any actual safety issues which may arise, though, but with them I am fairly certain they aren’t doing any harm either (Or at least, the possible harm would be very small). Promoting the idea that AI is possible within 100 years, however, is something that increases funding for AI all across the board.
Right, this just goes back to the same disagreement in our models I was trying to address earlier by making predictions. Let me try something else, then. Here are some relevant parts of my model:
I expect most highly credentialed people to not be EAs in the first place.
I expect most highly credentialed people to not be familiar with the arguments for caring about the far future.
I expect most highly credentialed people to be mostly just aware of risks they happen to have heard about (e.g. climate change, asteroids, nuclear war), rather than attempting a systematic review of risks (e.g. by reading the GCR volume).
I expect most highly credentialed people to respond fairly well when actuarial risk is easily calculated (e.g. asteroid risk), and not-so-well when it’s more difficult to calculate (e.g. many insurance companies went bankrupt after 9/11).
I expect most highly credentialed people to have spent little time on explicit calibration training.
I expect most highly credentialed people to not systematically practice debiasing like some people practice piano.
I expect most highly credentialed people to know very little about AI, and very little about AI risk.
I expect that in general, even those highly credentialed people who intuitively think AI risk is a big deal will not even contact the people who think about AI risk for a living in order to ask about their views and their reasons for them, due to basic VoI failure.
I expect most highly credentialed people to have fairly reasonable views within their own field, but to often have crazy views “outside the laboratory.”
I expect most highly credentialed people to not have a good understanding of Bayesian epistemology.
I expect most highly credentialed people to continue working on, and caring about, whatever their career has been up to that point, rather than suddenly switching career paths on the basis of new information and an EV calculation.
I expect most highly credentialed people to not understand lots of pieces of “black swan epistemology” like this one and this one.
etc.
Luke, why are you arguing with Dmytry?
The question should not be about “highly credentialed” people alone, but about how they fare compared to people who are rather very low “credentialed”.
In particular, on your list, I expect people with fairly low credentials to fare much worse, especially at identification of the important issues as well as on rational thinking. Those combine multiplicatively, making it exceedingly unlikely—despite the greater numbers of the credential-less masses—that people who lead the work on an important issue would have low credentials.
What’s EA? Effective altruism? If it’s an existential risk, it kills everyone, selfishness suffices just fine.
Ohh, come on. That is in no way a demonstration that insurance companies in general follow faulty strategies, and especially is not a demonstration that you could do better.
Indeed.
A selfish person protecting against existential risk builds a bunker and stocks it with sixty years of foodstuffs. That doesn’t exactly help much.
For what existential risks is this actually an effective strategy?
A global pandemic that kills everyone?
The quality of life in a bunker is really damn low. Not to mention that you presumably won’t survive this particular risk in a bunker.
No doubt! I wasn’t comparing highly credentialed people to low-credentialed people in general. I was comparing highly credentialed people to Bostrom, Yudkowsky, Shulman, etc.
But why exactly would you expect conventional researchers in AI and related technologies (also including provable software, as used in the aerospace industry, and a bunch of other topics), with credentials and/or accomplishments in said fields, to fare worse on that list’s score?
Furthermore, with regards to rationality, risks of mistake, and such… very little was done that can be checked for correctness in a clear-cut way—most of it is of such a nature that even when wrong it would not be possible to conclusively demonstrate it wrong. The few things that can be checked… look, when you write an article like this, discussing the irrationality of Enrico Fermi, there’s a substantial risk of appearing highly arrogant (and irrational) if you get the technical details wrong. It is a miniature version of the AI risk problem—you need to understand the subject, and if you don’t, there are negative consequences. It is much, much easier not to goof up in things like that than in getting the direction of AI right.
As you guys are researching actual AI technologies, the issue is that one should be able to deem your effort less of a risk. A mere “we are trying to avoid risk and we think they don’t” won’t do. The cost of a particularly bad friendly-AI goof-up is a sadistic AI (to borrow the term from Omohundro). A sadistic AI can probably run far more tortured minds than a friendly AI can run minds, by a very huge factor, so the risk of a goof-up must be quite a lot lower than anyone has demonstrated.
BTW, I went back and numbered the items in my list so they’re easier to refer to.
Because very few people in general, including credentialed AI people, satisfy (1), (2), (3), (5), (6), (7)†, (8), (10), and (12), but Bostrom, Yudkowsky and Shulman rather uncontroversially do satisfy those items. I also expect B/Y/S to outperform most credentialed experts on (4), (9), and (11), but I understand that’s a subjective judgment call and it would take a long time for me to communicate my reasons.
† The AI risk part of 7, anyway. Obviously, AI people specifically know a lot about AI.
Edit: Also, I’ll briefly mention that I haven’t downvoted any of your comments in this conversation.
Ok, let’s go over your list, for the AI people.
If EA is effective altruism, that’s not relevant because one doesn’t have to be an altruist to care about existential risks.
I expect them to be able to come up with that independently if it is a good idea.
I expect intelligent people to be able to foresee risks, especially when prompted by the cultural baggage (modern variations on the theme of Golem)
Well, that ought to imply some generally better ability to evaluate hard to calculate probabilities, which would imply that you guys should be able to make quite a bit of money.
The question is how well are they calibrated, not how much time they spent. You guys see miscalibration of famous people everywhere, even in Enrico Fermi.
Once again, how unbiased is what’s important, not how much time spent on a very specific way to acquire an ability. I expect most accomplished people to have encountered far more feedback on being right / being wrong through their education and experience.
Doesn’t apply to people in AI related professions.
The way to raise VoI is prior history of thinking about something else for a living, with impressive results.
Well, less credentialed people are just like this except they don’t have a laboratory inside of which they are sane, that’s usually why they are less credentialed in the first place.
Of your 3, I only weakly expect Bostrom to have learned the necessary fundamentals for actually applying Bayes theorem correctly in somewhat non-straightforward cases.
Yes, the basic formula is simple, but derivations are subtle and complex for non independent evidence or cases involving loops in the graph or all those other things…
It’s like arguing that you are better equipped for a job at Weta Digital than any employee there because you know quantum electrodynamics (the fundamentals of light propagation), and they’re using geometrical optics.
I expect many AI researchers to understand the relevant mathematics a lot, lot better than the 3 on your list.
And I expect credentialed people in general to have a good understanding of the variety of derivative tricks that are used to obtain effective results under uncertainty when the Bayes theorem can not be effectively applied.
Yeah, well, and I expect non-credentialed people to have too much to lose from backing out of it in the event that the studies return a negative.
You lose me here.
I would make a different list, anyway. Here’s my list:
Relevant expertise as measured by educational credentials and/or accomplishments. Expertise is required for correctly recognizing risks (e.g. an astronomer is better equipped for recognizing risks from the outer space, a physicist for recognizing faults in a nuclear power plant design, et cetera)
Proven ability to make correct inferences (largely required for 1).
Self preservation (most of us have it)
Lack of 1 is an automatic dis-qualifier in my list. It doesn’t matter how much you are into things that you think are important for identifying, say, faults in a nuclear power plant design. If you are not an engineer, a physicist, or the like, you aren’t going to qualify for that job via some list you make yourself, which conveniently omits (1).
edit: list copy paste failed.
I disagree with many of your points, but I don’t have time to reply to all that, so to avoid being logically rude I’ll at least reply to what seems to be your central point, about “relevant expertise as measured by educational credentials and/or accomplishments.”
Who has educational credentials and/or accomplishments relevant to future AGI designs or long-term tech forecasting? Also, do you particularly disagree with what I wrote in AGI Impact Experts and Friendly AI Experts?
Also, in general, I’ll just remind everyone reading this that I don’t think these meta-level debates about proper social epistemology are as productive as object-level debates about strategically relevant facts (e.g. facts relevant to the theses in this post). Argument screens off authority, and all that.
Edit: Also, my view of Holden Karnofsky might be illustrative. I take Holden Karnofsky more seriously than almost anyone on the cost-effectiveness of global health interventions, despite the fact that he has 0 relevant degrees, 0 papers published in relevant journals, 0 awards for global health work, etc. Degrees and papers and so on are only proxy variables for what we really care about, and are easily screened off by more relevant variables, both for the case of Karnofsky on global health and for the case of Bostrom, Yudkowsky, Shulman, etc. on AI risk.
For Karnofsky and to some extent Bostrom yes, Shulman is debatable, Yudkowsky tried to get screened (tried to write a programming language, for example, wrote a lot of articles on various topics, many of them wrong, tried to write technical papers (TDT), really badly), and failed to pass the screening by a very big margin. Entirely irrational arguments about 10% counter-factual impact of his are also a part of failure. Omohundro passed with flying colours (his PhD is almost entirely irrelevant at that point, as it is screened off by his accomplishments in AI).
Exactly. All of this is wasted effort once either FAI or UFAI is developed.
There are more relevant accomplishments, there are less relevant accomplishments, and there are lacks of accomplishment.
I agree that a discussion of strategically relevant facts would be much more productive. I don’t see facts here. I see many speculations. I see a lot of making things up to fit the conclusion.
If I were to tell you that I can, for example, win a very high stakes programming contest (with a difficult, open problem that has many potential solutions that can be ranked in terms of quality), the discussion of my approach to the contest problem between you and me would be almost useless for your or my prediction of victory (provided that basic standards of competence are met), irrespective of whether my idea is good. Prior track record, on the other hand, would be a good predictor. This is how it is for a very well defined problem. It is not going to be better for a less well understood problem.
‘EA’ here refers to the traits a specific community seems to exemplify (though those traits may occur outside the community). So more may be suggested than the words ‘effective’ and ‘altruism’ contain.
In terms of the terms, I think ‘altruism’ here is supposed to be an inclination to behave a certain way, not an other-privileging taste or ideology. Think ‘reciprocal altruism’. You can be an egoist who’s an EA, provided your selfish calculation has led you to the conclusion that you should devote yourself to efficiently funneling money to the world’s poorest, efficiently reducing existential risks, etc. I’m guessing by ‘EA’ Luke has in mind a set of habits of looking at existential risks that ‘Effective Altruists’ tend to exemplify, e.g., quantifying uncertainty, quantifying benefit, strongly attending to quantitative differences, trying strongly to correct for a specific set of biases (absurdity bias, status quo bias, optimism biases, availability biases), relying heavily on published evidence, scrutinizing the methodology and interpretation of published evidence....
My own experience is that I independently came up with a lot of arguments from the Sequences, but didn’t take them sufficiently seriously, push them hard enough, or examine them in enough detail. There seems to be a big gap between coming up with an abstract argument for something while you’re humming in the shower, and actually living your life in a way that’s consistent with your believing the argument is sound.
But we are speaking of credentialed people. They’re fairly driven.
Furthermore, general non-acceptance of an idea is evidence that the idea is not good. You can’t seriously be listing general non-acceptance of your ideas by the relevant experts as the reason why you are superior to those experts, because the same non-acceptance lowers the probability that those ideas are correct, in proportion to how much it raises how exceptional you are for holding those views. (The biggest problem with “Bayesianism” is imbalanced/selective updates.)
In particular, when it comes to the interview that he linked to for reasons why one should value the future…
First off, if one can support caring about existential risk for non-Pascal’s-wager-type reasons, then the enormous utility of the future should not be relevant. If it is actually a requirement, then I don’t think there’s anything to discuss here.
Secondly, the most common norm of morality (assuming we ignore things like Sharia), as specified in the laws of progressive countries, or as an extrapolation of legal progress in less progressive ones, is to value future people (we disapprove of smoking while pregnant) but not to value the counter-factual creation of future people (we allow abortion, especially when the child would be disadvantaged and not have a fair chance). Rather than inferring the prevailing morality from the law and discussing it, various bad ideas are invented and discussed to make the argument appear stronger than it really is.
It is not that I am not exposed to this worldview. I am. It is that choosing between A: hurt someone, but a large number of happy people will be created, and B: not hurt someone, but a large number of happy people will not be created (with the deliberate choice having the causal impact on the hurting and creation), A is both illegal and immoral.
When I hear that Joe has a new argument against a belief of mine, then my confidence in my belief lowers a bit, and my confidence in Joe’s competence also lowers a bit. If I then go on to actually evaluate the argument in detail and discover that it’s an extraordinarily poor one, this should generally increase my confidence to higher than it was before I heard that Joe had an argument, and it should further lower my confidence in Joe’s competence.
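A toy numerical rendering of that two-step update (every likelihood below is invented purely for illustration):

```python
# Step 1: learn that Joe has an argument against H. Step 2: see that it is poor.
p_h = 0.80                      # prior confidence in my belief H

# Assumed: weak counterarguments exist even for true beliefs, but arguments
# against false beliefs are somewhat easier to come by.
p_arg_given_h, p_arg_given_not_h = 0.30, 0.50
p_h = (p_arg_given_h * p_h) / (p_arg_given_h * p_h + p_arg_given_not_h * (1 - p_h))
print(f"after hearing an argument exists: {p_h:.2f}")   # dips below 0.80

# Assumed: given that an argument was offered, it is more likely to be poor if H is true.
p_poor_given_h, p_poor_given_not_h = 0.80, 0.30
p_h = (p_poor_given_h * p_h) / (p_poor_given_h * p_h + p_poor_given_not_h * (1 - p_h))
print(f"after seeing the argument is poor: {p_h:.2f}")  # ends above 0.80
```

The end point lands above the starting prior, which is the asymmetry being described.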
I’ve spent enough time looking at the specific arguments for and against many of these propositions to have the contents of those arguments overwhelm my expertise priors in both directions, such that I just don’t see a whole lot of value in discussing anything but the arguments themselves, when my goal (and yours) is to figure out the level of merit of the arguments.
It sounds like you’re committing the Pascal’s Wager Fallacy Fallacy. If you aren’t, then I’m not understanding your point. Large future utilities should count more than small future utilities, and multiplying by low probabilities is fine if the probabilities aren’t vanishingly low.
I think there’s a quantitative tradeoff between the happiness of currently existent people and the happiness of possibly-created people. A strict rule ‘Counterfactual People Have Absolutely No Value’ leads to absurd conclusions, e.g., it’s not worthwhile to create an infinite number of infinitely happy and well-off people if the cost is that your shoulder itches for a few seconds. It’s at least a little worthwhile to create people with awesome lives, even if they should get weighted less than currently existent people.
You don’t want the outcome to be biased by the availability of the arguments, right? Really, I think you do not account for the fact that the available arguments are merely samples from the space of possible arguments (which make different speculative assumptions, in a very large space of possible speculations). Picked non-uniformly, too, as arguments for one side may be more available, or their creation may maximize the personal present-day utility of more agents. Individual samples can’t be particularly informative in such a situation.
The issue is that the number of people you can speculate you affect grows much faster than the prior for the speculation decreases. Constant factors do not help with that, they just push the problem a little further.
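For a sense of the scale asymmetry being described, here is a small helper (just standard Knuth up-arrow notation, nothing from the thread itself): each extra arrow adds only one symbol to the description, so a complexity-penalizing prior shrinks by roughly a constant factor, while the number named explodes.

```python
def up_arrow(a, n, b):
    """Knuth up-arrow notation: a followed by n arrows, then b."""
    if n == 1:
        return a ** b
    result = a
    for _ in range(b - 1):
        result = up_arrow(a, n - 1, result)
    return result

print(up_arrow(3, 1, 3))   # 3^3  = 27
print(up_arrow(3, 2, 3))   # 3^^3 = 3**27, roughly 7.6 trillion
# 3^^^3 is 3^^(3^^3): hopelessly beyond computation, yet only one symbol longer.
```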
I don’t see that as problematic. Ponder the alternative for a moment: you may be OK with a shoulder itch, but are you OK with 10 000 years of the absolutely worst torture imaginable, for the sake of the creation of 3^^^3 or 3^^^^^3 or however many really happy people? What about your death vs. their creation?
edit: also you might have the value of those people to yourself (as potential mates and whatnot) leaking in.
forgot to address this:
If the probabilities aren’t vanishingly low, you reach basically same conclusions without requiring extremely large utilities. 7 billion people dying is quite a lot, too. If you see extremely large utilities on a list of requirements for caring about the issue, when you already have at least 7 billion lives at stake, then it is a Pascal’s wager.
Actually, I don’t see vanishingly small probabilities as problematic; I see small probabilities where the bulk of the probability mass is unaccounted for as problematic. E.g. a response to a low risk from a specific asteroid is fine, because its alternative positions in space are accounted for (and you have assurance you won’t put it on an even worse trajectory).
Updating on someone else’s decision to accept or reject a position should depend on their reason for their position. Information cascades is relevant.
Yes, of course. But also keep in mind that wrong positions are often rejected by the mechanism that generates positions, rather than the mechanism that checks the generated positions.
After reading Robin’s exposition of Bryan’s thesis, I would disagree that his reasons are disappointingly lame.
Which could either indicate that the reasons are good or that your standards are lower than Luke’s and so trigger no disappointment.
Bryan is expressing a “standard economic intuition” but… did you see Carl’s comment reply on Caplan’s post, and also mine?
I did see Eelco Hoogendoorn’s, and it is absolutely spot on.
I’m hardly a fan of Caplan, but he has some Bayesianism right:
Based on how things like this asymptote or fail altogether, he has a low prior for foom.
He has a low expectation of being able to identify in advance (without work equivalent to the creation of the AI) the exact mechanisms by which it is going to asymptote or fail, irrespective of whether it does or does not asymptote or fail, so not knowing such mechanisms does not bother him a whole lot.
Even assuming he is correct, he expects plenty of possible arguments against this position (arguments reliant on speculations), as well as expects to see some arguers, because the space of speculative arguments is huge. So such arguments are not going to move him anywhere.
People don’t do that explicitly any more than someone who’s playing football is doing Newtonian mechanics explicitly. Bayes theorem is no less fundamental than the laws of motion of the football.
Likewise for things like non-testability: nobody’s doing anything explicitly; it is just the case that, due to something you guys call “conservation of expected evidence”, when there is no possibility of evidence against a proposition, a possibility of evidence in favour of the proposition would violate the Bayes theorem.
I’m not sure how you could have such a situation, given that absence of expected evidence is evidence of the absence. Do you have an example?
Well, the probabilities wouldn’t be literally zero. What I mean is that lack of a possibility of strong evidence against something, and only a possibility of very weak evidence against it (via absence of evidence) implies that strong evidence in favour of it must be highly unlikely. Worse, such evidence just gets lost among the more probable ‘evidence that looks strong but is not’.
Ah, I think I follow you.
Absence of evidence isn’t necessarily a weak kind of evidence.
If I tell you there’s a dragon sitting on my head, and you don’t see a dragon sitting on my head, then you can be fairly sure there’s not a dragon on my head.
On the other hand, if I tell you I’ve buried a coin somewhere in my magical 1cm deep garden—and you dig a random hole and don’t find it—not finding the coin isn’t strong evidence that I’ve not buried one. However, there’s a lot of potential weak evidence against. If you’ve dug up all but a 1cm square of my garden—the coin’s either in that 1cm or I’m telling porkies, and what are the odds that—digging randomly—you wouldn’t have come across it by then? You can be fairly sure, even before digging up that square, that I’m fibbing.
Was what you meant analogous to one of those scenarios?
Yes, like the latter scenario. Note that the expected utility of digging is low when the evidence against from one dig is low.
edit: Also. In the former case, not seeing a dragon sitting on your head is very strong evidence against there being a dragon. Unless you invoke un-testable invisible dragons which may be transparent to x-rays, let dust pass through it unaffected, and so on. In which case, I should have a very low likelihood of being convinced that there is a dragon on your head, if I know that the evidence against would be very weak.
edit2: Russell’s teapot in the Kuiper belt is a better example still. When there can be only very weak evidence against it, the probability of encountering or discovering strong evidence in favour of it must be low also, making it not worthwhile to try to come up with evidence that there is a teapot in the Kuiper belt (due to the low probability of success), even when the prior probability for the teapot is not very low.
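A minimal numeric sketch of that point (the prior and likelihoods are made up): when a single search can only ever deliver weak evidence against the teapot, a strong result in its favour has to be correspondingly rare, because the expected posterior must equal the prior.

```python
prior = 0.30                 # assumed prior that the teapot is out there

# Assumed likelihoods of one search turning up the teapot:
p_pos_given_true = 0.05      # the search rarely finds it even if it exists
p_pos_given_false = 0.001    # ...and almost never false-alarms

p_pos = p_pos_given_true * prior + p_pos_given_false * (1 - prior)
post_if_pos = p_pos_given_true * prior / p_pos               # big jump upward, but rare
post_if_neg = (1 - p_pos_given_true) * prior / (1 - p_pos)   # tiny step downward

print(f"P(positive result)    = {p_pos:.4f}")
print(f"posterior if positive = {post_if_pos:.2f}")
print(f"posterior if negative = {post_if_neg:.3f}")
print(p_pos * post_if_pos + (1 - p_pos) * post_if_neg)       # equals the prior, 0.30
```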
Then, to extend the analogy: Imagine that digging has potentially negative utility as well as positive. I claim to have buried both a large number of nukes and a magical wand in the garden.
In order to motivate you to dig, you probably want some evidence of magical wands. In this context that would probably be recursively improving systems where, occasionally, local variations rapidly acquire super-dominance over their contemporaries when they reach some critical value. Evolution probably qualifies there—other bipedal frames with fingers aren’t particularly dominant over other creatures in the same way that we are, but at some point we got smart enough to make weapons (note that I’m not saying that was what intelligence was for though) and from then on, by comparison to all other macroscopic land-dwelling forms of life, we may as well have been god.
And since then that initial edge in dominance has only ever allowed us to become more dominant. Creatures afraid of wild animals are not able to create societies with guns and nuclear weapons—you’d never have the stability for long enough.
In order to motivate you not to dig, you probably want some evidence of nukes. In this context, that means recursively—I’m not sure ‘improving’ is the right word here—systems with a feedback state that create large amounts of negative value. Well, to a certain extent that’s a matter of perspective—from the perspective of extinct species the ascendancy of humanity would probably not be anything to cheer about, if they were in a position to appreciate it. But I suspect it can at least stand on its own that failure cascades tend to be easier to make than success cascades. One little thing goes wrong on your rocket and then the situation multiplies; a small error in alignment rapidly becomes a bigger one; or the timer on your Patriot battery loses a fraction of a second and over time your perception of where the missiles are is off significantly. It’s only with significant effort that we create systems where errors don’t multiply.
(This is analogous to altering your expected value of information—like if earlier you’d said you didn’t want to dig and I’d said, ‘well there’s a million bucks there’ instead—you’d probably want some evidence that I had a million bucks, but given such evidence the information you’d gain from digging would be worth more.)
This seems to be fairly closely analogous to Eliezer’s claims about AI, at least if I’ve understood them correctly: that we have to hit an extremely small target, and it’s more likely that we’re going to blow ourselves to itty-bitty pieces/cover the universe in paperclips if we’re just fooling around hoping to hit on it by chance.
If you believe that such is the case, then the only people you’re going to want looking for that magic wand—if you let anyone do it at all—are specialists with particle detectors—indeed if your garden is in the middle of a city you’ll probably make it illegal for kids to play around anywhere near the potential bomb site.
Now, we may argue over quite how strongly we have to believe in the possible existence of magitech nukes to justify the cost of fencing off the garden—personally I think the statement:
Is to constrain what you’ll accept for potential evidence pretty dramatically—we’re talking about systems in general, not just individual people, and recursively improving systems with high asymptotes relative to their contemporaries have happened before.
It’s not clear to me that the second claim he makes is even particularly meaningful:
Sure, I think that they probably won’t go to infinity—but I don’t see any reason to suspect that they won’t converge on a much higher value than our own native ability. Pretty much all of our systems do, from calculators to cars.
We can even argue over how you separate the claims that something’s going to foom from the false claims of such (I’d suggest, initially, just seeing how many claims that something was going to foom have actually been made within the domain of technological artefacts; it may be that the base-line credibility is higher than we think). But that’s a body of research that Caplan, as far as I’m aware, hasn’t put forward. It’s not clear to me that it’s a body of research with the same order of difficulty as creating an actual AI either. And, in its absence, it’s not clear to me that to answer, in effect, “I’ll believe it when I see the mushroom cloud,” is a particularly rational response.
I was mostly referring to the general lack of interest in the discussion of un-falsifiable propositions by the scientific community. The issue is that un-falsifiable proposition are also the ones for which it is unlikely that in the discussion you will be presented with evidence in favour of them.
The space of propositions is the garden I am speaking of. And digging up false propositions is not harmless.
With regards to your argument, I think you vastly under-estimate the size of the high-dimensional space of possible software, and how distant in this space the little islands of software that actually does something interesting are from each other, as distant as Boltzmann minds are within our universe (albeit, of course, depending on the basis, possible software is better clustered).
Those spatial analogies are a great fallacy generator, a machine for getting quantities off by mind-bogglingly huge factors. In your mental image, you have someone create those nukes and put them in the sand for the hapless individuals to find. In reality, that’s not how you find a nuke. You venture into this enormous space of possible designs, as vast as the distance from here to the closest exact replica of The Gadget that spontaneously formed from a supernova by the random movement of uranium atoms. When you have to look in a space this big, you don’t find this replica of The Gadget without knowing quite well what you’re looking for.
With regards to listing biases to help arguments: given that I fully expect that one could handwave up a fairly plausible bias working in the direction of any specific argument, the direct evidential value of listing biases in such a manner, as regards the proposition, is zero (or an epsilon). You could just as well have argued that the individuals who are not afraid of cave bears get killed by the cave bears; there’s too much “give” in your argument for it to have any evidential value. I can freely ignore it without having to bother to come up with a balancing bias (as people like Caplan rightfully do, without really bothering to outline why).
Hm. A generalized phenomenon of overwhelming physicist underconfidence could account for a reasonable amount of the QM affair.