I think you are abusing/misusing the concept of falsifiability here. Ditto for empiricism. You aren’t the only one to do this, I’ve seen it happen a lot over the years and it’s very frustrating. I unfortunately am busy right now but would love to give a fuller response someday, especially if you are genuinely interested to hear what I have to say (which I doubt, given your attitude towards MIRI).
I’m a bit surprised you suspect I wouldn’t be interested in hearing what you have to say?
I think the amount of time I’ve spent engaging with MIRI perspectives over the years provides strong evidence that I’m interested in hearing opposing perspectives on this issue. I’d guess I’ve engaged with MIRI perspectives vastly more than almost everyone on Earth who explicitly disagrees with them as strongly as I do (although obviously some people like Paul Christiano and other AI safety researchers have engaged with them even more than me).
(I might not reply to you, but that’s definitely not because I wouldn’t be interested in what you have to say. I read virtually every comment-reply to me carefully, even if I don’t end up replying.)
I apologize, I shouldn’t have said that parenthetical.
I want to publicly endorse and express appreciation for Matthew’s apparent good faith.
Every time I’ve ever seen him disagreeing about AI stuff on the internet (a clear majority of the times I’ve encountered anything he’s written), he’s always been polite, reasonable, thoughtful, and extremely patient. Obviously conversations sometimes entail people talking past each other, but I’ve seen him carefully try to avoid miscommunication, and (to my ability to judge) strawmanning.
Thank you, Matthew. Keep it up. :)
Here’s a new approach: your list of points 1-7. Would you also make those claims about me? (i.e. replace references to MIRI with references to Daniel Kokotajlo.)
You’ve made detailed predictions about what you expect in the next several years, on numerous occasions, and made several good-faith attempts to elucidate your models of AI concretely. There are many ways we disagree, and many ways I could characterize your views, but “unfalsifiable” is not a label I would tend to use for your opinions on AI. I do not mentally lump you together with MIRI in any strong sense.
OK, glad to hear. And thank you. :) Well, you’ll be interested to know that I think of my views on AGI as being similar to MIRI’s, just less extreme in various dimensions. For example I don’t think literally killing everyone is the most likely outcome, but I think it’s a very plausible outcome. I also don’t expect the ‘sharp left turn’ to be particularly sharp, such that I don’t think it’s a particularly useful concept. I also think I’ve learned a lot from engaging with MIRI and while I have plenty of criticisms of them (e.g. I think some of them are arrogant and perhaps even dogmatic) I think they have been more epistemically virtuous than the average participant in the AGI risk conversation, even the average ‘serious’ or ‘elite’ participant.
I don’t think [AGI/ASI] literally killing everyone is the most likely outcome

Huh, I was surprised to read this. I’ve imbibed a non-trivial fraction of your posts and comments here on LessWrong, and, before reading the above, my shoulder Daniel definitely saw extinction as the most likely existential catastrophe.
If you have the time, I’d be very interested to hear what you do think is the most likely outcome. (It’s very possible that you have written about this before and I missed it—my bad, if so.)
(My model of Daniel thinks the AI will likely take over, but probably will give humanity some very small fraction of the universe, for a mixture of “caring a tiny bit” and game-theoretic reasons)
Thanks, that’s helpful!
(Fwiw, I don’t find the ‘caring a tiny bit’ story very reassuring, for the same reasons as Wei Dai, although I do find the acausal trade story for why humans might be left with Earth somewhat heartening. (I’m assuming that by ‘game-theoretic reasons’ you mean acausal trade.))
Yep, Habryka is right. Also, I agree with Wei Dai re: reassuringness. I think literal extinction is <50% likely, but this is cold comfort given the badness of some of the plausible alternatives, and overall I think the probability of something comparably bad happening is >50%.
Followup: Matthew and I ended up talking about it in person. tl;dr of my position is that
Falsifiability is a symmetric two-place relation; one cannot say “X is unfalsifiable,” except as shorthand for saying “X and Y make the same predictions,” and thus Y is equally unfalsifiable. When someone is going around saying “X is unfalsifiable, therefore not-X,” that’s often a misuse of the concept—what they should say instead is “On priors / for other reasons (e.g. deference) I prefer not-X to X; and since both theories make the same predictions, I expect to continue thinking this instead of updating, since there won’t be anything to update on.”

What is the point of falsifiability-talk, then? Well, first of all, it’s quite important to track when two theories make the same predictions, or the same predictions until time T. It’s an important part of the bigger project of extracting predictions from theories so they can be tested. It’s exciting progress when you discover that two theories make different predictions, and nail it down well enough to bet on. Secondly, it’s quite important to track when people are making this harder rather than easier—e.g. fortunetellers and pundits will often go out of their way to avoid making any predictions that diverge from what their interlocutors would already predict, whereas the best scientists/thinkers/forecasters, the ones you should defer to, should be actively trying to find alpha and then exploit it by making bets with the people around them. So falsifiability-talk is useful for evaluating people as epistemically virtuous or vicious. But note that if this is what you are doing, it’s all relative in a different way—in the case of MIRI, for example, the question should be “Should I defer to them more, or less, than various alternative thinkers A, B, and C?”, i.e. “Are they generally more virtuous about making specific predictions, seeking to make bets with their interlocutors, etc. than A, B, or C?”
So with that as context, I’d say that (a) It’s just wrong to say ‘MIRI’s theories of doom are unfalsifiable.’ Instead say ‘unfortunately for us (not for the plausibility of the theories), both MIRI’s doom theories and (insert your favorite non-doom theories here) make the same predictions until it’s basically too late.’ (b) One should then look at MIRI and be suspicious and think ‘are they systematically avoiding making bets, making specific predictions, etc. relative to the other people we could defer to? Are they playing the sneaky fortuneteller’s or pundit’s game?’ to which I think the answer is ‘no, not at all; they are actually more epistemically virtuous in this regard than the average intellectual. That said, they aren’t the best either—some other people in the AI risk community seem to be doing better than them in this regard, and therefore deserve more virtue points (and possibly deference points).’ E.g. I think both Matthew and I have more concrete forecasting track records than Yudkowsky?
Falsifiability is not symmetric. Consider two theories:
Theory X: Jesus will come again.
Theory Y: Jesus will not come again.
If Jesus comes again tomorrow, this falsifies theory Y and confirms theory X. If Jesus does not come again tomorrow, neither theory is falsified or confirmed. So we can say that X is unfalsifiable (with respect to a finite time frame) and Y is falsifiable.
Another example:
Theory X: blah blah and therefore the sky is green
Theory Y: blah blah and therefore the sky is not green
Theory Z: blah blah and therefore the sky could be green or not green.
Here, theory X and Y are falsifiable with respect to the color of the sky and theory Z is not.
Here’s how I’d deal with those examples:
Theory X (“Jesus will come again”): Presumably this theory assigns some probability mass >0 to observing Jesus tomorrow, whereas theory Y assigns ~0. If Jesus is not observed tomorrow, that’s a small amount of evidence for theory Y and a small amount of evidence against theory X. So you can say that theory X has been partially falsified. Repeat this enough times, and you can say theory X has been fully falsified, or close enough. (Your credence in theory X will probably never drop to 0, but that’s fine; that’s also true of all sorts of physical theories in good standing, e.g. all the major theories of cosmology and cognitive science, which allow for tiny probabilities of arbitrary sequences of experiences happening in, e.g., Boltzmann brains.)
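To make that “partial falsification” bookkeeping concrete, here is a minimal sketch of the repeated updates in Python. The per-day probabilities are purely illustrative assumptions, not anything either theory actually specifies:

```python
# Minimal sketch of repeated Bayesian updates, with illustrative numbers.
# Theory X ("Jesus will come again") assigns some daily probability p > 0
# to observing the event; Theory Y assigns ~0. Each uneventful day is a
# small update toward Y.

prior_x = 0.5           # start with equal credence in X and Y
p_obs_given_x = 0.001   # assumption: X implies a 0.1% chance per day
p_obs_given_y = 1e-9    # assumption: Y implies essentially zero chance per day

credence_x = prior_x
for day in range(100_000):  # many days pass with no observation
    p_miss_x = 1 - p_obs_given_x
    p_miss_y = 1 - p_obs_given_y
    evidence = credence_x * p_miss_x + (1 - credence_x) * p_miss_y
    credence_x = credence_x * p_miss_x / evidence  # Bayes' rule

print(f"Credence in X after 100,000 uneventful days: {credence_x:.2e}")
# The credence never reaches exactly zero, but it decays toward it:
# "partially falsified" on each update, "close enough" to falsified eventually.
```

Swap in different per-day probabilities and the qualitative picture stays the same; only the speed of the decay changes.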
With the sky color example:
My way of thinking about falsifiability is, we say two theories are falsifiable relative to each other if there is evidence we expect to encounter that will distinguish them / cause us to shift our relative credence in them.
In the case of Theory Z, there is an implicit theory Z2 which is “NOT blah blah, and therefore the sky could be green or not green.” (Presumably that’s what you are holding in the back of your mind as the alternative to Z, when you imagine updating for or against Z on the basis of seeing a blue sky, and decide that you wouldn’t?) I say this because the theory Z3, “NOT blah blah and therefore the sky is blue,” would be confirmed by seeing a blue sky; if somehow you were splitting your credence between Z and Z3, then you would decrease your credence in Z if you saw a blue sky.
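To illustrate the relational point with made-up likelihoods (the “blah blah” theories don’t pin down real numbers, so these are assumptions): whether a blue sky moves your credence in Z depends entirely on which alternative you are splitting credence with.

```python
# Sketch of "falsifiable relative to which alternative?", with assumed likelihoods.

def posterior(priors, likelihoods):
    """Bayes' rule over a dict of hypotheses, given one observation."""
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    return {h: priors[h] * likelihoods[h] / evidence for h in priors}

# Assumed likelihood of observing a blue sky under each theory:
p_blue = {"Z": 0.5, "Z2": 0.5, "Z3": 1.0}  # Z and Z2 spread their mass; Z3 predicts blue

# Credence split between Z and Z2: a blue sky moves nothing.
print(posterior({"Z": 0.5, "Z2": 0.5}, p_blue))  # {'Z': 0.5, 'Z2': 0.5}

# Credence split between Z and Z3: a blue sky shifts credence toward Z3.
print(posterior({"Z": 0.5, "Z3": 0.5}, p_blue))  # {'Z': 0.33..., 'Z3': 0.67...}
```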
Thanks for explaining. I think we have a definition dispute. Wikipedia:Falsifiability has:

A theory or hypothesis is falsifiable if it can be logically contradicted by an empirical test.
Whereas your definition is:

Falsifiability is a symmetric two-place relation; one cannot say “X is unfalsifiable,” except as shorthand for saying “X and Y make the same predictions,” and thus Y is equally unfalsifiable.
In one of the examples I gave earlier:
Theory X: blah blah and therefore the sky is green
Theory Y: blah blah and therefore the sky is not green
Theory Z: blah blah and therefore the sky could be green or not green.
None of X, Y, or Z are Unfalsifiable-Daniel with respect to each other, because they all make different predictions. However, X and Y are Falsifiable-Wikipedia, whereas Z is Unfalsifiable-Wikipedia.
I prefer the Wikipedia definition. To say that two theories produce exactly the same predictions, I would instead say they are indistinguishable, similar to this Physics StackExchange question: Are different interpretations of quantum mechanics empirically distinguishable?
In the ancestor post, Barnett writes:

MIRI researchers rarely provide any novel predictions about what will happen before AI doom, making their theories of doom appear unfalsifiable.
Barnett is using something like the Wikipedia definition of falsifiability here. It’s unfair to accuse him of abusing or misusing the concept when he’s using it in a very standard way.
Very good point.
So, by the Wikipedia definition, it seems that all the mainstream theories of cosmology are unfalsifiable, because they allow for tiny probabilities of Boltzmann brains etc. with arbitrary experiences. There is literally nothing you could observe that would rule them out / logically contradict them.
Also, in practice, it’s extremely rare for a theory to be ruled out or even close-to-ruled out from any single observation or experiment. Instead, evidence accumulates in a bunch of minor and medium-sized updates.
I think cosmology theories have to be phrased as including background assumptions like “I am not a Boltzmann brain” and “this is not a simulation” and such. Compare Acknowledging Background Information with P(Q|I) for example. Given that, they are Falsifiable-Wikipedia.
I view Falsifiable-Wikipedia in a similar way to Occam’s Razor. The true epistemology has a simplicity prior, and Occam’s Razor is a shadow of that. The true epistemology considers “empirical vulnerability” / “experimental risk” to be positive. Possibly because it falls out of Bayesian updates, possibly because they are “big if true”, possibly for other reasons. Falsifiability is a shadow of that.
In that context, if a hypothesis makes no novel predictions, and the predictions it makes are a superset of the predictions of other hypotheses, it’s less empirically vulnerable, and in some relative sense “unfalsifiable”, compared to those other hypotheses.
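A small worked example of that point, with illustrative numbers: a hypothesis that spreads its probability over more outcomes is harder to hurt with data, and correspondingly gains less when a shared prediction comes true.

```python
# Sketch of "empirical vulnerability" as a Bayes factor, with assumed numbers.
# H_sharp concentrates its probability on outcome A; H_vague spreads it over
# A and not-A ("could be green or not green", "explains everything").

p_A_given_sharp = 0.9   # assumption: the sharp hypothesis bets heavily on A
p_A_given_vague = 0.5   # assumption: the vague hypothesis is indifferent

bayes_factor = p_A_given_sharp / p_A_given_vague
print(f"Observing A favors the sharp hypothesis by a factor of {bayes_factor:.1f}")
# The more "vulnerable" hypothesis is the one the data could have hurt,
# and it is also the one the data rewards when its prediction comes true.
```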
I personally wouldn’t include “this is not a simulation” as a background assumption, because essentially everything (given a powerful enough model of computation) could be simulated, and this is why the simulation hypothesis is so bad in casual discourse: it explains everything, which means it explains nothing that is specific to our universe:
https://arxiv.org/abs/1806.08747
Also note that Barnett said “any novel predictions”, which is not part of the Wikipedia definition of falsifiability, right? The Wikipedia definition doesn’t make reference to an existing community of scientists who already made predictions, such that a new hypothesis can be said to have made novel vs. non-novel predictions.
I totally agree btw that it matters sociologically who is making novel predictions and who is sticking with the crowd. And I do in fact ding MIRI points for this relative to some other groups. However I think relative to most elite opinion-formers on AGI matters, MIRI performs better than average on this metric.
But note that this ‘novel predictions’ metric is about people/institutions, not about hypotheses.
However I think relative to most elite opinion-formers on AGI matters, MIRI performs better than average on this metric.

Agree with this, with the caveat that I think essentially all of their rightness relative to others came from believing that short timelines were plausible enough, combined with believing that AI would be by far the most important force of the 21st century compared to other technologies; basically a lot of their other specific predictions are likely to be pretty wrong.
I like this comment here about a useful comparison point to MIRI: physicists were right that the Higgs boson existed, but wrong about theories like supersymmetry, where people expected the Higgs mass to be naturally stabilized; even assuming supersymmetry is correct for our universe, it cannot stabilize the Higgs mass or solve the hierarchy problem:
https://www.lesswrong.com/posts/ZLAnH5epD8TmotZHj/you-can-in-fact-bamboozle-an-unaligned-ai-into-sparing-your#Ha9hfFHzJQn68Zuhq
I think I agree with this—but do you see how it makes me frustrated to hear people dunk on MIRI’s doomy views as unfalsifiable? Here’s what happened in a nutshell:
MIRI: “AGI is coming and it will kill everyone.”
Everyone else: “AGI is not coming and if it did it wouldn’t kill everyone.”
time passes, evidence accumulates...
Everyone else: “OK, AGI is coming, but it won’t kill everyone.”
Everyone else: “Also, the hypothesis that it won’t kill everyone is unfalsifiable so we shouldn’t believe it.”
Yeah, I think this is actually a problem I see here, though admittedly the hypotheses I see are often vaguely formulated, and I kind of agree with Jotto999 that verbal forecasts give far too much leeway here:
I like Eli Tyre’s comment here:
https://www.lesswrong.com/posts/ZEgQGAjQm5rTAnGuM/beware-boasting-about-non-existent-forecasting-track-records#Dv7aTjGXEZh6ALmZn
I like that metric, but the metric I’m discussing is more:
Are they proposing clear hypotheses?
Do their hypotheses make novel testable predictions?
Are they making those predictions explicit?
So for example, looking at MIRI’s very first blog post in 2007: The Power of Intelligence. I used the first just to avoid cherry-picking.
Hypothesis: intelligence is powerful. (yes it is)
This hypothesis is a necessary precondition for what we’re calling “MIRI doom theory” here. If intelligence is weak then AI is weak and we are not doomed by AI.
Predictions that I extract:
An AI can do interesting things over the Internet without a robot body.
An AI can get money.
An AI can be charismatic.
An AI can send a ship to Mars.
An AI can invent a grand unified theory of physics.
An AI can prove the Riemann Hypothesis.
An AI can cure obesity, cancer, aging, and stupidity.
Not a novel hypothesis, nor novel predictions, but also not widely accepted in 2007. As predictions they have aged very well, but they were unfalsifiable. If 2025 Claude had no charisma, it would not falsify the prediction that an AI can be charismatic.
I don’t mean to ding MIRI any points here, relative or otherwise, it’s just one blog post, I don’t claim it supports Barnett’s complaint by itself. I mostly joined the thread to defend the concept of asymmetric falsifiability.
Martin Randall extracted the practical consequences of this here:

In that context, if a hypothesis makes no novel predictions, and the predictions it makes are a superset of the predictions of other hypotheses, it’s less empirically vulnerable, and in some relative sense “unfalsifiable”, compared to those other hypotheses.