I expected more disagreement than this. Was my post really that persuasive?
I linked this to an IRC channel full of people skeptical of SI. One person commented that
the reply doesn’t seem to be saying much
and another that
I think most arguments are ’yes we are bad but we will improve’ and some opinion based statement about how FAI is the most improtant thing on the world.
Which was somewhat my reaction as well—I can’t put a finger on it and say exactly what it is that’s wrong, but somehow it feels like this post isn’t “meaty” enough to elicit much of a reaction, positive or negative. Which on the other hand feels odd, since e.g. the “SI’s mission assumes a scenario that is far less conjunctive than it initially appears” heading makes an important point that SI hasn’t really communicated well in the past. Maybe it just got buried under the other stuff, or something.
I found the “less conjunctive” section very persuasive; I suspect Kaj may be right about it getting buried.
That’s an unfortunate response, given that I offered a detailed DH6-level disagreement (quote the original article directly, and refute the central points), and also offered important novel argumentation not previously published by SI. I’m not sure what else people could have wanted.
If somebody figures out why Kaj and some others had the reaction they did, I’m all ears.
I can’t speak for anyone else, and had been intending to sit this one out, since my reactions to this post were not really the kind of reaction you’d asked for.
But, OK, my $0.02.
The claim that an organization is exceptionally well-suited to convert money into existential risk mitigation is an extraordinary one, and extraordinary claims require extraordinary evidence. This puts a huge burden on you, as the person attempting to provide that evidence.
So, I’ll ask you: do you think your response provides such evidence?
If you do, then your problem seems to be (as others have suggested) one of document organization. Perhaps starting out with an elevator-pitch answer to the question “Why should I believe that SI is capable of this extraordinary feat?” might be a good idea.
Because my take-away from reading this post was “Well, nobody else is better suited to do it, and SI does some cool movement-building stuff (the Sequences, the Rationality Camps, and HPMoR) that attracts smart people and encourages them to embrace a more rational approach to their lives, and SI is fixing some of its organizational and communication problems but we need more money to really make progress on our core mission.”
Which, if I try to turn it into an answer to the initial question, gives me “Well, we’re better-suited than anyone else because, unlike them, we’re focused on the right problem… even though you can’t really tell, because what we are really focused on is movement-building, but once we get a few million dollars and the support of superhero mathematicians, we will totally focus on the right problem, unlike anyone else.”
If that is in fact your answer, then one thing that might help is to make a more credible visible precommitment to that eventuality.
For example: if you had that “few million dollars a year” revenue stream, and if you had the superhero mathematician, what exactly would you do with them for, say, the first six months? Lay out that project plan in detail, establish what your criteria would be to make sure you were still focused on the right problem three months in, and set up an escrow fund (a la Kickstarter, where the funds are returned if the target is not met) to support that project plan. That way, people who are skeptical of SI’s organizational ability to actually do any of that stuff have a way of supporting the plan IFF they’re wrong about SI, without waiting for their wrongness to be demonstrated before providing the support.
If your answer is in fact something else, then stating it more clearly might help.
Reminder: I don’t know if you were committing this particular error internally, but, at the least, the sentence is liable to cause the error externally, so: Large consequences != prior improbability. E.g. although global warming has very large consequences, and even implies that we should take large actions, it isn’t improbable a priori that carbon dioxide should trap heat in the atmosphere—it’s supposed to happen, according to standard physics. And so demanding strong evidence that global warming is anthropogenic is bad probability theory and decision theory. Expensive actions imply a high value of information, meaning that if we happen to have access to cheap, powerfully distinguishing evidence about global warming we should look at it; but if that evidence is not available, then we go from the default extrapolation from standard physics and make policy on that basis—not demand more powerful evidence on pain of doing nothing.
The claim that SIAI is currently best-suited to convert marginal dollars into FAI and/or general x-risk mitigation has large consequences. Likewise claims like “most possible self-improving AIs will kill you, although there’s an accessible small space of good designs”. This is not the same as saying that if the other facts of the world are what they appear at face value to be, these claims should require extraordinary evidence before we believe them.
Since reference class tennis is also a danger (i.e., if you want to conclude that a belief is false, you can always find a reference class in which to put it where most beliefs are false, e.g. classifying global warming as an “apocalyptic belief”), one more reliable standard to require before saying “Extraordinary claims require extraordinary evidence” is to ask what prior belief needs to be broken by the extraordinary evidence, and how well-supported that prior belief may be. Suppose global warming is real—what facet of existing scientific understanding would need to change? None, in fact; it is the absence of anthropogenic global warming that would imply change in our current beliefs, so that’s what would require the extraordinary evidence to power it. In the same sense, an AI showing up as early as 2025, self-improving, and ending the world, doesn’t make us say “What? Impossible!” with respect to any current well-supported scientific belief. And if SIAI manages to get together a pack of top-notch mathematicians and solve the FAI problem, it’s not clear to me that you can pinpoint a currently-well-supported element of the world-model which gets broken.
The idea that the proposition contains too much burdensome detail—as opposed to an extraordinary element—would be a separate discussion. There are fewer details required than many strawman versions would have it; and often what seems like a specific detail is actually just an antiprediction, i.e., UFAI is not about a special utility function but about the whole class of non-Friendly utility functions. Nonetheless, if someone’s thought processes were dominated by model risk, but they nonetheless actually cared about Earth’s survival, and were generally sympathetic to SIAI even as they distrusted the specifics, it seems to me that they should support CFAR, part of whose rationale is explicitly the idea that Earth gets a log(number of rationalists) saving throw bonus on many different x-risks.
I am coming to the conclusion that “extraordinary claims require extraordinary evidence” is just bad advice, precisely because it causes people to conflate large consequences and prior improbability. People are fond of saying it about cryonics, for example.
At least sometimes, people may say “extraordinary claims require extraordinary evidence” when they mean “your large novel claim has set off my fraud risk detector; please show me how you’re not a scam.”
In other words, the caution being expressed is not about prior probabilities in the natural world, but rather the intentions and morals of the claimant.
We need two new versions of the advice, to satisfy everyone.
Version for scientists: “improbable claims require extraordinary evidence”.
Version for politicians: “inconvenient claims require extraordinary evidence”.
Well, consider the strategic point of view. Suppose that a system (humans) is known for its poor performance at evaluating claims without performing direct experimentation. There is a long, long history of such failures.
Consider also that a false high-impact claim can ruin this system’s ability to perform its survival function, with again a long history of such events; the damage is proportional to the claimed impact. (The Mayans are a good example, killing people so that the sun would rise tomorrow; great utilitarian rationalists they were, believing their reasoning was perfect enough to warrant such action. Note that donating to the wrong charity instead of the right one kills people.)
When we anticipate that a huge percentage of the claims will be false, we can build the system to require evidence that, if the claim were false, would put the system in a small-probability world (i.e., require that evidence be collected for the claim such that p(evidence | ~claim)/p(evidence | claim) is low), to make the system, once deployed, fall off the cliffs less often. The required strength of the evidence then increases with the impact of the claim.
It is not an ideal strategy, but it is the one that works given the limitations. There are other strategies, and it is not straightforward to improve performance (and easy to degrade performance by making idealized implicit assumptions).
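One way that rule could be cashed out, as a minimal sketch with invented numbers (the only ingredient taken from the comment above is the requirement that p(evidence | ~claim)/p(evidence | claim) be low, and lower for higher-impact claims):

```python
# Minimal sketch of the rule above: require the likelihood ratio
# r = p(evidence | ~claim) / p(evidence | claim) to be small, and smaller
# for higher-impact claims, so the expected damage from acting on a false
# claim stays bounded. All numbers are invented for illustration.

def prob_false_given_evidence(prior: float, r: float) -> float:
    """P(~claim | evidence) from Bayes' theorem, given prior P(claim) and ratio r."""
    return r * (1 - prior) / (prior + r * (1 - prior))

def max_allowed_ratio(impact: float, prior: float, damage_budget: float = 0.01) -> float:
    """Largest r such that P(~claim | evidence) * impact <= damage_budget."""
    b = damage_budget / impact
    return (b * prior) / ((1 - b) * (1 - prior))

prior = 0.05  # invented prior that a claim of this kind is true
for impact in (1.0, 10.0, 100.0):  # claimed impact, in arbitrary damage units
    r = max_allowed_ratio(impact, prior)
    print(impact, r, prob_false_given_evidence(prior, r))
# The higher the claimed impact, the smaller the evidence ratio the system
# tolerates before acting on the claim.
```

Under these made-up inputs, a tenfold increase in claimed impact demands roughly a tenfold stronger evidence ratio, which is the "required strength of the evidence increases with the impact of the claim" behaviour described above.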
What I meant when I described the claim (hereafter “C”) that SI is better suited to convert dollars to existential risk mitigation than any other charitable organization as “extraordinary” was that priors for C are low (C is false for most organizations, and therefore likely to be false for SI absent additional evidence about SI), not that C has large consequences (although that is true as well).
Yes, this might be a failing of using the wrong reference class (charitable organizations in general) to establish one’s priors, as you suggest. The fact remains that when trying to solicit broad public support, or support from an organization like GiveWell, it’s likely that SI will be evaluated within the reference class of other charities. If using that reference class leads to improperly low priors for C, it seems SI has a few strategic choices:
1) Convince GiveWell, and donors in general, that SI is importantly unlike other charities, and should not be evaluated as though it were like them—in other words, win at reference class tennis.
2) Ignore donors in general and concentrate its attention primarily on potential donors who already use the correct reference class.
3) Provide enough evidence to convince even someone who starts out with improperly low priors drawn from the incorrect reference class of “SI is a charity” to update to a sufficiently high estimate of C that donating money to SI seems reasonable (in practice, I think this is what has happened and is happening with anthropogenic climate change).
4) Look for alternate sources of funding besides charitable donations.
One way to approach strategy #1 is the one you use here—shift the conversation from whether or not SI can actually spend money effectively to mitigate existential risk to whether or not uFAI/FAI by 2025 (or some other near-mode threshold) is plausible.
That’s not a bad tactic; it works pretty well in general.
Your statement was that it was an extraordinary claim that SIAI provided x-risk reduction—why then would SIAI be compared to most other charities, which don’t provide x-risk reduction, and don’t claim to provide x-risk reduction? The AI-risk item was there for comparison of standards, as was global warming; i.e., if you claim that you doubt X because of Y, but Y implies doubting Z, but you don’t doubt Z, you should question whether you’re really doubting X because of Y.
Are you trying to argue that it isn’t in fact being compared to other charities? (Specifically, by GiveWell?) Or merely that if it is, those doing such comparison are mistaken?
If you’re arguing the former… huh. I will admit, in that case, that almost everything I’ve said in this thread is irrelevant to your point, and I’ve completely failed to follow your argument. If that’s the case, let me know and I’ll back up and re-read your argument in that context.
If you’re arguing the latter, well, I’m happy to grant that, but I’m not sure how relevant it is to Luke’s goal (which I take to be encouraging Holden to endorse SI as a charitable donation).
If SI wants to argue that GiveWell’s expertise with evaluating other charities isn’t relevant to evaluating SI because SI ought not be compared to other charities in the first place, that’s a coherent argument (though it raises the question of why GiveWell ever got involved in evaluating SI to begin with… wasn’t that at SI’s request? Maybe not. Or maybe it was, but SI now realizes that was a mistake. I don’t know.)
But as far as I can tell that’s not the argument SI is making in Luke’s reply to Holden. (Perhaps it ought to be? I don’t know.)
I worry that this conversation is starting to turn around points of phrasing, but… I think it’s worth separating the ideas that you ought to be doing x-risk reduction and that SIAI is the most efficient way to do it, which is why I myself agreed strongly with your own, original phrasing, that the key claim is providing the most efficient x-risk reduction. If someone’s comparing SIAI to Rare Diseases in Cute Puppies or anything else that isn’t about x-risk, I’ll leave that debate to someone else—I don’t think I have much comparative advantage in talking about it.
I agree with you on all of those points.
Further, it seems to me that Holden is implicitly comparing SI to other charitable-giving opportunities when he provides GW’s evaluation of SI, rather than comparing SI to other x-risk-reduction opportunities.
I tentatively infer, from the fact that you consider responding to such a comparison something you should leave to others but you’re participating in a discussion of how SI ought to respond to Holden, that you don’t agree that Holden is engaging in such a comparison.
If you’re right, then I don’t know what Holden is doing, and I probably don’t have a clue how Luke ought to reply to Holden.
Holden is comparing SI to other giving opportunities, not just to giving opportunities that may reduce x-risk. That’s not a part of the discussion Eliezer feels he should contribute to, though. I tried to address it in the first two sections of my post above, and then in part 3 I talked about why both FHI and SI contribute unique and important value to the x-risk reduction front.
In other words: I tried to explain that for many people, x-risk is Super Duper Important, and so for those people, what matters is which charities among those reducing x-risk they should support. And then I went on to talk about SI’s value for x-risk reduction in particular.
Much of the debate over x-risk as a giving opportunity in general has to do with Holden’s earlier posts about expected value estimates, and SI’s post on that subject (written by Steven Kaas) is still under development.
There are fewer details required than many strawman versions would have it; and often what seems like a specific detail is actually just an antiprediction, i.e., UFAI is not about a special utility function but about the whole class of non-Friendly utility functions.
If by “utility function” you mean “a computable function, expressible using lambda calculus” (or a Turing machine tape or Python code, which are equivalent), then arguing that the majority of such functions lead to a model-based, utility-based agent killing you is a huge stretch, as such functions are not grounded, and keeping the model in correspondence with the real world is not a sub-goal of finding the maximum of such a function.
SI is not exceptionally well-suited for x-risk mitigation relative to some ideal organization, but relative to the alternatives (as you said). But the reason I gave for this was not “unlike them, we’re focused on the right problem”, though I think that’s true. Instead, the reasons I gave (twice!) were:
SI has successfully concentrated lots of attention, donor support, and human capital. Also, SI has learned many lessons about how to run a very tricky kind of organization. AI risk reduction is a mission that (1) is beyond most people’s time horizons for caring, (2) is hard to understand and visualize, (3) pattern-matches to science fiction and apocalyptic religion, (4) suffers under complicated and necessarily uncertain strategic considerations (compare to the simplicity of bed nets), (5) has a very small pool of people from which to recruit researchers, etc. SI has lots of experience with these issues; experience that probably takes a long time and lots of money to acquire.
As for getting back to the original problem rather than just doing movement-building, well… that’s what I’ve been fighting for since I first showed up at SI, via Open Problems in Friendly AI. And now it’s finally happening, after SPARC.
Yes, this is a promising idea. It’s also probably 40-100 hours of work, and there are many other urgent things for us to do as well. That’s not meant as a dismissal, just as a report from the ground of “Okay, yes, everyone’s got a bunch of great ideas, but where are the resources I’m supposed to use to do all those cool things? I’ve been working my ass off but I can’t do even more stuff that people want without more resources.”
Absolutely. As I said in the first place, I hadn’t initially intended to reply to this, as I didn’t think my reactions were likely to be helpful given the situation you’re in. But your followup comment seemed more broadly interested in what people might have found compelling, and less in specific actionable suggestions, than your original post. So I decided to share my thoughts on the former question.
I totally agree that you might not have the wherewithal to do the things that people might find compelling, and I understand how frustrating that is.
It might help emotionally to explicitly not-expect that convincing people to donate large sums of money to your organization is necessarily something that you, or anyone, are able to do with a human amount of effort. Not that this makes the problem any easier, but it might help you cope better with the frustration of being expected to put forth an amount of effort that feels unreasonably superhuman.
Or it might not.
I’ll observe that the bulk of the text you quote here is not reasons to believe SI is capable of it, but reasons to believe the task is difficult. What’s potentially relevant to the former question is:
SI has successfully concentrated lots of attention, donor support, and human capital. Also, SI has learned many lessons [and] has lots of experience with these issues;
If that is your primary answer to “Why should I believe SI is capable of mitigating x-risk given $?”, then you might want to show why the primary obstacles to mitigating x-risk are psychological/organizational issues rather than philosophical/technical ones, such that SI’s competence at addressing the former set is particularly relevant. (And again, I’m not asserting that showing this is something you are able to do, or ought to be able to do. It might not be. Heck, the assertion might even be false, in which case you actively ought not be able to show it.)
You might also want to make more explicit the path from “we have experience addressing these psychological/organizational issues” to “we are good at addressing these psychological/organizational issues (compared to relevant others)”. Better still might be to focus your attention on demonstrating the latter and ignore the former altogether.
Thank you for understanding. :)
My statement “SI has successfully concentrated lots of attention, donor support, and human capital [and also] has learned many lessons [and] has lots of experience with [these unusual, complicated] issues” was in support of “better to help SI grow and improve rather than start a new, similar AI risk reduction organization”, not in support of “SI is capable of mitigating x-risk given money.”
However, if I didn’t also think SI was capable of reducing x-risk given money, then I would leave SI and go do something else, and indeed will do so in the future if I come to believe that SI is no longer capable of reducing x-risk given money. How to Purchase AI Risk Reduction is a list of things that (1) SI is currently doing to reduce AI risk, or that (2) SI could do almost immediately (to reduce AI risk) if it had sufficient funding.
Ah, OK. I misunderstood that; thanks for the clarification.
For what it’s worth, I think the case for “support SI >> start a new organization on a similar model” is pretty compelling.
And, yes, the “How to Purchase AI Risk Reduction” series is an excellent step in the direction of making SI’s current and planned activities, and how they relate to your mission, more concrete and transparent. Yay you!
I strongly agree with this comment, and also have a response to Eliezer’s response to it. While I share TheOtherDave’s views, as TheOtherDave noted, he doesn’t necessarily share mine!
It’s not the large consequences that make it a priori unlikely that an organization is really good at mitigating existential risks—it’s the objectively small probabilities and lack of opportunity to learn by trial and error.
If your goal is to prevent heart attacks in chronically obese, elderly people, then you’re dealing with reasonably large probabilities. For example, the AHA estimates that a 60-year-old, 5′8″ man weighing 220 pounds has a 10% chance of having a heart attack in the next 10 years. You can fiddle with their calculator here. This is convenient, because you can learn by trial and error whether your strategies are succeeding. If only 5% of a group of the elderly obese under your treatment have heart attacks over the next 10 years, then you’re probably doing a good job. If 12% have heart attacks, you should probably try another tactic. These are realistic swings to expect from an effective treatment—it might really be possible to cut the rate of heart attacks in half among a particular population. This study, for example, reports a 25% relative risk reduction. If an organization claims to be doing really well at preventing heart attacks, it’s a credible signal—if they weren’t doing well, someone could check their results and prove it, which would be embarrassing for the organization. So, that kind of claim only needs a little bit of evidence to support it.
On the other hand, any given existential risk has a small chance of happening, a smaller chance of being mitigated, and, by definition, little or no opportunity to learn by trial and error. For example, the odds of an artificial intelligence explosion in the next 10 years might be 1%. A team of genius mathematicians funded with $5 million over the next 10 years might be able to reduce that risk to 0.8%. However, this would be an extraordinarily difficult thing to estimate. These numbers come from back-of-the-envelope Fermi calculations, not from hard data. They can’t come from hard data—by definition, existential risks haven’t happened yet. Suppose 10 years go by, and the Singularity Institute gets plenty of funding, and they declare that they successfully reduced the risk of unfriendly AI down to 0.5%, and that they are on track to do the same for the next decade. How would anyone even go about checking this claim?
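To put those two cases side by side, here is a rough back-of-the-envelope sketch using only the figures already mentioned (the 10%/5% heart-attack rates, the hypothetical 1% to 0.8% AI-risk reduction, and the $5 million budget), plus an assumed cohort of 1,000 patients for concreteness:

```python
# Back-of-the-envelope comparison using the figures from the text above.

# Heart-attack program: the outcome is measurable, so the claim is checkable.
baseline_rate = 0.10   # 10-year heart-attack risk for the example patient
treated_rate = 0.05    # rate under the hypothetical successful program
cohort = 1000          # assumed cohort size, for concreteness
print((baseline_rate - treated_rate) * cohort)  # ~50 averted events, verifiable in data

# AI-risk program: the claimed effect is a 0.2-percentage-point reduction
# in a ~1% decade-level risk, bought with $5 million.
risk_before, risk_after, budget = 0.01, 0.008, 5_000_000
print((risk_before - risk_after) / budget)  # ~4e-10 risk reduction per dollar,
                                            # with no experiment that could ever
                                            # confirm or refute the estimate.
```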
An unfriendly intelligence explosion, by its very nature, will use tactics and weaknesses that we are not presently aware of. If we learn about some of these weaknesses and correct them, then uFAI would use other weaknesses. The Singularity Institute wants to promote the development of a provably friendly AI; the thought is that if the AI’s source code can be shown mathematically to be friendly, then, as long as the proof is correct and the code is faithfully entered by the programmers and engineers, we can achieve absolute protection against uFAI, because the FAI will be smart enough to figure that out for us. But while it’s very plausible to think that we will face significant AI risk in the next 30 years (i.e., the risk arises under a disjunctive list of conditions), it’s not likely that we will face AI risk, and that AI will turn out to have the capacity to exponentially self-improve, and that there is a theoretical piece of source code that would be friendly, and that at least one such code can provably be shown to be friendly, and that a team of genius mathematicians will actually find that proof, and that these mathematicians will prevail upon a group of engineers to build the FAI before anyone else builds a competing model. This is a conjunctive scenario.
It’s not at all clear to me how just generally having a team of researchers who are moderately familiar with the properties of the mathematical objects that determine the friendliness of AI could do anything to reduce existential risk if this conjunctive scenario doesn’t come to pass. In other words, if we get self-replicating autonomous moderately intelligent AIs, or if it turns out that there’s no such thing as a mathematical proof of friendliness, or if AI first comes about by way of whole brain emulation, then I don’t understand how the Singularity Institute proposes to make itself useful. It’s not a crazy thought that having a ready-made team of seasoned amateurs ready to tackle the problems of AI would yield better results than having to improvise a response team from scratch...but there are other charitable proposals (including proposals to reduce other kinds of x-risk) that I find considerably more compelling. If you want me to donate to the Singularity Institute, you’ll have to come up with a better plan than “This incredibly specific scenario might come to pass and we have a small chance of being able to mitigate the consequences if it does, and even if the scenario doesn’t come to pass, it would still probably be good to have people like us on hand to cope with unspecified similar problems in unspecified ways.”
By way of analogy, a group of forward-thinking humanitarians in 1910 could have plausibly argued that somebody ought to start getting ready to think about ways to help protect the world against the unknown risks of new discoveries in theoretical physics...but they probably would have been better off thinking up interesting ways of stopping World War I or a re-occurrence of the dreaded 1893 Russian Flu. The odds that even a genius team of humanitarian physicists would have anticipated the specific course that cutting-edge physics would take—involving radioactivity, chain reactions, uranium enrichment, and implosion bombs—just from baseline knowledge about Bohr’s model of the atom and Marie Curie’s discovery of radioactivity—are already incredibly low. The further odds that they would take useful steps, in the 1910s, to devise and execute an effective plan to stop the development of nuclear weapons or even to ensure that they were not used irresponsibly, seem astronomically low. The team might manage, in a general way, to help improve the security controls on known radioactive materials—but, as actually happened, new materials were found to be radioactive, and new ways were found of artificially enhancing the radioactivity of a substance, and in any event most governments had secret stockpiles of fissile material that would not have been reached by ordinary security controls.
Today, we know a little something about computer science, and it’s understandable to want to develop expertise in how to keep computers safe—but we can’t anticipate the specific course of discoveries in cutting-edge computer science, and even if we could, it’s unlikely that we’ll be able to take action now to help us cope with them, and if our guesses about the future prove to be close but not exactly accurate, then it’s even more unlikely that the plans we make now based on our guesses will wind up being useful.
That’s why I prefer to donate to charities that are attempting either to (a) alleviate suffering that is currently and verifiably happening, e.g., Deworm the World, or (b) do something obviously useful for preventing existential risks in a disjunctive way, e.g., the Millennium Seed Bank. I have nothing against the SI—I wish you well and hope you grow and succeed. I think you’re doing better than the vast majority of charities out there. I just also think there are even better uses for my money.
EDIT: Clarified that my views may be different from TheOtherDave’s, even though I agree with his views.
I should say, incidentally (since this was framed as agreement to my comment) that Mass_Driver’s point is rather different from mine.
One sad answer is that your post is boring, which is another way of saying it doesn’t have enough Dark Arts to be sufficiently persuasive.
-Sister Y
It didn’t have the same cohesiveness as Holden’s original post; there were many more dangling threads, to borrow the same metaphor I used to say why his post was so interesting. You wrote it as a technical, thoroughly cited response and literature review instead of a heartfelt, wholly self-contained Mission Statement, and you made that very clear by stating at least 10 times that there was much more info ‘somewhere else’ (in conversations, in people’s heads, yet to be written, etc.).
He wrote an intriguing short story, you wrote a dry paper.
Edit: Also, the answer to every question seems to be, “That will be in Eliezer’s next Sequence,” which postpones further debate.
I doubt random skeptics on the internet followed links to papers. Their thoughts are unlikely to be diagnostic. The group of people who disagree with you and will earnestly go through all the arguments is small. Also, explanations of the form “Yes, this was a problem, but we’re going to fix it” are usually just read as rationalizations. It sounds a bit like “Please, sir, give me another chance. I know I can do better” or “I’m sorry I cheated on you. It will never happen again”. The problems actually have to be fixed before the argument is rebutted. It will go better when you can say things like “We haven’t had any problems of this kind in 5 years”.
The group of people who disagree with you and will earnestly go through all the arguments is small.
It is also really small for, e.g., a perpetual motion device constructed using gears, weights, and levers—very few people would even look at the blueprint. It is a bad strategy to dismiss a critique on the grounds that the critic did not read the whole thing. Meta considerations work sometimes.
Sensible priors for p(our survival at risk | rather technically unaccomplished people are the most aware of the risk) and p(rather technically unaccomplished people are the most aware of the risk | our survival at risk) are very, very low. Meanwhile p(rather technically unaccomplished people are the most aware of the risk | our survival is not actually at risk) is rather high (it’s commonly the case that someone’s scared of something). p(high technical ability) is low to start with, p(highest technical ability) is very, very low, and p(high technical ability | no technical achievement) is much lower still, especially given reasonable awareness that technical achievement is instrumental to being taken seriously. p(ability to self-deceive) is not very low, p(ability to deceive oneself and others) is not very low, there is a well-known tendency to overspend on safety (see the TSA), the notion of the living machine killing its creator is very, very old, and there are plenty of movies to that point. In the absence of some sort of achievement that is highly unlikely to be an evaluation error, the probability that you guys matter is very low. That’s partly what Holden was talking about. His strongest point is that you are not performing to the standards: even if he buys into AI danger or the importance of FAI, he would not recommend donating to you.
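For the shape of the calculation being gestured at above, a minimal odds-form sketch; every number here is invented purely for illustration, since the comment gives no actual values, and the factors are treated as independent only for simplicity:

```python
# Posterior odds = prior odds * product of likelihood ratios.
# All numbers are invented; only the structure mirrors the argument above.
prior_odds = 0.01 / 0.99      # invented prior odds that "this group matters"
lr_no_achievement = 0.1       # p(no notable technical achievement | they matter) /
                              # p(no notable technical achievement | they don't)
lr_scary_trope = 0.5          # the "machine kills its creator" story is easy to
                              # generate whether or not the risk is real
posterior_odds = prior_odds * lr_no_achievement * lr_scary_trope
print(posterior_odds / (1 + posterior_odds))  # ~0.0005 under these made-up inputs
```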