I worry that this post seems very abstract.
The specific case I’ve made for “just build the damn FAI” does not revolve only around astronomical waste, but subtheses like:
Stable goals in sufficiently advanced self-improving minds imply very strong path dependence up to the point where the mind becomes sufficiently advanced, and no ability to correct mistakes beyond that point
Friendly superintelligences negate other x-risks once developed
CEV (more generally indirect normativity) implies that there exists a broad class of roughly equivalently-expectedly-good optimal states if we can pass a satisficing test (i.e., in our present state of uncertainty, we would expect something like CEV to be around as good as it gets, given our uncertainty on the details of goodness, assuming you can build a CEV-SI); there is not much gain from making FAI programmers marginally nicer people or giving them marginally better moral advice provided that they are satisficingly non-jerks who try to build an indirectly normative AI
Very little path dependence of the far future on anything except the satisficing test of building a good-enough FAI, because a superintelligent singleton has enough power to correct any bad inertia going into that point
The Fragility of Value thesis implies that value drops off very fast short of a CEV-style FAI (making the kinds of mistakes that people like to imagine leading to flawed-utopia story outcomes will actually just kill you instantly when blown up to a superintelligent scale) so there’s not much point in trying to make things nicer underneath this threshold
FAI is hard (relative to the nearly nonexistent quantity and quality of work that most current AGI people intend to put into it, that mainstream leaders anticipate a need for, or that current agencies fund, FAI is technically far harder than that); so most of the x-risk comes from failure to solve the technical problem
Trying to ensure that “Western democracies remain the most advanced and can build AI first” or “ensuring that evil corporations don’t have so much power that they can influence AI-building” is missing the point (and a rather obvious attempt to map the problem into someone’s favorite mundane political hobbyhorse) because goodness is not magically sneezed into the AI from well-intentioned builders, and favorite-good-guy-of-the-week is not making anything like a preliminary good-faith-effort to do high-quality work on technical FAI problems, and probably won’t do so tomorrow either
You can make a case for MIRI with fewer requirements than that, but my model of the future is that it’s just a pass-fail test on building indirectly normative stable self-improving AI, before any event occurs which permanently prevents anyone from building FAI (mostly self-improving UFAI (possibly neuromorphic) but also things like nanotechnological warfare). If you think that building FAI is a done deal because it’s such an easy problem (or because likely builders are already guaranteed to be supercompetent), you’d focus on preventing nanotechnological warfare or something along those lines. To me it looks more like we’re way behind on our dues.
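One minimal way to write this pass-fail picture down is as a race between arrival times, with T_FAI the (uncertain) time at which an adequately Friendly AI gets built and T_UFAI, T_nano, etc. the times of the FAI-precluding events; this is only a sketch of the model as stated, not a claim about the actual distributions:

\[
P(\text{okay}) \;\approx\; P\bigl(T_{\mathrm{FAI}} < \min(T_{\mathrm{UFAI}},\, T_{\mathrm{nano}},\, \dots)\bigr)
\]

On this reading the available levers are pulling T_FAI earlier (solving the technical problem) or pushing the precluding events later; someone who thought T_FAI were all but guaranteed to come first would indeed put their effort into the right-hand side, e.g. preventing nanotechnological warfare.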
Eliezer, I see this post as a response to Nick Bostrom’s papers on astronomical waste and not a response to your arguments that FAI is an important cause. I didn’t intend for this post to be any kind of evaluation of FAI as a cause or MIRI as an organization supporting that cause. Evaluating FAI as a cause would require lots of analysis I didn’t attempt, including:
Whether many of the claims you have made above are true
How effectively we can expect humanity in general to respond to AI risk absent our intervention
How tractable the cause of improving humanity’s response is
How much effort is currently going into this cause
Whether the cause could productively absorb additional resources
What our leading alternatives are
My arguments are most relevant to evaluating FAI as a cause for people whose interest in FAI depends heavily on their acceptance of Bostrom’s astronomical waste argument. Based on informal conversations, there seem to be a number of people who fall into this category. My own view is that whether FAI is a promising cause is not heavily dependent on astronomical waste considerations, and more dependent on many of these messy details.
Mm, k. I was trying more to say that I got the same sense from your post that Nick Bostrom seems to have gotten at the point where he worried about completely general and perfectly sterile analytic philosophy. Maxipok isn’t derived just from the astronomical waste part; it’s derived from pragmatic features of actual x-risk problems that lead to ubiquitous threshold effects that define “okayness”—most obviously Parfit’s “Extinguishing the last 1000 people is much worse than extinguishing seven billion minus a thousand people” but also including things like satisficing indirect normativity and unfriendly AIs going FOOM. The degree to which x-risk thinking has properly adapted to the pragmatic landscape, not just been derived starting from very abstract a priori considerations, was what gave me that worried sense of overabstraction while reading the OP; and that triggered my reflex to start throwing out concrete examples to see what happened to the abstract analysis in that case.
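Parfit’s comparison can be written in one line. With V(n) for the value of an outcome in which n people survive, and treating the value of the entire future as riding on whether anyone survives at all (the idealization being leaned on here):

\[
V(10^{3}) - V(0) \;\gg\; V(7\times 10^{9}) - V(10^{3})
\]

because the left-hand difference carries all future generations. That step in the value function, together with the pass/fail character of satisficing indirect normativity and of UFAIs going FOOM, is what turns “maximize expected value” into something close to “maximize the probability of an okay outcome.”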
It may be overly abstract. I’m a philosopher by training and I have a tendency to get overly abstract (which I am working on).
I agree that there are important possibilities with threshold effects, such as extinction, and perhaps your point about threshold effects with indirectly normative AIs. I also think that other scenarios, such as Robin Hanson’s scenario, other decentralized market/democracy set-ups, and scenarios we haven’t thought of, are live possibilities. More continuous trajectory changes may be very relevant in these other scenarios.
For what it’s worth, I loved this post and don’t think it was very abstract. Then again, my background is also in philosophy.
I see Nick’s post as pointing out a nontrivial minimum threshold that x-risk reduction opportunities need to meet in order to be more promising than broad interventions, even within the astronomical waste framework. I agree that you have to look at the particulars of the x-risk reduction opportunities, and of the broad intervention opportunities, that are on the table, in order to argue for focus on broad interventions. But that’s a longer discussion.
I agree but remark that so long as at least one x-risk reduction effort meets this minimum threshold, we can discard all non-x-risk considerations and compare only x-risk impacts to x-risk impacts, which is how I usually think in practice. The question “Can we reduce all impacts to probability of okayness?” seems separate from “Are there mundane-seeming projects which can achieve comparably sized x-risk impacts per dollar as side effects?”, and neither tells us to consider non-x-risk impacts of projects. This is the main thrust of the astronomical waste argument and it seems to me that this still goes through.
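In symbols, writing V_okay for the (astronomical) value of an okay outcome, and ΔP_i(okay) and c_i for project i’s shift in the probability of okayness and its cost (a rough sketch, treating “okay” as binary), the comparison reduces to expected okayness per dollar:

\[
\frac{\mathbb{E}[\Delta V_i]}{c_i} \;\approx\; \frac{\Delta P_i(\text{okay}) \cdot V_{\text{okay}}}{c_i}
\]

Any mundane term in ΔV_i is swamped so long as at least one available project moves P(okay) non-negligibly per dollar; a mundane-seeming project therefore competes only through its own ΔP(okay), whether that impact is its stated aim or a side effect.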
It’s important to note that:
There may be highly targeted interventions (other than x-risk reduction efforts) which can have big trajectory changes (including indirectly improving humans’ ability to address x-risks).
With consideration #1 in mind, in deciding whether to support x-risk interventions, one has to consider room for more funding and diminishing marginal returns on investment.
(I recognize that the claims in this comment aren’t present in the comment that you responded to, and that I’m introducing them anew here.)
Mm, I’m not sure what the intended import of your statement is; can we be more concrete? This sounds like something I would say in explaining why I directed some of my life effort toward CFAR—along with, “Because I found that really actually in practice the number of rationalists seemed like a sharp limiting factor on the growth of x-risk efforts, if I’d picked something lofty-sounding in theory that was supposed to have a side impact I probably wouldn’t have guessed as well” and “Keeping in mind that the top people at CFAR are explicitly x-risk aware and think of that impact as part of their job”.
Something along the lines of CFAR could fit the bill. I suspect CFAR could have a bigger impact if it targeted people with a stronger focus on global welfare, and/or greater influence, than the typical CFAR participant. But I recognize that CFAR is still in a nascent stage, so it’s necessary to co-optimize for content development and growth.
I believe that there are other interventions that would also fit the bill, which I’ll describe in later posts.
CFAR is indeed so co-optimizing and trying to maximize net impact over time; if you think that a different mix would produce a greater net impact, make the case! CFAR isn’t a side-effect project where you just have to cross your fingers and hope that sort of thing happens by coincidence while the leaders are thinking about something else; it’s explicitly aimed that way.
This is, more or less, the intended purpose behind spending all this energy on studying rationality rather than directly researching FAI. I’m not saying I agree with that reasoning, by the way. But that was the initial reasoning behind Less Wrong, for better or worse. Would we be farther ahead if Eliezer had started working immediately on FAI rather than on rationality? Maybe, but likely not. I could see it being argued both ways. But anyway, this shows an actual, very concrete example of this kind of intervention.
Another issue is that if you accept the claims in the post, when you are comparing the ripple effects of different interventions, you can’t just compare the ripple effects on x-risk. Ripple effects on other trajectory changes are non-negligible as well.
I agree with Jonah’s point and think my post supports it.