Ok, but that doesn’t increase the probability to ‘medium’ from the very low initial probability of MIRI or another organization benefiting from MIRI’s work solving the extremely hard problem of Friendly AI before anyone else screws it up.
I’ve read all your posts in the threads linked by the OP, and if multiplying the high beneficial impact of Friendly AI by the low probability of success isn’t allowed, I honestly don’t see why I should donate to MIRI.
If this was a regular math problem and it wasn’t world-shakingly important, why wouldn’t you expect that funding workshops and then researchers would cause progress on it?
Assigning a very low probability to progress rests on a sort of backwards reasoning wherein you expect it to be difficult to do things because they are important. The universe contains no such rule. They’re just things.
It’s hard to add a significant marginal fractional pull to a rope that many other people are pulling on. But this is not a well-tugged rope!
I’m not assigning a low probability to progress, I’m assigning a low probability to success.
Where FAI research is concerned, progress is only relevant in as much as it increases the probability of success, right?
Unlike a regular math problem, you’ve only got one shot at getting it right, and you’re in a race with other researchers who are working on an easier problem (seed AI, Friendly or not). It doesn’t matter if you’re 80% of the way there if we all die first.
Edited to add and clarify: Even accounting for the progress I think you’re likely to make, the probability of success remains low, and that’s what I care about.
A non-exhaustive list of some reasons why I strongly disagree with this combination of views:
AI which is not vastly superhuman can be restrained from crime, because humans can be so restrained; and with AI, designers have the added benefits of being able to alter the mind’s parameters (desires, intuitions, capabilities for action, duration of extended thought, etc.) and inhibitions, test copies in detail, read out its internal states, and so on, making the problem vastly easier (although control may need to be tight if one is holding back an intelligence explosion while this is going on)
If 10-50 humans can solve AI safety (and build AGI!) in less than 50 years, then 100-500 not very superhuman AIs at 1200x speedup should be able to do so in less than a month
There are a variety of mechanisms by which humans could monitor, test, and verify the work conducted by such systems
The AIs can also work on incremental improvements to the control mechanisms being used initially, with steady progress allowing greater AI capabilities to develop better safety measures, until one approaches perfect safety
If a small group can solve all the relevant problems over a few decades, then probably a large portion of the AI community (and beyond) can solve the problems in a fraction of the time if mobilized
As AI becomes visibly closer such mobilization becomes more likely
Developments in other fields may make things much easier: better forecasting, cognitive enhancement, global governance, brain emulations coming first, global peace/governance
The broad shape of AI risk is known and considered much more widely than MIRI: people like Bill Gates and Peter Norvig consider it, but think that acting on it now is premature; if they saw AGI as close, or were creating it themselves, they would attend to the control problems
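The speedup arithmetic in the second point above can be made explicit. This is only a sketch using the comment’s own hypothetical numbers (team sizes, the 1200x speedup), not an independent estimate:

```python
# Rough check of the speedup claim, using only the hypothetical numbers
# from the comment above (10-50 humans, 50 years, 100-500 AIs, 1200x).
human_team = 10          # lower-bound human team size
human_years = 50         # upper-bound wall-clock years for that team
person_years = human_team * human_years   # 500 person-years of work

ai_team = 100            # lower-bound number of AI workers
speedup = 1200           # assumed serial speedup per AI worker

# Wall-clock time for the AI team, assuming the work parallelizes across
# the larger team about as well as it did across the human team:
ai_years = person_years / (ai_team * speedup)
ai_days = ai_years * 365
print(f"{ai_days:.1f} days")  # about a day and a half
```

Even the most conservative reading, keeping the team size fixed and applying only the serial speedup, gives 50 years / 1200 ≈ 15 days, still under the month claimed.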
Paul Christiano, and now you, have started using the phrase “AI control problems”. I’ve gone along with it in my discussions with Paul, but before many people start adopting it maybe we ought to talk about whether it makes sense to frame the problem that way (as opposed to “Friendly AI”). I see a number of problems with it:
Control != Safe or Friendly. An AI can be perfectly controlled by a human and be extremely dangerous, because most humans aren’t very altruistic or rational.
The framing implicitly suggests (and you also explicitly suggest) that the control problem can be solved incrementally. But I think we have reason to believe this is not the case, that in short “safety for superintelligent AIs” = “solving philosophy/metaphilosophy” which can’t be done by “incremental improvements to the control mechanisms being used initially”.
“Control” suggests that the problem falls in the realm of engineering (i.e., belongs to the reference class of “control problems” in engineering, such as “aircraft flight control”), whereas, again, I think the real problem is one of philosophy (plus lots of engineering as well of course, but philosophy is where most of the difficulty lies). This makes a big difference in trying to predict the success of various potential attempts to solve the problem, and I’m concerned that people will underestimate the difficulty of the problem or overestimate the degree to which it’s parallelizable or generally amenable to scaling with financial/human resources, if the problem becomes known as “AI control”.
Do you disagree with this, on either the terminological issue (“AI control” suggests “incremental engineering problem”) or the substantive issue (the actual problem we face is more like philosophy than engineering)? If the latter, I’m surprised not to have seen you talk about your views on this topic earlier, unless you did and I missed it?
Nick Bostrom uses the term in his book, and it’s convenient for separating out pre-existing problems with “we don’t know what to do with our society long term, nor is it engineered to achieve that” and the particular issues raised by AI.
But I think we have reason to believe this is not the case, that in short “safety for superintelligent AIs” = “solving philosophy/metaphilosophy” which can’t be done by “incremental improvements to the control mechanisms being used initially”.
In the situation I mentioned, not vastly superintelligent initially (and capabilities can vary along multiple dimensions, e.g. one can have many compartmentalized copies of an AI system that collectively deliver a huge number of worker-years without any one of them possessing extraordinary capabilities).
What is your take on the strategy-swallowing point (if humans can do it, then not very superintelligent AIs can)?
“Control” suggests that the problem falls in the realm of engineering (i.e., belongs to the reference class of “control problems” in engineering, such as “aircraft flight control”)...I’m concerned that people will underestimate the difficulty of the problem or overestimate the degree to which it’s parallelizable or generally amenable to scaling with financial/human resources, if the problem becomes known as “AI control”.
There is an ambiguity there. I’ll mention it to Nick. But, e.g., “Friendliness” just sounds silly. I use “safe” too, but safety can be achieved just by limiting capabilities, which doesn’t reflect the desire to realize the benefits.
What is your take on the strategy-swallowing point (if humans can do it, then not very superintelligent AIs can)?
It’s easy to imagine AIXI-like Bayesian EU maximizers that are powerful optimizers but incapable of solving philosophical problems like consciousness, decision theory, and foundations of mathematics, which seem to be necessary in order to build an FAI. It’s possible that that’s wrong, that one can’t actually get to “not very superintelligent AIs” unless they possess the same level of philosophical ability that humans have, but it certainly doesn’t seem safe to assume this.
BTW, what does “strategy-swallowing” mean? Just “strategically relevant”, or more than that?
But, e.g., “Friendliness” just sounds silly. I use “safe” too, but safety can be achieved just by limiting capabilities, which doesn’t reflect the desire to realize the benefits.
I suggested “optimal AI” to Luke earlier, but he didn’t like that. Here are some more options to replace “Friendly AI” with: human-optimal AI, normative AI (rename what I called “normative AI” in this post to something else), AI normativity. It would be interesting and useful to know what options Eliezer considered and discarded before settling on “Friendly AI”, and what options Nick considered and discarded before settling on “AI control”.
(I wonder why Nick doesn’t like to blog. It seems like he’d want to run at least some of the more novel or potentially controversial ideas in his book by a wider audience, before committing them permanently to print.)
It’s easy to imagine AIXI-like Bayesian EU maximizers that are powerful optimizers but incapable of solving philosophical problems like consciousness, decision theory, and foundations of mathematics, which seem to be necessary in order to build an FAI. It’s possible that that’s wrong, that one can’t actually get to “not very superintelligent AIs” unless they possess the same level of philosophical ability that humans have, but it certainly doesn’t seem safe to assume this.
Such systems, hemmed in and restrained, could certainly work on better AI designs, and predict human philosophical judgments. Predicting human philosophical judgments accurately and reporting those predictions is close enough.
Nick considered and discarded before settling on “AI control”.
“Control problem.”
It seems like he’d want to run at least some of the more novel or potentially controversial ideas in his book by a wider audience, before committing them permanently to print.)
He circulates them to reviewers, in wider circles as the book becomes more developed. And blogging half-finished ideas on the internet is exactly what one shouldn’t do if one is worried about committing controversial ideas to print.
And blogging half-finished ideas on the internet is exactly what one shouldn’t do if one is worried about committing controversial ideas to print.
In case this is why you don’t tend to talk about your ideas in public either, except in terse (and sometimes cryptic) comments or in fully polished papers, I wanted to note that I’ve never had a cause to regret blogging (or posting to mailing lists) any of my half-finished ideas. As long as your signal to noise ratio is fairly high, people will remember the stuff you get right and forget the stuff you get wrong. The problem I see with committing ideas to print (as in physical books) is that books don’t come with comments attached pointing out all the parts that are wrong or questionable.
Such systems, hemmed in and restrained, could certainly work on better AI designs, and predict human philosophical judgments. Predicting human philosophical judgments accurately and reporting those predictions is close enough.
If such a system is powerful enough to predict human philosophical judgments using its general intelligence, without specifically having been programmed with a correct solution for metaphilosophy, it seems very likely that it would already be strongly superintelligent in many other fields, and hence highly dangerous.
(Since you seem to state this confidently but don’t give much detail, I wonder if you’ve discussed the idea elsewhere at greater length. For example I’m assuming that you’d ask the AI to answer questions like “What would Eliezer conclude about second-order logic after thinking undisturbed about it for 100 years?” but maybe you have something else in mind?)
He circulates them to reviewers, in wider circles as the book becomes more developed. And blogging half-finished ideas on the internet is exactly what one shouldn’t do if one is worried about committing controversial ideas to print.
I guess I actually meant “potentially wrong” rather than “controversial”, and I was suggesting that he blog about them after privately circulating to reviewers, but before publishing in print.
For example I’m assuming that you’d ask the AI to answer questions like “What would Eliezer conclude about second-order logic after thinking undisturbed about it for 100 years?” but maybe you have something else in mind?
The thought is to use much more bite-sized and tractable questions, worked on by less individually capable systems (with shorter time horizons, etc.), like: “find a machine-checkable proof of this lemma” or “I am going to read one of these 10 papers, selected at random, to try to shed light on my problem; score each with the predicted rating I will give the paper’s usefulness after reading it.” I discussed this in a presentation at the FHI (focused on WBE, where the issue of unbalanced abilities relative to humans does not apply), and the concepts will be discussed in Nick’s book.
Based on the two examples you give, which seem to suggest a workflow with a substantial portion still being done by humans (perhaps even the majority of the work in the case of the more philosophical parts of the problem), I don’t see how you’d arrive at this earlier conclusion:
If 10-50 humans can solve AI safety (and build AGI!) in less than 50 years, then 100-500 not very superhuman AIs at 1200x speedup should be able to do so in less than a month
Do you have any materials from the FHI presentation or any other writeup that you can share, that might shed more light? If not, I guess I can wait for the book...
It’s hard to discuss your specific proposal without understanding it in more detail, but in general I worry that the kind of AI you suggest would be much better at helping to improve AI capability than at helping to solve Friendliness, since solving technical problems is likely to be more of a strength for such an AI than predicting human philosophical judgments. And unless humanity develops much better coordination abilities than it has now (so that everyone can agree, or be forced, to refrain from trying to develop strongly superintelligent AIs until the Friendliness problem is solved), such an AI isn’t likely to ultimately contribute to a positive outcome.
Yes, the range of follow-up examples there was a bit too narrow, I was starting from the other end and working back. Smaller operations could be chained, parallelized (with limited thinking time and capacity per unit), used to check on each other in tandem with random human monitoring and processing, and otherwise leveraged to minimize the human bottleneck element.
solve Friendliness, since solving technical problems is likely to be more of a strength for such an AI than predicting human philosophical judgments,
A strong skew of abilities away from those directly useful for Friendliness development makes things worse, but leaves a lot of options. Solving technical problems can let you work to, e.g.
Create AIs with ability distributions directed more towards “philosophical” problems
Create AIs with simple sensory utility functions that are easier to ‘domesticate’ (short time horizons, satiability, dependency on irreplaceable cryptographic rewards that only the human creators can provide, etc)
Solve the technical problems of making a working brain emulation model
Create software to better detect and block unintended behavior
coordination
Yes, that’s the biggest challenge for such bootstrapping approaches; how well they work depends on the speedup in safety development one gets out of early models, the degree of international peace and cooperation, and so forth.
Smaller operations could be chained, parallelized (with limited thinking time and capacity per unit), used to check on each other in tandem with random human monitoring and processing, and otherwise leveraged to minimize the human bottleneck element.
This strikes me as quite risky, as the amount of human monitoring has to be really minimal in order to solve a 50-year problem in 1 month, and earlier experiences with slower and less capable AIs seem unlikely to adequately prepare the human designers to come up with fully robust control schemes, especially if you are talking about a time scale of months. Can you say a bit more about the conditions you envision where this proposal would be expected to make a positive impact? It seems to me like it might be a very narrow range of conditions. For example if the degree of international peace and cooperation is very high, then a better alternative may be an international agreement to develop WBE tech while delaying AGI, or an international team to take as much time as needed to build FAI while delaying other forms of AGI.
I tend to think that such high degrees of global coordination are implausible, and therefore put most of my hope in scenarios where some group manages to obtain a large tech lead over the rest of the world and are thereby granted a measure of strategic initiative in choosing how best to navigate the intelligence explosion. Your proposal might be useful in such a scenario, if other seemingly safer alternatives (like going for WBE, or having genetically enhanced humans build FAI with minimal AGI assistance) are out of reach due to time or resource constraints. It’s still unclear to me why you called your point “strategy-swallowing” though, or what that phrase means exactly. Can you please explain?
I certainly didn’t say that would be risk-free, but it interacts with other drag factors on very high estimates of risk. In the full-length discussion of it, I pair it with discussion of historical lags in tech development between leader and follower in technological arms races (longer than one month) and factors relative to corporate and international espionage, raise the possibility of global coordination (or at least between the leader and next closest follower), and so on.
It also interacts with technical achievements in producing ‘domesticity’ short of exact unity of will.
It’s still unclear to me why you called your point “strategy-swallowing” though, or what that phrase means exactly.
When strategy A can, to a large extent, capture the impacts of strategy B.
I certainly didn’t say that would be risk-free, but it interacts with other drag factors on very high estimates of risk.
If you’re making the point as part of an argument against “either Eliezer’s FAI plan succeeds, or the world dies” then ok, that makes sense. ETA: But it seems like it would be very easy to take “if humans can do it, then not very superintelligent AIs can” out of context, so I’d suggest some other way of making this point.
When strategy A can, to a large extent, capture the impacts of strategy B.
Sorry, I’m still not getting it. What does “impacts of strategy” mean here?
it’s convenient for separating out pre-existing problems with “we don’t know what to do with our society long term, nor is it engineered to achieve that” and the particular issues raised by AI.
I don’t think that separation is a good idea. Not knowing what to do with our society long term is a relatively tolerable problem until an upcoming change raises a significant prospect of locking in some particular vision of society’s future. (Wei Dai raises similar points in your exchange of replies, but I thought this framing might still be helpful.)
If we are talking about goal definition evaluating AI (and Paul was probably thinking in the context of some sort of indirect normativity), “control” seems like a reasonable fit. The primary philosophical issue for that part of the problem is decision theory.
(I agree that it’s a bad term for referring to FAI itself, if we don’t presuppose a method of solution that is not Friendliness-specific.)
What do you think is MIRI’s probability of having been valuable, conditioned on a nice intergalactic future being true?
More than 10%, definitely. Maybe 50%?
A non-exhaustive list of some reasons why I strongly disagree with this combination of views
Not that it should be used to dismiss any of your arguments, but reading your other comments in this thread I thought you must be playing devil’s advocate. Your phrasing here seems to preclude that possibility.
If you are so strongly convinced that while AGI is a non-negligible x-risk, MIRI will probably turn out to have been without value even if a good AGI outcome were to be eventually achieved, why are you a research fellow there?
I’m puzzled. Let’s consider an edge case: even if MIRI’s actual research turned out to be strictly non-contributing to an eventual solution, there’s no reasonable doubt that it has raised awareness of the issue significantly (in relative terms).
Would the current situation with the CSER or FHI be unchanged or better if MIRI had never existed? Do you think those have a good chance of being valuable in bringing about a good outcome? Answering ‘no’ to the former and ‘yes’ to the latter would transitively imply that MIRI is valuable as well.
I.e. that alone—nevermind actual research contributions—would make it valuable in hindsight, given an eventual positive outcome. Yet you’re strongly opposed to that view?
The “combination of views” includes both high probability of doom, and quite high probability of MIRI making the counterfactual difference given survival. The points I listed address both.
If you are so strongly convinced that while AGI is a non-negligible x-risk, MIRI will probably turn out to have been without value even if a good AGI outcome were to be eventually achieved, why are you a research fellow there?
I think MIRI’s expected impact is positive and worthwhile. I’m glad that it exists, and that it and Eliezer specifically have made the contributions they have relative to a world in which they never existed. A small share of the value of the AI safety cause can be quite great. That is quite consistent with thinking that “medium probability” is a big overestimate for MIRI making the counterfactual difference, or that civilization is almost certainly doomed from AI risk otherwise.
Lots of interventions are worthwhile even if a given organization working on them is unlikely to make the counterfactual difference. Most research labs working on malaria vaccines won’t invent one, most political activists won’t achieve big increases in foreign aid or immigration levels or swing an election, most counterproliferation expenditures won’t avert nuclear war, asteroid tracking was known ex ante to be far more likely to discover we were safe than that there was an asteroid on its way and ready to be stopped by a space mission.
The threshold for an x-risk charity of moderate scale to be worth funding is not a 10% chance of literally counterfactually saving the world from existential catastrophe. Annual world GDP is $80,000,000,000,000, and wealth including human capital and the like will be in the quadrillions of dollars. A 10% chance of averting x-risk would be worth trillions of present dollars.
We’ve spent tens of billions of dollars on nuclear and bio risks, and even $100,000,000+ on asteroids (averting dinosaur-killer risk on the order of 1 in 100,000,000 per annum). At that exchange rate again a 10% x-risk impact would be worth trillions of dollars, and governments and philanthropists have shown that they are ready to spend on x-risk or GCR opportunities far, far less likely to make a counterfactual difference than 10%.
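The expected-value arithmetic in the two comments above can be sketched explicitly, using only the figures quoted there:

```python
# Expected value of a 10% chance of averting existential catastrophe,
# using the wealth figure quoted above ("quadrillions of dollars").
world_wealth = 2e15            # assumed total wealth incl. human capital, $
xrisk_reduction = 0.10         # hypothetical 10% chance of averting x-risk

value = xrisk_reduction * world_wealth
print(f"${value / 1e12:,.0f} trillion")   # hundreds of trillions of dollars

# Comparison point from the comment: ~$100M on asteroid tracking against
# a ~1-in-100,000,000 annual dinosaur-killer risk.
asteroid_spend = 1e8
asteroid_risk = 1e-8
price_per_unit_risk = asteroid_spend / asteroid_risk  # implied "exchange rate"
# At that exchange rate, a 10% x-risk reduction would be valued far above
# trillions of dollars, which is the point being made.
```

The exact wealth figure matters little here; any number in the quadrillions makes a 10% reduction worth trillions, which is all the argument needs.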
I see. We just used different thresholds for valuable, you used “high probability of MIRI making the counterfactual difference given survival”, while for me just e.g. speeding Norvig/Gates/whoever a couple years along the path until they devote efforts to FAI would be valuable, even if it were unlikely to Make The Difference (tm).
Whoever would turn out to have solved the problem, it’s unlikely that their AI safety evaluation process (“Should I do this thing?”) would work in a strict vacuum, i.e. whoever will one day have evaluated the topic and made up their mind to Save The World will be highly likely to have stumbled upon MIRI’s foundational work. Given that at least some of the steps in solving the problem are likely to be quite serial (sequential) in nature, the expected scenario would be that MIRI’s legacy would at least provide some speed-up; a contribution which, again, I’d call valuable, even if it were unlikely to make or break the future.
If the Gates Foundation had someone evaluate the evidence for AI-related x-risk right now, you probably wouldn’t expect MIRI research, AI researcher polls, philosophical essays etc. to be wholly disregarded.
I used that threshold because the numbers being thrown around in the thread were along those lines, and are needed for the “medium probability” referred to in the OP. So counterfactual impact of MIRI never having existed on x-risk is the main measure under discussion here. I erred in quoting your sentence in a way that might have made that hard to interpret.
If the Gates Foundation had someone evaluate the evidence for AI-related x-risk right now, you probably wouldn’t expect MIRI research, AI researcher polls, philosophical essays etc. to be wholly disregarded.
That’s right, and one reason that I think that MIRI’s existence has reduced expected x-risk, although by less than a 10% probability.
The view presented by Furcas, of probable doom, and “[m]ore than 10%, definitely. Maybe 50%” probability that MIRI will be valuable given the avoidance of doom, which in the context of existential risk seems to mean averting the risk.
It seems to me that if I believed what I infer you believe, I would be donating to MIRI while frantically trying to figure out some way to have my doomed world actually be saved.
It seems to me that if I believed what I infer you believe, I would be donating to MIRI
Why? You (and everybody else) will almost certainly fail anyway, and you say I shouldn’t multiply this low probability by the utility of saving the world.
while frantically trying to figure out some way to have my doomed world actually be saved.
The only way I see is what MIRI is doing.
Edited to add: While this is interesting, what I was really asking in my first post is, if you think the odds of MIRI succeeding are not low, why do you think so?
Because sometimes the impossible can be done, and I don’t know how to estimate the probability of that. What would you have estimated in advance, without knowing the result, was the chance of success for the AI-Box Experiment? How about if I told you that I was going to write the most popular Harry Potter fanfiction in the world and use it to recruit International Mathematical Olympiad medalists? There may be true impossibilities in this world. Eternal life may be one such, if the character of physical law is what it appears to be, to our sorrow. I do not think that FAI is one of those. So I am going to try. We can work out what the probability of success was after we have succeeded. The chance which is gained is not gained by turning away or by despair, but by continuing to engage with and attack the problem, watching for opportunities and constantly advancing.
If you don’t believe me about that aspect of heroic epistemology, feel free not to believe me about not multiplying small probabilities either.
Not easily. Antiantiheroic epistemology might be a better term, i.e., I think that a merely accurate epistemology doesn’t have a built-in mechanism which prevents people from thinking they can do things because the outside view says it’s nonvirtuous to try to distinguish yourself within reference class blah. Antiantiheroic epistemology doesn’t say that it’s possible to distinguish yourself within reference class blah so much as it thinks that the whole issue is asking the wrong question and you should mostly be worrying about staying engaged with the object-level problem because this is how you learn more and gain the ability to take opportunities as they arrive. An antiheroic epistemology that throws up some reference class or other saying this is impossible will regard you as trying to distinguish yourself within this reference class, but this is not what the antiantiheroic epistemology is actually about; that’s an external indictment of nonvirtuosity arrived at by additional modus ponens to conclusions on which antiantiheroic epistemology sees no reason to expend cognitive effort.
Obviously from my perspective non-antiheroic epistemology cancels out to mere epistemology, simpler for the lack of all this outside-view-social-modesty wasted motion, but to just go around telling you “That’s not how epistemology works, of course!” would be presuming a known standard which is logically rude (I think you are doing this, though not too flagrantly).
An archetypal example of antiantiheroic epistemology is Harry in Methods of Rationality, who never bothers to think about any of this reference class stuff or whether he’s being immodest, just his object-level problems in taking over the universe, except once when Hermione challenges him on it and Harry manages to do one thing a normal wizard can’t. Harry doesn’t try to convince himself of anything along those lines, or think about it without Hermione’s prompting. It just isn’t something that occurs to him might be a useful thought process.
I don’t think it’s a useful thought process either, and rationalizing elaborate reasons why I’m allowed to be a hero wouldn’t be useful either (Occam’s Imaginary Razor: decorating my thought processes with supportive tinsel will just slow down any changes I need to make), which is why I tend to be annoyed by the entire subject and wish people would get back to the object level instead of meta demands for modesty that come with no useful policy suggestions about ways to do anything better. Tell me a better object-level way to save the world and we can talk about my doing that instead.
“Antiantiheroic epistemology might be a better term, i.e., I think that a merely accurate epistemology doesn’t have a built-in mechanism which prevents people from thinking they can do things because the outside view says it’s nonvirtuous to try to distinguish yourself within reference class blah.”
Taken literally, I can’t possibly disagree with this, but it doesn’t seem to answer my question, which is “where is the positive evidence that one is not supposed to ignore.” I favor combining many different kinds of evidence, including sparse data. And that can and does lead to very high expectations for particular individuals.
For example, several of my college fraternity brothers are now billionaires. Before Facebook, Mark Zuckerberg was clearly the person with the highest entrepreneurial potential that I knew, based on his intelligence, motivation, ambition, and past achievements in programming, business, and academics. People described him to me as resembling a young Bill Gates. His estimated expected future wealth based on that data if pursuing entrepreneurship, and informed by the data about the relationship of all of the characteristics I could track with it, was in the 9-figure range. Then add in that Facebook was a very promising startup (I did some market sizing estimates for it, and people who looked at it and its early results were reliably impressed).
Moving from entrepreneurship to politics, one can predict success to a remarkable degree with evidence like “Eton graduate, Oxford PPE graduate with first class degree, Oxford Union leader, interested in politics, starts in an entry-level political adviser job with a party.” See this post or this paper. Most of the distance in log odds to reliably becoming Prime Minister, let alone Member of Cabinet or Parliament, can be crossed with objective indicators. Throw in a bit more data about early progress, media mentions, and the like and the prediction improves still more.
I would then throw in other evidence, like the impressiveness of the person’s public speaking relative to other similar people, their number and influence of friends and contacts in high places relative to other similar types (indicating both social capital, and skill at getting more), which could improve or worsen the picture. There is still a sizable chunk of randomness in log terms, as political careers are buffeted by switches in party control, the economy, the rise and fall of factions that carry members with them, and other hard-to-control factors at many stages. So I can and do come to expect that someone will probably get federal political office, and have a good shot at Cabinet, and less so for PM. But within the real distribution of characteristics I won’t be convinced that a young person will probably become PM, which would require almost zero noise.
In science I can be convinced a young star is a good prospect for Nobel or Fields medal caliber work. But I would need stronger evidence than we have seen for anyone to expect that they would do this 10 times (since no one has done so). I am sympathetic to Wei Dai’s comment
Carl referenced “Staring Into the Singularity” as an early indicator of your extraordinary verbal abilities (which explains much if not all of your subsequent successes). It suggests that’s how you initially attracted his attention. The same is certainly true for me. I distinctly recall saying to myself “I should definitely keep track of this guy” when I read that, back in the extropian days. Is that enough for you to count as “people who you met because of their previous success”?
To state my overall position on the topic being discussed, I think according to “non-heroic epistemology”, after someone achieves an “impossible success”, you update towards them being able to achieve further successes of roughly the same difficulty and in related fields that use similar skills, but the posterior probabilities of them solving much more difficult problems or in fields that use very different skills remain low (higher relative to the prior, but still low in an absolute sense). Given my understanding of the distribution of cognitive abilities in humans, I don’t see why I would ever “give up” this epistemology, unless you achieved a level of success that made me suspect that you’re an alien avatar or something.
I would be quite surprised to see you reliably making personal mathematical contributions at the level of the best top math and AI people. I would not be surprised to see MIRI workshop participants making progress on the problems at a level consistent with the prior evidence of their ability, and somewhat higher per unit time because workshops harvest ideas generated over a longer period, are solely dedicated to research, have a lot of collaboration and cross-fertilization, and may benefit from improved motivation and some nice hacking of associated productivity variables. And I would not be surprised at a somewhat higher than typical rate of interesting (to me, etc) results because of looking at strange problems.
I would be surprised if the strange problems systematically deliver relatively huge gains on actual AI problems (and this research line is supposed to deliver AGI as a subset of FAI before others get AGI, so it must have great utility in AGI design), i.e. if the strange problems are super-promising by the criteria that Pearl or Hinton or Ng or Norvig are using but neglected through blunder. I would be surprised if the distance to AGI is crossable in 20 years.
I don’t think it’s a useful thought process either, and rationalizing elaborate reasons why I’m allowed to be a hero wouldn’t be useful either (Occam’s Imaginary Razor: decorating my thought processes with supportive tinsel will just slow down any changes I need to make), which is why I tend to be annoyed by the entire subject and wish people would get back to the object level instead of meta demands for modesty that come with no useful policy suggestions about ways to do anything better. Tell me a better object-level way to save the world and we can talk about my doing that instead.
You are asking other people for their money and time, when they have other opportunities. To do that they need an estimate of the chance of MIRI succeeding, considering things like AI timelines, the speed of takeoff given powerful AI, competence of other institutions, the usefulness of MIRI’s research track, the feasibility of all alternative solutions to AI risk/AI control problems, how much MIRI-type research will be duplicated by researchers interested for other reasons over what timescales, and many other factors including the ability to execute given the difficulty of the problems and likelihood of relevance. So they need adequate object-level arguments about those contributing factors, or some extraordinary evidence to trust your estimates of all of them over the estimates of others without a clear object-level case. Some of the other opportunities available to them that they need to compare against MIRI:
Build up general altruistic capacities through things like the effective altruist movement or GiveWell’s investigation of catastrophic risks (which can address AI in many ways, including ones now not visible, and benefit from much greater resources as well as greater understanding from being closer to AI); noting that these seem to scale faster and spill over
Invest money in an investment fund for the future which can invest more (in total and as a share of effort) when there are better opportunities, either by the discovery of new options, or the formation of better organizations or people (which can receive seed funding from such a trust)
Enhance decision-making and forecasting capabilities with things like the IARPA forecasting tournaments, science courts, etc, to improve reactions to developments including AI and others (recalling that most of the value of MIRI in your model comes from major institutions being collectively foolish or ignorant regarding AI going forward)
Prediction markets, meta-research, and other institutional changes
Work like Bostrom’s, seeking out crucial considerations and laying out analysis of issues such as AI risk for the world to engage with and to let key actors see the best arguments and reasons bearing on the problems
Pursue cognitive enhancement technologies or education methods (you give CFAR in this domain) to improve societal reaction to such problems
Find the most effective options for synthetic biology threats (GiveWell will be looking through them) and see if that is a more promising angle
You are asking other people for their money and time, when they have other opportunities. To do that they need an estimate of the chance of MIRI succeeding
No they don’t; they could be checking relative plausibility of causing an OK outcome without trying to put absolute numbers on a probability estimate, and this is reasonable due to the following circumstances:
The life lesson I’ve learned is that by the time you really get anywhere, if you get anywhere, you’ll have encountered some positive miracles, some negative miracles, your plans will have changed, you’ll have found that the parts which took the longest weren’t what you thought they would be, and that other things proved to be much easier than expected. Your successes won’t come through foreseen avenues, and neither will your failures. But running through it all will be the fundamental realization that everything you accomplished, and all the unforeseen opportunities you took advantage of, were things that you would never have received if you hadn’t attacked the hardest part of the problem that you knew about straight-on, without distraction.
How do you estimate probabilities like that? I honestly haven’t a clue. Now, we all still have to maximize expected utility, but the heuristic I’m applying to do that (which at the meta level I think is the planning heuristic with the best chance of actually working) is to ask “Is there any better way of attacking the hardest part of the problem?” or “Is there any better course of action which doesn’t rely on someone else performing a miracle?” So far as I can tell, these other proposed courses of action don’t attack the hardest part of the problem for humanity’s survival, but rely on someone else performing a miracle. I cannot make myself believe that this would really actually work. (And System 2 agrees that System 1’s inability to really believe seems well-founded.)
Since I’m acting on such reasons and heuristics as “If you don’t attack the hardest part of the problem, no one else will” and “Beware of taking the easy way out” and “Don’t rely on someone else to perform a miracle”, I am indeed willing to term what I’m doing “heroic epistemology”. It’s just that I think such reasoning is, you know, actually correct and normative under these conditions.
If you don’t mind mixing the meta-level and the object-level, then I find any reasoning along the lines of “The probability of our contributing to solving FAI is too low, maybe we can have a larger impact by working on synthetic biology defense and hoping a miracle happens elsewhere” much less convincing than the meta-level observation, “That’s a complete Hail Mary pass, if there’s something you think is going to wipe out humanity then just work on that directly as your highest priority.” All the side cleverness, on my view, just adds up to losing the chance that you get by engaging directly with the problem and everything unforeseen that happens from there.
Another way of phrasing this is that if we actually win, I fully expect the counterfactual still-arguing-about-this version of 2013-Carl to say, “But we succeeded through avenue X, while you were then advocating avenue Y, which I was right to say wouldn’t work.” And to this the counterfactual reply of Eliezer will be, “But Carl, if I’d taken your advice back then, I wouldn’t have stayed engaged with the problem long enough to discover and comprehend avenue X and seize that opportunity, and this part of our later conversation was totally foreseeable in advance.” Hypothetical oblivious!Carl then replies, “But the foreseeable probability should still have been very low” or “Maybe you or someone else would’ve tried Y without that detour, if you’d worked on Z earlier” where Z was not actually uniquely suggested as the single best alternative course of action at the time. If there’s a reply that counterfactual non-oblivious Carl can make, I can’t foresee it from here, under those hypothetical circumstances unfolding as I describe (and you shouldn’t really be trying to justify yourself under those hypothetical circumstances, any more than I should be making excuses in advance for what counterfactual Eliezer says after failing, besides “Oops”).
My reasoning here is, from my internal perspective, very crude, because I’m not sure I really actually trust non-crude reasoning. There’s this killer problem that’s going to make all that other stuff pointless. I see a way to make progress on it, on the object level; the next problem up is visible and can be attacked. (Even this wasn’t always true, and I stuck with the problem anyway long enough to get to the point where I could state the tiling problem.) Resources should go to attacking this visible next step on the hardest problem. An exception to this as top priority maximization was CFAR, via “teaching rationality demonstrably channels more resources toward FAI; and CFAR which will later be self-sustaining is just starting up; plus CFAR might be useful for a general saving throw bonus; plus if a rational EA community had existed in 1996 it would have shaved ten years off the timeline and we could easily run into that situation again; plus I’m not sure MIRI will survive without CFAR”. Generalizing, young but hopefully self-sustaining initiatives can be plausibly competitive with MIRI for small numbers of marginal dollars, provided that they’re sufficiently directly linked to FAI down the road. Short of that, it doesn’t really make sense to ignore the big killer problem and hope somebody else handles it later. Not really actually.
If the year was 1960, which would you rather have?
10 smart people trying to build FAI for 20 years, 1960-1980
A billion dollars, a large supporting movement, prediction markets and science courts that make the state of the evidence on AI transparent, and teams working on FAI, brain emulations, cognitive enhancement, and more but starting in 1990 (in expectation closer to AI)
At any given time there are many problems where solutions are very important, but the time isn’t yet right to act on them, rather than on the capabilities to act on them, and also to deal with the individually unexpected problems that come along so regularly. Investment-driven and movement-building-driven discount rates are relevant even for existential risk.
GiveWell has grown in influence much faster than the x-risk community while working on global health, and are now in the process of investigating and pivoting towards higher leverage causes, with global catastrophic risk among the top three under consideration.
I’d rather have both, hence diverting some marginal resources to CFAR until it was launched, then switching back to MIRI. Is there a third thing that MIRI should divert marginal resources to right now?
I have just spent a month in England interacting extensively with the EA movement here (maybe your impressions from the California EA summit differ, I’d be curious to hear). Donors interested in the far future are also considering donations to the following (all of these are from talks with actual people making concrete short-term choices; in addition to donations, people are also considering career choices post-college):
80,000 Hours, CEA and other movement building and capacity-increasing organizations (including CFAR), which also increase non-charity options (e.g. 80k helping people going into scientific funding agencies and political careers where they will be in a position to affect research and policy reactions to technologies relevant to x-risk and other trajectory changes)
AMF/GiveWell charities to keep GiveWell and the EA movement growing while actors like GiveWell, Paul Christiano, Nick Beckstead and others at FHI, investigate the intervention options and cause prioritization, followed by organization-by-organization analysis of the GiveWell variety, laying the groundwork for massive support for the top far future charities and organizations identified by said processes
Finding ways to fund such evaluation with RFMF, e.g. by paying for FHI or CEA hires to work on them
The FHI’s other work
A donor-advised fund investing the returns until such evaluations or more promising opportunities present themselves or are elicited by the fund (possibilities like Drexler’s nanotech panel, extensions of the DAGGRE methods, a Bayesian aggregation algorithm that greatly improves extraction of scientific expert opinion or science courts that could mobilize much more talent and resources to neglected problems with good cases, some key steps in biotech enhancement)
That’s why Peter Hurford posted the OP, because he’s an EA considering all these options, and wants to compare them to MIRI.
That is a sort of discussion my brain puts in a completely different category. Peter and Carl, please always give me a concrete alternative policy option that (allegedly) depends on a debate, if such is available; my brain is then far less likely to label the conversation “annoying useless meta objections that I want to just get over with as fast as possible”.
AMF/GiveWell charities to keep GiveWell and the EA movement growing while actors like GiveWell, Paul Christiano, Nick Beckstead and others at FHI, investigate the intervention options and cause prioritization, followed by organization-by-organization analysis of the GiveWell variety, laying the groundwork for massive support for the top far future charities and organizations identified by said processes
Cool, if MIRI keeps going, they might be able to show FAI as top focus with adequate evidence by the time all of this comes together.
Well, in collaboration with FHI. As soon as Bostrom’s Superintelligence is released, we’ll probably be building on and around that to make whatever cases we think are reasonable to make.
they could be checking relative plausibility of causing an OK outcome without trying to put absolute numbers on a probability estimate, and this is reasonable due to the following circumstances
Build up general altruistic capacities through things like the effective altruist movement or GiveWell’s investigation of catastrophic risks
I read every blog post they put out.
Invest money in an investment fund for the future which can invest more [...] when there are better opportunities
I figure I can use my retirement savings for this.
(recalling that most of the value of MIRI in your model comes from major institutions being collectively foolish or ignorant regarding AI going forward)
I thought it came from them being collectively foolish or ignorant regarding Friendliness rather than AGI.
Prediction markets, meta-research, and other institutional changes
Meh. Sounds like Lean Six Sigma or some other buzzword business process improvement plan.
Work like Bostrom’s
Luckily, Bostrom is already doing work like Bostrom’s.
Pursue cognitive enhancement technologies or education methods
Too indirect for my taste.
Find the most effective options for synthetic biology threats
Not very scary compared to AI. Lots of known methods to combat green goo.
If you don’t believe me about that aspect of heroic epistemology, feel free not to believe me about not multiplying small probabilities either.
Multiplying small probabilities seems fine to me, whereas I really don’t get “heroic epistemology”.
You seem to be suggesting that “heroic epistemology” and “multiplying small probabilities” both lead to the same conclusion: support MIRI’s work on FAI. But this is the case only if working on FAI has no negative consequences. In that case, “small chance of success” plus “multiplying small probabilities” warrants working on FAI, just as “medium probability of success” and “not multiplying small probabilities” does. But since working on FAI does have negative consequences, namely shortening AI timelines and (in the later stages) possibly directly causing the creation of a UFAI, just allowing multiplication by small probabilities is not sufficient to warrant working on FAI if the probability of success is low.
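The arithmetic behind this objection can be made explicit with a toy expected-value sketch. All of the numbers below are hypothetical, chosen purely for illustration and not drawn from anyone's actual estimates:

```python
# Toy expected-value comparison: multiplying small probabilities is fine
# in itself, but a fixed negative side effect (e.g. shortened timelines)
# can dominate once the success probability gets low enough.
# All numbers are hypothetical illustrations, not real estimates.

def expected_value(p_success, value_success, p_harm, value_harm):
    """EV of an intervention with a success branch and a harm branch."""
    return p_success * value_success + p_harm * value_harm

V = 1e9   # stipulated (astronomical) value of success, arbitrary units
H = -1e7  # stipulated disvalue of the negative side effect

# "Medium probability of success": the harm term is negligible.
ev_medium = expected_value(0.10, V, 0.01, H)   # ~1e8, clearly positive

# "Low probability of success": the same harm term now cancels the gain.
ev_low = expected_value(1e-4, V, 0.01, H)      # ~0, no longer clearly worth it

print(ev_medium, ev_low)
```

The point is not the particular figures but the structure: the harm term is roughly constant while the benefit term scales with the success probability, so the sign of the comparison can flip as that probability falls.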
I am really worried that you are justifying your current course of action through a novel epistemology of your own invention, which has not been widely vetted (or even widely understood). Most new ideas are wrong, and I think you ought to treat your own new ideas with deeper suspicion.
I’m a reactionary, not an innovator, dammit! Reacting against this newfangled antiheroic ‘reference class’ claim that says we ought to let the world burn because we don’t have enough of a hero license!
Ahem.
I’m also really unconvinced by the claim that this work could reasonably be expected to have net negative consequences. I’m worried about the dynamics and evidence of GiveDirectly, but I don’t think GD has negative consequences; that would be a huge stretch. It’s possible, maybe, but it’s certainly not the arithmetic expectation. With that said, I worry that this ‘maybe negative’ stuff is impeding EA motivation generally. There is much that is ineffectual to be wary of, and missed opportunity costs, but trying to warn people against reverse or negative effects seems pretty perverse for anything that has made it onto GiveWell’s Top 3, or CFAR, or FHI, or MIRI. Info that shortens AI timelines should mostly just not be released publicly, and I don’t see any particularly plausible way for a planet to survive without having some equivalent of MIRI doing MIRI’s job, and the math thereof should be started as early as feasible.
I’m a reactionary, not an innovator, dammit! Reacting against this newfangled antiheroic ‘reference class’ claim that says we ought to let the world burn because we don’t have enough of a hero license!
“Reference class” to me is just an intuitive way of thinking about updating on certain types of evidence. It seems like you’re saying that in some cases we ought to use the inside view, or weigh object-level evidence more heavily, but 1) I don’t understand why you are not worried about “inside view” reasoning typically producing overconfidence or why you don’t think it’s likely to produce overconfidence in this case, and 2) according to my inside view, the probability of a team like the kind you’re envisioning solving FAI is low, and a typical MIRI donor or potential donor can’t be said to have much of an inside view on this matter, and has to use “reference class” reasoning. So what is your argument here?
I’m also really unconvinced by the claim that this work could reasonably be expected to have net negative consequences.
Every AGI researcher is unconvinced by that, about their own work.
but trying to warn people against reverse or negative effects seems pretty perverse for anything that has made it onto GiveWell’s Top 3, or CFAR, or FHI, or MIRI
CFAR and MIRI were created by you, to help you build FAI. If FHI has endorsed your plan for building FAI (as opposed to endorsing MIRI as an organization that’s a force for good overall, which I’d agree with and I’ve actually provided various forms of support to MIRI because of that), I’m not aware of it. I also think I’ve thought enough about this topic to give some weight to my own judgments, so even if FHI does endorse your plan, I’d want to see their reasoning (which I definitely have not seen) and not just take their word. I note that GiveWell does publish its analyses and is not asking people to just trust it.
Info that shortens AI timelines should mostly just not be released publicly
My model of FAI development says that you have to get most of the way to being able to build an AGI just to be able to start working on many Friendliness-specific problems, and solving those problems would take a long time relative to finishing the rest of the AGI capability work. Unless you’re flying completely below the radar, which is incompatible with your plan for funding via public donations, what is stopping your unpublished results from being stolen or leaked in the mean time? And just gathering 10 to 50 world-class talents to work on FAI is likely to spur competition and speed up AGI progress. The fact that you seem to be overconfident about your chance of success also suggests that you are likely to be overconfident in other areas, and indicates a high risk of accidental UFAI creation (relative to the probability of success, not necessarily high in absolute terms).
My model of FAI development says that you have to get most of the way to being able to build an AGI just to be able to start working on many Friendliness-specific problems, and solving those problems would take a long time relative to finishing the rest of the AGI capability work.
Agree, though luckily there are other Friendliness-specific problems that we can start solving right now.
Unless you’re flying completely below the radar, which is incompatible with your plan for funding via public donations, what is stopping your unpublished results from being stolen or leaked in the mean time?
Presumably, security technology similar to what has mostly worked for the Manhattan Project, secret NSA projects, etc. But yeah, it’s a big worry. But what did you have in mind about flying completely under the radar? There are versions of an FAI team that could be funded pretty discreetly by just one person.
Agree, though luckily there are other Friendliness-specific problems that we can start solving right now.
I listed some in another comment, but they are not the current focus of MIRI research. Instead, MIRI is focusing on FAI-relevant problems that do shorten AI timelines (i.e., working on “get most of the way to being able to build an AGI”), such as decision theory and logical uncertainty.
Presumably, security technology similar to what has mostly worked for the Manhattan Project, secret NSA projects, etc.
As I noted in previous comments, the economics of information security seems to greatly favor the offense, so you have to spend much more resources than your attackers in order to maintain secrets.
But what did you have in mind about flying completely under the radar? There are versions of an FAI team that could be funded pretty discreetly by just one person.
That’s probably the best bet as far as avoiding having your results stolen, but it introduces other problems, such as how to attract talent, and whether you can fund a large enough team that way. (Small teams might increase the chances of accidental UFAI creation, since there would be fewer people to look out for errors.) And given that Eliezer is probably already on the radar of most AGI researchers, you’d have to find a replacement for him on this “under the radar” team.
I should ask this question now rather than later: Is there a concrete policy alternative being considered by you?
Every AGI researcher is unconvinced by that, about their own work.
And on one obvious ‘outside view’, they’d be right—it’s a very strange and unusual situation, which took me years to acknowledge, that this one particular class of science research could have perverse results. There’s many attempted good deeds which have no effect, but complete backfires make the news because they’re rare.
(Hey, maybe the priors in favor of good outcomes from the broad reference class of scientific research are so high that we should just ignore the inside view which says that AGI research will have a different result!)
And even AGI research doesn’t end up making it less likely that AGI will be developed, please note—it’s not that perverse in its outcome.
Is there a concrete policy alternative being considered by you?
I’m currently in favor of the following:
research on strategies for navigating intelligence explosion (what I called “Singularity Strategies”)
pushing for human intelligence enhancement
pushing for a government to try to take an insurmountable tech lead via large scale intelligence enhancement
research into a subset of FAI-related problems that do not shorten AI timelines (at least as far as we can tell), such as consciousness, normative ethics, metaethics, metaphilosophy
advocacy/PR/academic outreach on the dangers of AGI progress
There’s many attempted good deeds which have no effect, but complete backfires make the news because they’re rare.
What about continuing physics research possibly leading to a physics disaster or new superweapons, biotech research leading to biotech disasters, nanotech research leading to nanotech disasters, WBE research leading to value drift and Malthusian outcomes, computing hardware research leading to deliberate or accidental creation of massive simulated suffering (aside from UFAI)? In addition, I thought you believed that faster economic growth made a good outcome less likely, which would imply that most scientific research is bad?
And even AGI research doesn’t end up making it less likely that AGI will be developed, please note—it’s not that perverse in its outcome.
Many AGI researchers seem to think that their research will result in a benevolent AGI, and I’m assuming you agree that their research does make it less likely that such an AGI will be eventually developed.
It seems odd to insist that someone explicitly working on benevolence should consider themselves to be in the same reference class as someone who thinks they just need to take care of the AGI and the benevolence will pretty much take care of itself.
I wasn’t intending to use “AGI researchers” as a reference class to show that Eliezer’s work is likely to have net negative consequences, but to show that people whose work can reasonably be expected to have net negative consequences (of whom AGI researchers is a prominent class) still tend not to believe such claims, and therefore Eliezer’s failure to be convinced is not of much evidential value to others.
The reference class I usually do have in mind when I think of Eliezer is philosophers who think they have the right answer to some philosophical problem (virtually all of whom end up being wrong or at least incomplete even if they are headed in the right direction).
ETA: I’ve written a post that expands on this comment.
“Cognitively distorted” people should lose. People who get stuff done should have their alternate thinking processes carefully examined to see how the divergence is more rational than the non-divergence.
It’s not clear to me why people who accurately model the world should outperform those who follow less cognitively demanding heuristics. I’ve seen this position stated as a truism during a debate, but have never read an argument for or against it. Would someone be able to link to an argument about following non-robust shortcuts to rationality, or write a short case against that practice?
It’s not clear to me why people who accurately model the world should outperform those who follow less cognitively demanding heuristics.
If the aforementioned analysis of thinking processes finds that the advantage comes from superior allocation of bounded computational resources then that would be an interesting finding and a sufficient explanation. In some cases the alternate heuristics may be worth adopting.
Is the common-sense expectation that non-robust heuristics deliver poor results in a wider subset of possible future environments than robust heuristics not adequate?
“Yeah, the middle point of my probability interval for a happy ending is very low, but the interval is large enough that its upper bound isn’t that low, so it’s worth my time and your money trying to reach a happy ending.”
Am I right?
feel free not to believe me about not multiplying small probabilities either.
I’m saying I don’t know how to estimate heroic probabilities. I do not know any evenhanded rules which assign ‘you can’t do that’ probability to humanity’s survival which would not, in the hands of the same people thinking the same way, rule out Google or Apple, and maybe those happened to other people, but the same rules would also say that I couldn’t do the 3-5 other lesser “impossibilities” I’ve done so far. Sure, those were much easier “impossibilities” but the point is that the sort of people who think you can’t build Friendly AI because I don’t have a good-enough hero license to something so high-status or because heroic epistemology allegedly doesn’t work in real life, would also claim all those other things couldn’t happen in real life, if asked without benefit of advance knowledge to predict the fate of Steve Wozniak or me personally; that’s what happens when you play the role of “realism”.
Overconfidence (including factual error about success rates) is pervasive in entrepreneurs, both the failures and successes (and the successes often fail on a second try, although they have somewhat better odds). The motivating power of overconfidence doesn’t mean the overconfidence is factually correct or that anyone else should believe it. And the mega-successes tended to look good in expected value, value of information, and the availability of good intermediate outcomes short of mega-success: there were non-crazy overconfident reasons to pursue them. The retreat to “heroic epistemology” rather than reasons is a bad sign relative to those successes, and in any case most of those invoking heroic epistemology style reasoning don’t achieve heroic feats.
Applying the outside view startup statistics, including data on entrepreneur characteristics like experience and success rates of repeat entrepreneurs is not magic or prohibitively difficult. Add in the judgments of top VCs to your model.
For individuals or area/firm experts, one can add in hard-to-honestly-signal data (watching out for overconfidence in various ways, using coworkers, etc). That model would have assigned a medium chance to pretty nice success for Apple, and maybe 1-in-100 to 1-in-1000 odds of enormous success, with reasonable expected value for young people willing to take risks and enthused about the field. And the huge Apple success came much later, after Jobs had left and returned and Wozniak was long gone.
Google started with smart people with characteristics predictive of startup success, who went in heavily only after they had an algorithm with high commercial value (which looked impressive to VCs and others). Their success could have been much smaller if their competitors had been more nimble.
And of course you picked them out after the fact, just like you pick out instances of scientists making false predictions rather than true ones in the history of technology. You need to reconcile your story with the experience of the top VCs and serial entrepreneurs, and the degree of selection we see in the characteristics of people at different levels of success (which indicate a major multiplicative role for luck, causing a big chunk of the variation on a log scale).
but the same rules would also say that I couldn’t do the 3-5 other lesser “impossibilities” I’ve done so far
You have some good feats, and failures too (which give us some outside view info to limit the probabilities for outsiders’ evaluation of your heroic epistemology). But the overall mix is not an outlier of success relative to, e.g. the reference class of other top talent search students with a skew towards verbal ability, unless you treat your choice of problem as such.
The motivating power of overconfidence doesn’t mean the overconfidence is factually correct or that anyone else should believe it.
Did I say that? No, I did not say that. You should know better than to think I would ever say that. Knowingly make an epistemic error? Say “X is false but I believe it is true”? Since we’re talking heroism anyway, Just who the hell do you think I am?
The retreat to “heroic epistemology”
Okay, so suppose we jump back 4 years and I’m saying that maybe I ought to write a Harry Potter fanfiction. And it’s going to be the most popular HP fanfiction on the whole Internet. And Mathematical Olympiad winners will read it and work for us. What does your nonheroic epistemology say? Because I simply don’t believe that (your) nonheroic epistemology gets it right. I don’t think it can discriminate between the possible impossible and the impossible impossible. It just throws up a uniform fog of “The outside view says it is nonvirtuous to try to distinguish within this reference class.”
I thought Quixey was doomed because the idea wasn’t good enough. Michael Vassar said that Quixey would succeed because Tomer Kagan would succeed at anything he tried to do. Michael Vassar was right (a judgment already finalized, since Quixey has gotten further than I thought was possible). This made me update on Michael Vassar’s ability to discriminate Tomer Kagans in advance from within a rather large reference class of people trying to be Tomer Kagan.
Knowingly make an epistemic error? Say “X is false but I believe it is true”?
That’s what the arguments you’ve given for this have mostly amounted to. You have said “I need to believe this to be motivated and do productive work” in response to questions about the probabilities in the past, while not giving solid reasons for the confidence.
Okay, so suppose we jump back 4 years and I’m saying that maybe I ought to write a Harry Potter fanfiction. And it’s going to be the most popular HP fanfiction on the whole Internet.
When did you predict that? Early on I did not hear you making such claims, with the tune changing after it became clear that demand for it was good.
4 years ago I did advocate getting Math Olympiad people, and said they could be gotten, and had empirical evidence of that from multiple angles. And I did recognize your writing and fiction were well-received, and had evidence from the reactions to “Staring into the Singularity” and OB/LW. You tried several methods, including the rationality book, distributing written rationality exercises/micromanaging CFAR content, and the fanfiction. Most of them wasted time and resources without producing results, and one succeeded.
And there is a larger context, that in addition to the successes you are highlighting the path includes: Flare, LOGI and associated research, pre-Vassar SI management issues, open-source singularity, commercial software, trying to create non-FAI before nanowars in 2015.
This made me update on Michael Vassar’s ability to discriminate Tomer Kagans in advance from within a rather large reference class of people trying to be Tomer Kagan.
Tomer is indeed pretty great, but I have heard Michael say things like that about a number of people and projects over the years. Most did not become like Quixey. And what’s the analogy here? That people with a good ability to predict success in scientific research have indicated you will succeed, taking into account how the world and the space of computer science and AI progress must be for that? That Michael has?
As a LessWrong reader, I notice that I am confused because this does not sound like something you would say, but I’m not sure I could explain the difference between this and “heroic epistemology.”
EDIT: for the benefit of other readers following the conversation, Eliezer gives a description of heroic epistemology here.
For the record, I don’t recall ever hearing you say something like this in my presence or online, and if somebody had told me in person that you had said this, I think I would’ve raised a skeptical eyebrow and said “Really? That doesn’t sound like something Eliezer would say. Have you read The Sequences?”
But also, I remain confused about the normative content of “heroic epistemology.”
Ask Anna about it, she was present on both occasions, at the tail end of the Singularity Summit workshop discussion in New York City, and at the roundtable meeting at the office with Anna, Paul, Luke, and Louie.
In a related vein, arguments like this are arguments that someone could do A, but not so much that you will do A (and B and C and...). My impression is of too many arguments like the former and not enough of the latter. If you can remedy that, it would be great, but it is a fact about the responses I have seen.
Eliezer emailed me to ask me about it (per Carl’s request, above); I emailed him back with the email below, which Eliezer requested I paste into the LW thread. Pasting:
In the majority of cases, people do not straightforwardly say “X is false, but I need to believe X anyhow”. More often they wiggle, and polite conversation drops the subject.
You have made statements that I and at least some others interpreted as perhaps indicating such wiggles (i.e., as perhaps indicating a desire to hold onto false impressions by diverting conscious attention or discussion from a particular subject). You have never, to my knowledge, uttered a straightforward endorsement of holding false beliefs. The wiggle-suggesting statements were not super-clear, and were not beyond the realm of possible misinterpretation.
Re: statements that seemed to me and some others to indicate possible wiggles: You have mentioned multiple times that, well, I forget, but something like, it’d be hard to do top focused research on FAI-like problems while estimating a 1% chance of success. You’ve also avoided giving probability estimates in a lot of contexts, and have sometimes discouraged conversations in which others did so. You seemed perhaps a bit upset or defensive at multiple points during the probability estimates conversation with Zvi in NYC (enough so that a couple people commented on it with surprise to me afterward (not Carl or Paul; I forget who; newcomers to our conversations)), but, to your credit, you commented on these difficulties and proceeded to use debiasing techniques (e.g., I think you might’ve mentioned leaving a line of retreat, and might’ve given yourself a minute to do so).
If you would like polite conversation not to drop the subject on future occasions on which your impressions look (to me and other non-telepathic observers) like they might possibly be wiggly, give me a straightforward request and I can change my settings here. I have in fact been a bit afraid of disturbing your motivation, and also a bit afraid of reducing your good will toward me.
Michael Vassar might be another interesting one to probe, if you’re collecting opinions. Though note that Vassar, Carl, and I have all discussed this at least a bit, and so are not independent datapoints.
From my internal perspective, the truth-as-I-experience-it is that I’m annoyed when people raise the topic because it’s all wasted motion, the question sets up a trap that forces you into appearing arrogant, and I honestly think that “Screw all this, I’m just going to go ahead and do it and you can debate afterward what the probabilities were” is a perfectly reasonable response.
From the perspective of folks choosing between supporting multiple lines of AI risk reduction effort, of which MIRI is only one, such probability estimates are not wasted effort.
Though your point about appearing arrogant is well taken. It’s unfortunate that it isn’t socially okay to publicly estimate a high probability of success, or to publicly claim one’s own exceptionalism, when one’s impressions point that way. It places a barrier toward honest conversation here.
From my internal perspective, the truth-as-I-experience-it is that I’m annoyed when people raise the topic [of MIRI’s success-odds] because [good reason].
I suspect this annoyance is easily misinterpreted, independent of its actual cause. Most humans respond with annoyance when their plans are criticized. Also, in situations where A has power over B, and where B then shares concerns or criticisms about A’s plans, and where A responds with annoyance or with avoidance of such conversation… B is apt to respond (as I did) by being a bit hesitant to bring the topic up, and by also wondering if A is being defensive.
I’m not saying I was correct here. I’m also not sure what the fix is. But it might be worth setting a 1-minute timer and brainstorming or something.
If you were anyone else, this is ordinarily the point where I tell you that I’m just going to ignore all this and go ahead and do it, and then afterward you can explain why it was either predictable in retrospect or a fluke, according to your taste. Since it’s you: What’s the first next goal you think I can’t achieve, strongly enough that if I do it, you give up on non-heroic epistemology?
If you were anyone else, this is ordinarily the point where I tell you that I’m just going to ignore all this and go ahead and do it
I’m familiar with this move. But you make it before failing too, so its evidentiary weight is limited, and insufficient for undertakings with low enough prior probability from all the other evidence besides the move.
What’s the first next goal you think I can’t achieve, strongly enough that if I do it, you give up on non-heroic epistemology?
I don’t buy the framing. The update would be mainly about you and the problem in question, not the applicability of statistics to reality.
Two developments in AI as big as Pearl’s causal networks (as judged by Norvig types) by a small MIRI team would be a limited subset of the problems to be solved by a project trying to build AGI with a different and very safe architecture before the rest of the world, and wouldn’t address the question of the probability that such is needed in the counterfactual, but it would cause me to stop complaining and would powerfully support the model that MIRI can be more productive than the rest of the AI field when currently-available objective indicators put it as a small portion of the quality-adjusted capacity.
If we want a predictor for success that’s a lot better than the vast majority of quite successful entrepreneurs and pathbreaking researchers, making numerous major basic science discoveries and putting them together in a way that saves the world, then we need some evidence to distinguish the team and explain why it will make greater scientific contributions than any other ever with high reliability in a limited time.
A lot of intermediate outcomes would multiply my credence in and thus valuation of the “dozen people race ahead of the rest of the world in AI” scenario, but just being as productive as von Neumann or Turing or Pearl or Einstein would not result in high probability of FAI success, so the evidence has to be substantial.
I’m familiar with this move. But you make it before failing too
Sure, you try, sometimes you lose, sometimes you win. On anti-heroic epistemology (non-virtuous to attempt to discriminate within an outside view) there shouldn’t be any impossible successes by anyone you know personally after you met them. They should only happen to other people selected post-facto by the media, or to people who you met because of their previous success.
I don’t buy the framing. The update would be mainly about you and the problem in question, not the applicability of statistics to reality.
We disagree about how to use statistics in order to get really actually correct answers. Having such a low estimate of my rationality that you think that I know what correct statistics are, and am refusing to use them, is not good news from an Aumann perspective and fails the ideological Turing Test. In any case, surely if my predictions are correct you should update your belief about good frameworks (see the reasoning used in the Pascal’s Muggle post) - to do otherwise and go on insisting that your framework was nonetheless correct would be oblivious.
Two developments in AI as big as Pearl’s causal networks (as judged by Norvig types)
...should not have been disclosed to the general world, since proof well short of this should suffice for sufficient funding (Bayes nets were huge), though they might be disclosed to some particular Norvig type on a trusted oversight committee if there were some kind of reason for the risk. Major breakthroughs on the F side of FAI are not likely to be regarded as being as exciting as AGI-useful work like Bayes nets, though they may be equally mathematically impressive or mathematically difficult. Is there some kind of validation which you think MIRI should not be able to achieve on non-heroic premises, such that the results should be disclosed to the general world?
EDIT: Reading through the rest of the comment more carefully, I’m not sure we estimate the same order of magnitude of work for what it takes to build FAI under mildly good background settings of hidden variables. The reason why I don’t think the mainstream can build FAI isn’t that FAI is intrinsically huge a la the Cyc hypothesis. The mainstream is pretty good at building huge straightforward things. I just expect them to run afoul of one of the many instakill gotchas because they’re one or two orders of magnitude underneath the finite level of caring required.
EDIT 2: Also, is there a level short of 2 gigantic breakthroughs which causes you to question non-heroic epistemology? The condition is sufficient, but is it necessary? Do you start to doubt the framework after one giant breakthrough (leaving aside the translation question for now)? If not, what probability would you assign to that, on your framework? Standard Bayesian Judo applies—if you would, as I see it, play the role of the skeptic, then you must either be overly-credulous-for-the-role that we can do heroic things like one giant breakthrough, or else give up your skepticism at an earlier signal than the second. For you cannot say that something is strongly prohibited on your model and yet also refuse to update much if it happens, and this applies to every event which might lie along the way. (Evenhanded application: ’Tis why I updated on Quixey instead of saying “Ah, but blah”; Quixey getting this far just wasn’t supposed to happen on my previous background theory, and shouldn’t have happened even if Vassar had praised ten people to me instead of two.)
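The “Bayesian Judo” point is conservation of expected evidence in odds form: whatever probability your model assigned to an event, observing that event multiplies the model’s odds by exactly that likelihood ratio, so a model that calls something “strongly prohibited” cannot shrug it off. A toy sketch (the 1%/50% figures are illustrative assumptions, not anyone’s stated estimates):

```python
# Conservation of expected evidence: a model that calls an event "strongly
# prohibited" must take a large hit in odds if that event occurs anyway.
def posterior_odds(prior_odds, p_event_given_model, p_event_given_alt):
    """Multiply prior odds (model : alternative) by the likelihood ratio."""
    return prior_odds * (p_event_given_model / p_event_given_alt)

# A skeptic starts at 9:1 odds favoring the non-heroic model.  Suppose that
# model gives a "giant breakthrough" 1% probability while the heroic model
# gives it 50% (hypothetical numbers).
after_one = posterior_odds(9.0, 0.01, 0.50)
after_two = posterior_odds(after_one, 0.01, 0.50)

print(round(after_one, 4))  # odds collapse after one prohibited event
print(round(after_two, 4))  # and are negligible after two
```

On these made-up numbers, a single “prohibited” breakthrough already flips the 9:1 odds against the non-heroic model, which is the sense in which the skeptic must either doubt at the first signal or concede the event wasn’t strongly prohibited.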
On anti-heroic epistemology (non-virtuous to attempt to discriminate within an outside view) there shouldn’t be any impossible successes by anyone you know personally after you met them.
I don’t understand why you say this. Given Carl’s IQ and social circle (didn’t he used to work for a hedge fund run by Peter Thiel?) why would it be very surprising that someone he personally knows achieves your current level of success after he meets them?
They should only happen to other people selected post-facto by the media, or to people who you met because of their previous success.
Carl referenced “Staring Into the Singularity” as an early indicator of your extraordinary verbal abilities (which explains much if not all of your subsequent successes). It suggests that’s how you initially attracted his attention. The same is certainly true for me. I distinctly recall saying to myself “I should definitely keep track of this guy” when I read that, back in the extropian days. Is that enough for you to count as “people who you met because of their previous success”?
In any case, almost everyone who meets you now would count you as such. What arguments can you give to them that “heroic epistemology” is normative (and hence they are justified in donating to MIRI)?
To state my overall position on the topic being discussed, I think according to “non-heroic epistemology”, after someone achieves an “impossible success”, you update towards them being able to achieve further successes of roughly the same difficulty and in related fields that use similar skills, but the posterior probabilities of them solving much more difficult problems or in fields that use very different skills remain low (higher relative to the prior, but still low in an absolute sense). Given my understanding of the distribution of cognitive abilities in humans, I don’t see why I would ever “give up” this epistemology, unless you achieved a level of success that made me suspect that you’re an alien avatar or something.
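This updating scheme can be sketched numerically: put a prior over a latent ability level, update on one observed success at a given difficulty, and compare predictive probabilities at similar versus much greater difficulty. Every distribution and parameter below is an illustrative assumption, not anyone’s actual estimate:

```python
import math

# Grid over a latent "ability" level with a roughly Gaussian prior.
abilities = [a / 2 for a in range(0, 21)]  # 0.0 .. 10.0
weights = [math.exp(-((a - 3.0) ** 2) / 2) for a in abilities]
z = sum(weights)
prior = [w / z for w in weights]

def p_success(ability, difficulty):
    # Logistic success curve: falls off as difficulty exceeds ability.
    return 1 / (1 + math.exp(difficulty - ability))

def update(dist, difficulty):
    # Bayes update after observing one success at the given difficulty.
    post = [p * p_success(a, difficulty) for p, a in zip(dist, abilities)]
    norm = sum(post)
    return [p / norm for p in post]

def predictive(dist, difficulty):
    # Success probability at a difficulty, averaged over the ability dist.
    return sum(p * p_success(a, difficulty) for p, a in zip(dist, abilities))

post = update(prior, 5.0)  # one observed "impossible" success at difficulty 5

print(predictive(prior, 5.0), predictive(post, 5.0))  # similar difficulty
print(predictive(prior, 9.0), predictive(post, 9.0))  # much harder problem
```

Under these toy assumptions the posterior predictive rises sharply for problems of comparable difficulty while staying low in absolute terms for much harder ones, matching the “higher relative to the prior, but still low” description.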
In any case, almost everyone who meets you now would count you as such. What arguments can you give to them that “heroic epistemology” is normative (and hence they are justified in donating to MIRI)?
Yes, no matter how many impossible things you do, the next person you meet thinks that they only heard of you because of them, ergo selection bias. This is an interesting question purely on a philosophical level—it seems to me to have some of the flavor of quantum suicide experiments where you can’t communicate your evidence. In principle this shouldn’t happen without quantum suicide for logically omniscient entities who already know the exact fraction of people with various characteristics, i.e., agree on exact priors, but I think it might start happening again to people who are logically unsure about which framework they should use.
To avoid talking past one another: I agree that one can and should update on evidence beyond the most solid empirical reference classes in predicting success. If you mean to say that a majority of the variation on a log scale in success (e.g., in wealth or scientific productivity) can be accounted for by properties of individuals and their circumstances, beyond dumb luck, then we can agree on that. Some of those characteristics are more easily observed, while others are harder to discern or almost unmeasurable from a distance, so that track records may be our best way to discern them.
shouldn’t be any impossible successes by anyone you know personally after you met them.
That is to say, repeated successes should not be explained by luck, but by updating estimates of hard-to-observe characteristics and world model.
The distribution of successes and failures you have demonstrated is not “impossible” or driving a massive likelihood ratio given knowledge about your cognitive and verbal ability, behavioral evidence of initiative and personality, developed writing skill (discernible through inspection and data about its reception), and philosophical inclinations. Using measurable features, and some earlier behavioral or track record data one can generate reference classes with quite high levels of lifetime success, e.g. by slicing and dicing cohorts like this one. Updating on further successes delivers further improvements.
But updating on hidden characteristics does not suffer exponential penalties like chance explanations, and there is a lot of distance to cover in hidden characteristics before a 10% probability of MIRI-derived FAI (or some other causal channel) averting existential catastrophe that would have occurred absent MIRI looks reasonable.
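The asymmetry here can be made concrete: a pure-luck explanation of a streak pays a multiplicative penalty for every success, while a hidden-skill hypothesis pays its low prior only once. A toy comparison, with made-up parameters:

```python
# Luck vs. a rare hidden-skill type as explanations of k straight successes.
p_luck = 0.05          # per-attempt success probability for an ordinary try
p_skilled = 0.80       # per-attempt success probability for the rare type
prior_skilled = 1e-4   # prior probability of being the rare high-skill type

def p_skilled_given_streak(k):
    """Posterior that the person is the rare type after k straight successes."""
    lucky = (1 - prior_skilled) * p_luck ** k      # exponential penalty
    skilled = prior_skilled * p_skilled ** k       # prior paid once
    return skilled / (skilled + lucky)

for k in range(6):
    print(k, p_skilled_given_streak(k))
```

On these numbers the hidden-skill hypothesis dominates after only a few successes. But even posterior certainty about the hidden type only raises per-attempt odds to the type’s success rate on problems of comparable difficulty; it does not by itself license high confidence on a qualitatively harder problem, which is the remaining gap described above.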
Now large repeated updates about hidden characteristics still indicate serious model problems and should lead us to be very skeptical of those models. However, I don’t see such very large surprising updates thus far.
Sure, you try, sometimes you lose, sometimes you win.
If difficulty is homogenous (at least as far as one can discern in advance), then we can use these data straightforwardly, but a lot of probability will be peeled off relative to “Tomer must win.” And generalizing to much higher difficulty is still dubious for the reasons discussed above.
Having such a low estimate of my rationality that you think that I know what correct statistics are, and am refusing to use them, is not good news from an Aumann perspective and fails the ideological Turing Test.
This is not what I meant; I didn’t claim you would formally endorse a contradiction. But nonetheless, the impression I got was of questions about probability met with troubling responses like talking about the state of mind you need for work and wanting to not think in terms of probabilities of success for your own work. That seems a bad signal because of the absence of good responses, and because it suggests the estimates may not be the result of very much thought, or may be unduly affected by their emotional valence, all without ever saying “p and not p.”
Is there some kind of validation which you think MIRI should not be able to achieve on non-heroic premises, such that the results should be disclosed to the general world?
...should not have been disclosed to the general world, since proof well short of this should suffice for sufficient funding (Bayes nets were huge)
As I said elsewhere a 10% probability of counterfactually saving the world is far above the threshold for action. One won’t get to high confidence in that low prior claim without extraordinary evidence, but the value of pursuing it increases continuously with intermediate levels of evidence. Some examples would be successful AI researchers coming to workshops and regularly saying that the quality and productivity of the research group and process was orders of magnitude more productive, the results very solid, etc. This is one of the reasons I like the workshop path, because it exposes the thesis to empirical feedback.
The reason why I don’t think the mainstream can build FAI isn’t that FAI is intrinsically huge a la the Cyc hypothesis. The mainstream is pretty good at building huge straightforward things.
Although as we have discussed with AI folk, there are also smart AI people who would like to find nice clean powerful algorithms with huge practical utility without significant additional work.
I just expect them to run afoul of one of the many instakill gotchas because they’re one or two orders of magnitude underneath the finite level of caring required.
Yes, we do still have disagreements about many of the factual questions that feed into a probability estimate, and if I adopted your view on all of those except MIRI productivity there would be much less of a gap. There are many distinct issues going into the estimation of a probability of your success, from AGI difficulty, to FAI difficulty, to the competence of regular AI people and governance institutions, the productivity of a small MIRI team, the productivity of the rest of the world, signs of AI being close, reactions to those signs, and others.
There are a number of connections between these variables, but even accounting for that your opinions are systematically firmly in the direction of greater personal impact relative to the analyses of others, and the clustering seems tighter than is typical (others seem to vary more, sometimes evaluating different subissues as pointing in different directions). This shows up in attempts to work through the issues for estimation, as at that meeting with Paul et al.
One can apply a bias theory to myself, Paul Christiano, Nick Bostrom, and the FHI surveys of AI experts, saying we are biased towards normalcy, respectability, and conservatism. But I would still question the coincidence of so many substantially independent variables landing in the same direction, and uncertainty over the pieces disproportionately hurts the hypothesis that MIRI has, say, a 10% probability of averting a counterfactual existential catastrophe.
And it is possible that you have become a superb predictor of such variables in the last 10 years (setting aside earlier poor predictions), and I could and would update on good technological and geopolitical prediction in DAGGRE or the like.
Thanks for talking this out, and let me reiterate that in my expectation your and MIRI’s existence (relative to the counterfactual in which it never existed and you become a science fiction writer) has been a good thing and reduced my expectation for existential risk.
The distribution of successes and failures you have demonstrated is not “impossible” or driving a massive likelihood ratio given knowledge about your cognitive and verbal ability, behavioral evidence of initiative and personality, developed writing skill (discernible through inspection and data about its reception), and philosophical inclinations
Of course I expect you to say that, since to say otherwise given your previous statements is equivalent to being openly incoherent and I do not regard you so lowly. But I don’t yet believe that you would actually have accepted or predicted those successes ante facto, vs. claiming ante facto that those successes were unlikely and that trying was overconfident. Which is why I repeat my question: What is the least impossible thing I could do next, where anything up to that is permitted by your model so it’s equivalent to affirming that you think I might be able to do it, and anything beyond that was prohibited by your model so it’s time to notice your confusion? I mean, if you think I can make one major AI breakthrough but not two, that’s already a lot of confidence in me… is that really what your outside view would say about me?
But nonetheless, you have returned questions about probability with troubling responses like talking about the state of mind you need for work and wanting to not think in terms of probabilities of success for your own work.
Please distinguish between the disputed reality and your personal memory, unless you’re defining the above so broadly (and uncharitably!) that my ‘wasted motion’ FB post counts as an instance.
Although as we have discussed with AI folk, there are also smart AI people who would like to find nice clean powerful algorithms with huge practical utility without significant additional work.
Without significant work? I don’t think I can do that. Why would you think I thought I could do that?
And it is possible that you have become a superb predictor of such variables in the last 10 years (setting aside earlier poor predictions), and I could and would update on good technological and geopolitical prediction in DAGGRE or the like.
If enough people agreed on that and DAGGRE could be done with relatively low effort on my part, I would do so, though I think I’d want at least some people committing in writing to large donations given success because it would be a large time commitment and I’m prior-skeptical that people know or are honest about their own reasons for disagreement; and I would expect the next batch of pessimists to write off the DAGGRE results (i.e., claim it already compatible with my known properties) so there’d be no long-term benefit. Still, 8 out of 8 on 80K’s “Look how bad your common sense is!” test, plus I recall getting 9 out of 10 questions correct the last time I was asked for 90% probabilities on a CFAR calibration test, so it’s possible I’ve already outrun the reference class of people who are bad at this.
Though if it’s mostly geopolitical questions where the correct output is “I know I don’t know much about this” modulo some surface scans of which other experts are talking sense, I wouldn’t necessarily expect to outperform the better groups that have already read up on cognitive rationality and done a few calibration exercises.
Which is why I repeat my question: What is the least impossible thing I could do next, where anything up to that is permitted by your model so it’s equivalent to affirming that you think I might be able to do it, and anything beyond that was prohibited by your model so it’s time to notice your confusion?
So, if von Neumann came out with similar FAI claims but couldn’t present compelling arguments to his peers (if not to exact agreement, perhaps within an order of magnitude), I wouldn’t believe him. So showing that, e.g., your math problem-solving ability is greater than my point estimate wouldn’t be very relevant. Shocking achievements would lead me to upgrade my estimate of your potential contribution going forward (although most of the work in an FAI team would be done by others in any case), resolving uncertainty about ability, but that would not be enough as such; what would matter is the effect on my estimates of your predictive model.
I would make predictions on evaluations of MIRI workshop research outputs by a properly constructed jury of AI people. If the MIRI workshops were many times more productive than comparably or better credentialed AI people according to independent expert judges (blinded to the extent possible) I would say my model was badly wrong, but I don’t think you would predict a win on that.
To avoid “too much work to do/prep for” and “disagreement about far future consequences of mundane predicted intermediates” you could give me a list of things that you or MIRI plan to attempt over the next 1, 3, and 5 years and I could pick one (with some effort to make it more precise).
DAGGRE...etc
Yes, I have seen you writing about the 80k quiz on LW and 80k and elsewhere; it’s good (although, as you mention, test-taking skills went far on it). I predict that if we take an unbiased sample of people with similarly high cognitive test scores, extensive exposure to machine learning, and good career success (drawn from academia and tech/quant finance, say), and look at the top scorers on the 80k quiz and similar, their estimates for MIRI success will be quite a bit closer to mine than to yours. Do you disagree? Otherwise, I would want to see drastic outperformance relative to such a group on a higher-ceiling version (although this would be confounded by advance notice and the opportunity to study/prepare).
DAGGRE is going into the area of technology, not just geopolitics. Unfortunately it is mostly short term stuff, not long-term basic science, or subtle properties of future tech, so the generalization is imperfect. Also, would you predict exceptional success in predicting short-medium term technological developments?
So, if von Neumann came out with similar FAI claims...
...showing that, e.g. your math problem-solving ability is greater than my point estimate, wouldn’t be very relevant.
The question is not what convinces you that I can do FAI within the framework of your antiheroic epistemology. The question is what first and earliest shows that your antiheroic epistemology is yielding bad predictions. Is this a terrible question to ask for some reason? You’ve substituted an alternate question a couple of times now.
Also, would you predict exceptional success in predicting short-medium term technological developments?
From my perspective, you just asked how bad other people are at predicting such developments. The answer is that I don’t know. Certainly many bloggers are terrible at it. I don’t suppose you can give a quick example of a DAGGRE question?
The question is not what convinces you that I can do FAI within the framework of your antiheroic epistemology.
The question is what first and earliest shows that your antiheroic epistemology is yielding bad predictions
Which I said in the very same paragraph.
Is this a terrible question to ask for some reason? You’ve substituted an alternate question a couple of times now.
I already gave the example of independent judges evaluating MIRI workshop output, among others. If we make the details precise, I can set the threshold on the measure. Or we can take any number of other metrics with approximately continuous outputs where I can draw a line. But it takes work to define a metric precise enough to be solid, and I don’t want to waste my time generating more and more additional examples or making them ultra-precise without feedback on what you will actually stake a claim on.
I can’t determine what’s next without knowledge of what you’ll do or try.
I don’t suppose you can give a quick example of a DAGGRE question?
To clear up the ambiguity, does this mean you agree that I can do anything short of what von Neumann did, or that you don’t think it’s possible to get as far as independent judges favorably evaluating MIRI output, or is there some other standard you have in mind? I’m trying to get something clearly falsifiable, but right now I can’t figure out the intended event due to sheer linguistic ambiguity.
I also think that evaluation by academics is a terrible test for things that don’t come with blatant overwhelming unmistakable undeniable-even-to-humans evidence—e.g. this standard would fail MWI, molecular nanotechnology, cryonics, and would have recently failed ‘high-carb diets are not necessarily good for you’. I don’t particularly expect this standard to be met before the end of the world, and it wouldn’t be necessary to meet it either.
To clear up the ambiguity, does this mean you agree that I can do anything short of what von Neumann did
As I said in my other comment, I would be quite surprised if your individual mathematical and AI contributions reach the levels of the best in their fields, as you are stronger verbally than mathematically, and discuss in more detail what I would find surprising and not there.
I also think that evaluation by academics is a terrible test for things that don’t come with blatant overwhelming unmistakable undeniable-even-to-humans evidence—e.g. this standard would fail MWI, molecular nanotechnology, cryonics, and would have recently failed ‘high-carb diets are not necessarily good for you’.
I recently talked to Drexler about nanotechnology in Oxford. Nanotechnology is:
Way behind Drexler’s schedule, and even accounting for there being far less funding and focused research than he expected, the timeline skeptics get significant vindication
Was said by the NAS panel to be possible, with no decisive physical or chemical arguments against (and discussion of some uncertainties which would not much change the overall picture, in any case), and arguments against tend to be or turn into timeline skepticism and skepticism about the utility of research
Has not been the subject of a more detailed report or expert judgment test than the National Academy of Sciences one (which said it’s possible) because Drexler was not on the ball and never tried. He is currently working with the FHI to get a panel of independent eminent physicists and chemists to work it over, and expects them to be convinced.
Tomer is indeed pretty great, but I have heard Michael say things like that about a number of people and projects over the years. Most did not become like Quixey.
Also, while it seems to me that Michael should have said this about many people, I have not actually heard him say this about many people, to me, except Alyssa Vance.
I don’t think it can discriminate between the possible impossible and the impossible impossible. It just throws up a uniform fog of “The outside view says it is nonvirtuous to try to distinguish within this reference class.”
This seems to be usually accounted for by value of information, you should do some unproven things primarily in order to figure out if something like that is possible (or why not, in more detail), before you know it to be possible. If something does turn out to be possible, you just keep on doing it, so that the primary motivation changes without the activity itself changing.
(One characteristic of doing something for its value of information as opposed to its expected utility seems to be the expectation of having to drop it when it’s not working out. If something has high expected utility a priori, continuing to do it despite it not working won’t be as damaging (a priori), even though there is no reason to act this way.)
continuing to do it despite it not working won’t be as damaging (a priori)
Not sure I understood this—are you saying that the expected damage caused by continuing to do it despite it not working is less just because the probability that it won’t work is less?
Ok, but that doesn’t increase the probability to ‘medium’ from the very low initial probability of MIRI or another organization benefiting from MIRI’s work solving the extremely hard problem of Friendly AI before anyone else screws it up.
I’ve read all your posts in the threads linked by the OP, and if multiplying the high beneficial impact of Friendly AI by the low probability of success isn’t allowed, I honestly don’t see why I should donate to MIRI.
If this was a regular math problem and it wasn’t world-shakingly important, why wouldn’t you expect that funding workshops and then researchers would cause progress on it?
Assigning a very low probability to progress rests on a sort of backwards reasoning wherein you expect it to be difficult to do things because they are important. The universe contains no such rule. They’re just things.
It’s hard to add a significant marginal fractional pull to a rope that many other people are pulling on. But this is not a well-tugged rope!
I’m not assigning a low probability to progress, I’m assigning a low probability to success.
Where FAI research is concerned, progress is only relevant in as much as it increases the probability of success, right?
Unlike a regular math problem, you’ve only got one shot at getting it right, and you’re in a race with other researchers who are working on an easier problem (seed AI, Friendly or not). It doesn’t matter if you’re 80% of the way there if we all die first.
Edited to add and clarify: Even accounting for the progress I think you’re likely to make, the probability of success remains low, and that’s what I care about.
Clarifying question: What do you think is MIRI’s probability of having been valuable, conditioned on a nice intergalactic future being true?
Pretty high. More than 10%, definitely. Maybe 50%?
A non-exhaustive list of some reasons why I strongly disagree with this combination of views:
AI which is not vastly superhuman can be restrained from crime, because humans can be so restrained, and with AI, designers have the benefits of the ability to alter the mind’s parameters (desires, intuitions, capability for action, duration of extended thought, inhibitions, etc.), test copies in detail, read out its internal states, and so on, making the problem vastly easier (although control may need to be tight if one is holding back an intelligence explosion while this is going on)
If 10-50 humans can solve AI safety (and build AGI!) in less than 50 years, then 100-500 not very superhuman AIs at 1200x speedup should be able to do so in less than a month
There are a variety of mechanisms by which humans could monitor, test, and verify the work conducted by such systems
The AIs can also work on incremental improvements to the control mechanisms being used initially, with steady progress allowing greater AI capabilities to develop better safety measures, until one approaches perfect safety
If a small group can solve all the relevant problems over a few decades, then probably a large portion of the AI community (and beyond) can solve the problems in a fraction of the time if mobilized
As AI becomes visibly closer such mobilization becomes more likely
Developments in other fields may make things much easier: better forecasting, cognitive enhancement, brain emulations coming first, global peace/governance
The broad shape of AI risk is known and considered much more widely than MIRI: people like Bill Gates and Peter Norvig consider it, but think that acting on it now is premature; if they saw AGI as close, or were creating it themselves, they would attend to the control problems
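The speedup claim in the list above can be sanity-checked with a back-of-envelope sketch. All figures are the comment’s own illustrative numbers (10-50 humans, under 50 years, 100-500 AIs at a 1200x speedup), and the sketch assumes research work parallelizes linearly, which is a strong assumption:

```python
# Back-of-envelope check: if 10-50 humans could solve the problem in
# under 50 years, how long would 100-500 AIs at 1200x speedup take?
# Assumes researcher-years are interchangeable and parallelize linearly.

def ai_wall_clock_days(n_humans, human_years, n_ais, speedup):
    """Wall-clock days for the AI team to match the human team's
    total researcher-years of work."""
    researcher_years_needed = n_humans * human_years
    subjective_years_per_day = n_ais * speedup / 365.0
    return researcher_years_needed / subjective_years_per_day

# Worst case within the stated ranges: 50 humans x 50 years, only 100 AIs.
days = ai_wall_clock_days(50, 50, 100, 1200)
print(round(days, 1))  # about 7.6 days, comfortably under a month
```

Even on the pessimistic end of both ranges, the implied wall-clock time stays well under the month claimed, so the arithmetic itself is not where the argument can be attacked; the linear-parallelization assumption is.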
Paul Christiano, and now you, have started using the phrase “AI control problems”. I’ve gone along with it in my discussions with Paul, but before many people start adopting it maybe we ought to talk about whether it makes sense to frame the problem that way (as opposed to “Friendly AI”). I see a number of problems with it:
Control != Safe or Friendly. An AI can be perfectly controlled by a human and be extremely dangerous, because most humans aren’t very altruistic or rational.
The framing implicitly suggests (and you also explicitly suggest) that the control problem can be solved incrementally. But I think we have reason to believe this is not the case, that in short “safety for superintelligent AIs” = “solving philosophy/metaphilosophy” which can’t be done by “incremental improvements to the control mechanisms being used initially”.
“Control” suggests that the problem falls in the realm of engineering (i.e., belongs to the reference class of “control problems” in engineering, such as “aircraft flight control”), whereas, again, I think the real problem is one of philosophy (plus lots of engineering as well of course, but philosophy is where most of the difficulty lies). This makes a big difference in trying to predict the success of various potential attempts to solve the problem, and I’m concerned that people will underestimate the difficulty of the problem or overestimate the degree to which it’s parallelizable or generally amenable to scaling with financial/human resources, if the problem becomes known as “AI control”.
Do you disagree with this, on either the terminological issue (“AI control” suggests “incremental engineering problem”) or the substantive issue (the actual problem we face is more like philosophy than engineering)? If the latter, I’m surprised not to have seen you talk about your views on this topic earlier, unless you did and I missed it?
Thanks for those thoughts.
Nick Bostrom uses the term in his book, and it’s convenient for separating out pre-existing problems with “we don’t know what to do with our society long term, nor is it engineered to achieve that” and the particular issues raised by AI.
In the situation I mentioned, not vastly superintelligent initially (and capabilities can vary along multiple dimensions, e.g. one can have many compartmentalized copies of an AI system that collectively deliver a huge number of worker-years without any one of them possessing extraordinary capabilities).
What is your take on the strategy-swallowing point: if humans can do it, then not very superintelligent AIs can?
There is an ambiguity there. I’ll mention it to Nick. But, e.g. Friendliness just sounds silly. I use “safe” too, but safety can be achieved just by limiting capabilities, which doesn’t reflect the desire to realize the benefits.
It’s easy to imagine AIXI-like Bayesian EU maximizers that are powerful optimizers but incapable of solving philosophical problems like consciousness, decision theory, and foundations of mathematics, which seem to be necessary in order to build an FAI. It’s possible that that’s wrong, that one can’t actually get to “not very superintelligent AIs” unless they possessed the same level of philosophical ability that humans have, but it certainly doesn’t seem safe to assume this.
BTW, what does “strategy-swallowing” mean? Just “strategically relevant”, or more than that?
I suggested “optimal AI” to Luke earlier, but he didn’t like that. Here are some more options to replace “Friendly AI” with: human-optimal AI, normative AI (rename what I called “normative AI” in this post to something else), AI normativity. It would be interesting and useful to know what options Eliezer considered and discarded before settling on “Friendly AI”, and what options Nick considered and discarded before settling on “AI control”.
(I wonder why Nick doesn’t like to blog. It seems like he’d want to run at least some of the more novel or potentially controversial ideas in his book by a wider audience, before committing them permanently to print.)
Such systems, hemmed in and restrained, could certainly work on better AI designs, and predict human philosophical judgments. Predicting human philosophical judgments accurately and reporting those predictions is close enough.
“Control problem.”
He circulates them to reviewers, in wider circles as the book becomes more developed. And blogging half-finished idea on the internet is exactly what one shouldn’t do if one is worried about committing controversial ideas to print.
In case this is why you don’t tend to talk about your ideas in public either, except in terse (and sometimes cryptic) comments or in fully polished papers, I wanted to note that I’ve never had a cause to regret blogging (or posting to mailing lists) any of my half-finished ideas. As long as your signal to noise ratio is fairly high, people will remember the stuff you get right and forget the stuff you get wrong. The problem I see with committing ideas to print (as in physical books) is that books don’t come with comments attached pointing out all the parts that are wrong or questionable.
If such a system is powerful enough to predict human philosophical judgments using its general intelligence, without specifically having been programmed with a correct solution for metaphilosophy, it seems very likely that it would already be strongly superintelligent in many other fields, and hence highly dangerous.
(Since you seem to state this confidently but don’t give much detail, I wonder if you’ve discussed the idea elsewhere at greater length. For example I’m assuming that you’d ask the AI to answer questions like “What would Eliezer conclude about second-order logic after thinking undisturbed about it for 100 years?” but maybe you have something else in mind?)
I guess I actually meant “potentially wrong” rather than “controversial”, and I was suggesting that he blog about them after privately circulating to reviewers, but before publishing in print.
The thought is to use much more bite-sized and tractable questions to work with less individually capable systems (with shorter time horizons, etc) like: “find a machine-checkable proof of this lemma” or “I am going to read one of these 10 papers to try to shed light on my problem using random selection, score each with the predicted rating I will give the paper’s usefulness after reading it.” I discussed this in a presentation at the FHI (focused on WBE, where the issue of unbalanced abilities relative to humans does not apply), and the concepts will be discussed in Nick’s book.
Based on the two examples you give, which seem to suggest a workflow with a substantial portion still being done by humans (perhaps even the majority of the work in the case of the more philosophical parts of the problem), I don’t see how you’d arrive at this earlier conclusion:
Do you have any materials from the FHI presentation or any other writeup that you can share, that might shed more light? If not, I guess I can wait for the book...
It’s hard to discuss your specific proposal without understanding it in more detail, but in general I worry that the kind of AI you suggest would be much better at helping to improve AI capability than at helping to solve Friendliness, since solving technical problems is likely to be more of a strength for such an AI than predicting human philosophical judgments, and unless humanity develops much better coordination abilities than it has now (so that everyone can agree or be forced to refrain from trying to develop strongly superintelligent AIs until the Friendliness problem is solved), such an AI isn’t likely to ultimately contribute to a positive outcome.
Yes, the range of follow-up examples there was a bit too narrow, I was starting from the other end and working back. Smaller operations could be chained, parallelized (with limited thinking time and capacity per unit), used to check on each other in tandem with random human monitoring and processing, and otherwise leveraged to minimize the human bottleneck element.
A strong skew of abilities away from those directly useful for Friendliness development makes things worse, but leaves a lot of options. Solving technical problems can let you work to, e.g.
Create AIs with ability distributions directed more towards “philosophical” problems
Create AIs with simple sensory utility functions that are easier to ‘domesticate’ (short time horizons, satiability, dependency on irreplaceable cryptographic rewards that only the human creators can provide, etc)
Solve the technical problems of making a working brain emulation model
Create software to better detect and block unintended behavior
Yes, that’s the biggest challenge for such bootstrapping approaches, which depends on the speedup in safety development one gets out of early models, the degree of international peace and cooperation, and so forth.
This strikes me as quite risky, as the amount of human monitoring has to be really minimal in order to solve a 50-year problem in 1 month, and earlier experiences with slower and less capable AIs seem unlikely to adequately prepare the human designers to come up with fully robust control schemes, especially if you are talking about a time scale of months. Can you say a bit more about the conditions you envision where this proposal would be expected to make a positive impact? It seems to me like it might be a very narrow range of conditions. For example if the degree of international peace and cooperation is very high, then a better alternative may be an international agreement to develop WBE tech while delaying AGI, or an international team to take as much time as needed to build FAI while delaying other forms of AGI.
I tend to think that such high degrees of global coordination are implausible, and therefore put most of my hope in scenarios where some group manages to obtain a large tech lead over the rest of the world and are thereby granted a measure of strategic initiative in choosing how best to navigate the intelligence explosion. Your proposal might be useful in such a scenario, if other seemingly safer alternatives (like going for WBE, or having genetically enhanced humans build FAI with minimal AGI assistance) are out of reach due to time or resource constraints. It’s still unclear to me why you called your point “strategy-swallowing” though, or what that phrase means exactly. Can you please explain?
I certainly didn’t say that would be risk-free, but it interacts with other drag factors on very high estimates of risk. In the full-length discussion of it, I pair it with discussion of historical lags in tech development between leader and follower in technological arms races (longer than one month) and factors relative to corporate and international espionage, raise the possibility of global coordination (or at least between the leader and next closest follower), and so on.
It also interacts with technical achievements in producing ‘domesticity’ short of exact unity of will.
When strategy A to a large extent can capture the impacts of strategy B.
If you’re making the point as part of an argument against “either Eliezer’s FAI plan succeeds, or the world dies” then ok, that makes sense. ETA: But it seems like it would be very easy to take “if humans can do it, then not very superintelligent AIs can” out of context, so I’d suggest some other way of making this point.
Sorry, I’m still not getting it. What does “impacts of strategy” mean here?
I don’t think that separation is a good idea. Not knowing what to do with our society long term is a relatively tolerable problem until an upcoming change raises a significant prospect of locking-in some particular vision of society’s future. (Wei Dai raises similar points in your exchange of replies, but I thought this framing might still be helpful.)
If we are talking about goal definition evaluating AI (and Paul was probably thinking in the context of some sort of indirect normativity), “control” seems like a reasonable fit. The primary philosophical issue for that part of the problem is decision theory.
(I agree that it’s a bad term for referring to FAI itself, if we don’t presuppose a method of solution that is not Friendliness-specific.)
Not that it should be used to dismiss any of your arguments, but reading your other comments in this thread I thought you must be playing devil’s advocate. Your phrasing here seems to preclude that possibility.
If you are so strongly convinced that while AGI is a non-negligible x-risk, MIRI will probably turn out to have been without value even if a good AGI outcome were to be eventually achieved, why are you a research fellow there?
I’m puzzled. Let’s consider an edge case: even if MIRI’s factual research turned out to be strictly non-contributing to an eventual solution, there’s no reasonable doubt that it has raised awareness of the issue significantly (in relative terms).
Would the current situation with the CSER or FHI be unchanged or better if MIRI had never existed? Do you think those have a good chance of being valuable in bringing about a good outcome? Answering ‘no’ to the former and ‘yes’ to the latter would transitively imply that MIRI is valuable as well.
I.e. that alone—never mind actual research contributions—would make it valuable in hindsight, given an eventual positive outcome. Yet you’re strongly opposed to that view?
The “combination of views” includes both high probability of doom, and quite high probability of MIRI making the counterfactual difference given survival. The points I listed address both.
I think MIRI’s expected impact is positive and worthwhile. I’m glad that it exists, and that it and Eliezer specifically have made the contributions they have relative to a world in which they never existed. A small share of the value of the AI safety cause can be quite great. That is quite consistent with thinking that “medium probability” is a big overestimate for MIRI making the counterfactual difference, or that civilization is almost certainly doomed from AI risk otherwise.
Lots of interventions are worthwhile even if a given organization working on them is unlikely to make the counterfactual difference. Most research labs working on malaria vaccines won’t invent one, most political activists won’t achieve big increases in foreign aid or immigration levels or swing an election, most counterproliferation expenditures won’t avert nuclear war, asteroid tracking was known ex ante to be far more likely to discover we were safe than that there was an asteroid on its way and ready to be stopped by a space mission.
The threshold for an x-risk charity of moderate scale to be worth funding is not a 10% chance of literally counterfactually saving the world from existential catastrophe. Annual world GDP is $80,000,000,000,000, and wealth including human capital and the like will be in the quadrillions of dollars. A 10% chance of averting x-risk would be worth trillions of present dollars.
We’ve spent tens of billions of dollars on nuclear and bio risks, and even $100,000,000+ on asteroids (averting dinosaur-killer risk on the order of 1 in 100,000,000 per annum). At that exchange rate again a 10% x-risk impact would be worth trillions of dollars, and governments and philanthropists have shown that they are ready to spend on x-risk or GCR opportunities far, far less likely to make a counterfactual difference than 10%.
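The expected-value arithmetic behind the two paragraphs above can be made explicit. All inputs are the comment’s own illustrative figures ($80 trillion GDP, a 10% chance of averting x-risk, ~$100M against a ~1-in-100,000,000 per-annum asteroid risk), not independent estimates:

```python
# Rough expected-value arithmetic behind the figures above.

annual_world_gdp = 80e12  # dollars, per the comment

# Value of a 10% chance of averting existential catastrophe, priced
# at just a single year of world output (a deliberate underestimate,
# since total wealth including human capital is far larger):
value_of_10pct_xrisk = 0.10 * annual_world_gdp
print(f"${value_of_10pct_xrisk:,.0f}")  # $8,000,000,000,000 -- trillions

# Implied willingness-to-pay per unit of extinction risk from asteroid
# tracking: ~$100M spent against ~1-in-100,000,000 per-annum risk.
dollars_per_unit_risk = 100e6 / 1e-8
print(f"${dollars_per_unit_risk:,.0f} per unit of risk averted")
```

At the asteroid-tracking exchange rate, governments have in effect paid for risk reductions orders of magnitude smaller than 10%, which is the comparison the comment is drawing.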
I see. We just used different thresholds for valuable, you used “high probability of MIRI making the counterfactual difference given survival”, while for me just e.g. speeding Norvig/Gates/whoever a couple years along the path until they devote efforts to FAI would be valuable, even if it were unlikely to Make The Difference (tm).
Whoever would turn out to have solved the problem, it’s unlikely that their AI safety evaluation process (“Should I do this thing?”) would work in a strict vacuum, i.e. whoever will one day have evaluated the topic and made up their mind to Save The World will be highly likely to have stumbled upon MIRI’s foundational work. Given that at least some of the steps in solving the problem are likely to be quite serial (sequential) in nature, the expected scenario would be that MIRI’s legacy would at least provide some speed-up; a contribution which, again, I’d call valuable, even if it were unlikely to make or break the future.
If the Gates Foundation had someone evaluate the evidence for AI-related x-risk right now, you probably wouldn’t expect MIRI research, AI researcher polls, philosophical essays etc. to be wholly disregarded.
I used that threshold because the numbers being thrown around in the thread were along those lines, and are needed for the “medium probability” referred to in the OP. So counterfactual impact of MIRI never having existed on x-risk is the main measure under discussion here. I erred in quoting your sentence in a way that might have made that hard to interpret.
That’s right, and one reason that I think that MIRI’s existence has reduced expected x-risk, although by less than a 10% probability.
Sorry hard to tell from the thread which combination of views. Eliezer’s?
The view presented by Furcas, of probable doom, and “[m]ore than 10%, definitely. Maybe 50%” probability that MIRI will be valuable given the avoidance of doom, which in the context of existential risk seems to mean averting the risk.
...um.
It seems to me that if I believed what I infer you believe, I would be donating to MIRI while frantically trying to figure out some way to have my doomed world actually be saved.
Why? You (and everybody else) will almost certainly fail anyway, and you say I shouldn’t multiply this low probability by the utility of saving the world.
The only way I see is what MIRI is doing.
Edited to add: While this is interesting, what I was really asking in my first post is, if you think the odds of MIRI succeeding are not low, why do you think so?
Because sometimes the impossible can be done, and I don’t know how to estimate the probability of that. What would you have estimated in advance, without knowing the result, was the chance of success for the AI-Box Experiment? How about if I told you that I was going to write the most popular Harry Potter fanfiction in the world and use it to recruit International Mathematical Olympiad medalists? There may be true impossibilities in this world. Eternal life may be one such, if the character of physical law is what it appears to be, to our sorrow. I do not think that FAI is one of those. So I am going to try. We can work out what the probability of success was after we have succeeded. The chance which is gained is not gained by turning away or by despair, but by continuing to engage with and attack the problem, watching for opportunities and constantly advancing.
If you don’t believe me about that aspect of heroic epistemology, feel free not to believe me about not multiplying small probabilities either.
Could you give a more precise statement of what this is supposed to entail?
Not easily. Antiantiheroic epistemology might be a better term, i.e., I think that a merely accurate epistemology doesn’t have a built-in mechanism which prevents people from thinking they can do things because the outside view says it’s nonvirtuous to try to distinguish yourself within reference class blah. Antiantiheroic epistemology doesn’t say that it’s possible to distinguish yourself within reference class blah so much as it thinks that the whole issue is asking the wrong question and you should mostly be worrying about staying engaged with the object-level problem because this is how you learn more and gain the ability to take opportunities as they arrive. An antiheroic epistemology that throws up some reference class or other saying this is impossible will regard you as trying to distinguish yourself within this reference class, but this is not what the antiantiheroic epistemology is actually about; that’s an external indictment of nonvirtuosity arrived at by additional modus ponens to conclusions on which antiantiheroic epistemology sees no reason to expend cognitive effort.
Obviously from my perspective non-antiheroic epistemology cancels out to mere epistemology, simpler for the lack of all this outside-view-social-modesty wasted motion, but to just go around telling you “That’s not how epistemology works, of course!” would be presuming a known standard which is logically rude (I think you are doing this, though not too flagrantly).
An archetypal example of antiantiheroic epistemology is Harry in Methods of Rationality, who never bothers to think about any of this reference class stuff or whether he’s being immodest, just his object-level problems in taking over the universe, except once when Hermione challenges him on it and Harry manages to do one thing a normal wizard can’t. Harry doesn’t try to convince himself of anything along those lines, or think about it without Hermione’s prompting. It just isn’t something that occurs to him might be a useful thought process.
I don’t think it’s a useful thought process either, and rationalizing elaborate reasons why I’m allowed to be a hero wouldn’t be useful either (Occam’s Imaginary Razor: decorating my thought processes with supportive tinsel will just slow down any changes I need to make), which is why I tend to be annoyed by the entire subject and wish people would get back to the object level instead of meta demands for modesty that come with no useful policy suggestions about ways to do anything better. Tell me a better object-level way to save the world and we can talk about my doing that instead.
“Antiantiheroic epistemology might be a better term, i.e., I think that a merely accurate epistemology doesn’t have a built-in mechanism which prevents people from thinking they can do things because the outside view says it’s nonvirtuous to try to distinguish yourself within reference class blah. ”
Taken literally, I can’t possibly disagree with this, but it doesn’t seem to answer my question, which is “where is the positive evidence that one is not supposed to ignore.” I favor combining many different kinds of evidence, including sparse data. And that can and does lead to very high expectations for particular individuals.
For example, several of my college fraternity brothers are now billionaires. Before facebook, Mark Zuckerberg was clearly the person with the highest entrepreneurial potential that I knew, based on his intelligence, motivation, ambition, and past achievements in programming, business, and academics. People described him to me as resembling a young Bill Gates. His estimated expected future wealth if pursuing entrepreneurship, based on that data and on the relationship between wealth and all of the characteristics I could track, was in the 9-figure range. Then add in that facebook was a very promising startup (I did some market sizing estimates for it, and people who looked at it and its early results were reliably impressed).
Moving from entrepreneurship to politics, one can predict success to a remarkable degree with evidence like “Eton graduate, Oxford PPE graduate with first class degree, Oxford Union leader, interested in politics, starts in an entry-level political adviser job with a party.” See this post or this paper. Most of the distance in log odds to reliably becoming Prime Minister, let alone Member of Cabinet or Parliament, can be crossed with objective indicators. Throw in a bit more data about early progress, media mentions, and the like and the prediction improves still more.
I would then throw in other evidence, like the impressiveness of the person’s public speaking relative to other similar people, their number and influence of friends and contacts in high places relative to other similar types (indicating both social capital, and skill at getting more), which could improve or worsen the picture. There is still a sizable chunk of randomness in log terms, as political careers are buffeted by switches in party control, the economy, the rise and fall of factions that carry members with them, and other hard-to-control factors at many stages. So I can and do come to expect that someone will probably get federal political office, and have a good shot at Cabinet, and less so for PM. But within the real distribution of characteristics I won’t be convinced that a young person will probably become PM, which would require almost zero noise.
In science I can be convinced a young star is a good prospect for Nobel or Fields medal caliber work. But I would need stronger evidence than we have seen for anyone to expect that they would do this 10 times (since no one has done so). I am sympathetic to Wei Dai’s comment
I would be quite surprised to see you reliably making personal mathematical contributions at the level of the best top math and AI people. I would not be surprised to see MIRI workshop participants making progress on the problems at a level consistent with the prior evidence of their ability, and somewhat higher per unit time because workshops harvest ideas generated over a longer period, are solely dedicated to research, have a lot of collaboration and cross-fertilization, and may benefit from improved motivation and some nice hacking of associated productivity variables. And I would not be surprised at a somewhat higher than typical rate of interesting (to me, etc) results because of looking at strange problems.
I would be surprised if the strange problems systematically deliver relatively huge gains on actual AI problems (and this research line is supposed to deliver AGI as a subset of FAI before others get AGI so it must have great utility in AGI design), i.e. if the strange problems are super-promising by the criteria that Pearl or Hinton or Ng or Norvig are using but neglected by blunder. I would be surprised if the distance to AGI is crossable in 20 years.
You are asking other people for their money and time, when they have other opportunities. To do that they need an estimate of the chance of MIRI succeeding, considering things like AI timelines, the speed of takeoff given powerful AI, competence of other institutions, the usefulness of MIRI’s research track, the feasibility of all alternative solutions to AI risk/AI control problems, how much MIRI-type research will be duplicated by researchers interested for other reasons over what timescales, and many other factors including the ability to execute given the difficulty of the problems and likelihood of relevance. So they need adequate object-level arguments about those contributing factors, or some extraordinary evidence to trust your estimates of all of them over the estimates of others without a clear object-level case. Some of the other opportunities available to them that they need to compare against MIRI:
Build up general altruistic capacities through things like the effective altruist movement or GiveWell’s investigation of catastrophic risks, (which can address AI in many ways, including ones now not visible, and benefit from much greater resources as well as greater understanding from being closer to AI); noting that these seem to scale faster and spill over
Invest money in an investment fund for the future which can invest more (in total and as a share of effort) when there are better opportunities, either by the discovery of new options, or the formation of better organizations or people (which can receive seed funding from such a trust)
Enhance decision-making and forecasting capabilities with things like the IARPA forecasting tournaments, science courts, etc, to improve reactions to developments including AI and others (recalling that most of the value of MIRI in your model comes from major institutions being collectively foolish or ignorant regarding AI going forward)
Prediction markets, meta-research, and other institutional changes
Work like Bostrom’s, seeking out crucial considerations and laying out analysis of issues such as AI risk for the world to engage with and to let key actors see the best arguments and reasons bearing on the problems
Pursue cognitive enhancement technologies or education methods (you cite CFAR as an example in this domain) to improve societal reaction to such problems
Find the most effective options for synthetic biology threats (GiveWell will be looking through them) and see if that is a more promising angle
No they don’t; they could be checking relative plausibility of causing an OK outcome without trying to put absolute numbers on a probability estimate, and this is reasonable due to the following circumstances:
The life lesson I’ve learned is that by the time you really get anywhere, if you get anywhere, you’ll have encountered some positive miracles, some negative miracles, your plans will have changed, you’ll have found that the parts which took the longest weren’t what you thought they would be, and that other things proved to be much easier than expected. Your successes won’t come through foreseen avenues, and neither will your failures. But running through it all will be the fundamental realization that everything you accomplished, and all the unforeseen opportunities you took advantage of, were things that you would never have received if you hadn’t attacked the hardest part of the problem that you knew about straight-on, without distraction.
How do you estimate probabilities like that? I honestly haven’t a clue. Now, we all still have to maximize expected utility, but the heuristic I’m applying to do that (which at the meta level I think is the planning heuristic with the best chance of actually working) is to ask “Is there any better way of attacking the hardest part of the problem?” or “Is there any better course of action which doesn’t rely on someone else performing a miracle?” So far as I can tell, these other proposed courses of action don’t attack the hardest part of the problem for humanity’s survival, but rely on someone else performing a miracle. I cannot make myself believe that this would really actually work. (And System 2 agrees that System 1’s inability to really believe seems well-founded.)
Since I’m acting on such reasons and heuristics as “If you don’t attack the hardest part of the problem, no one else will” and “Beware of taking the easy way out” and “Don’t rely on someone else to perform a miracle”, I am indeed willing to term what I’m doing “heroic epistemology”. It’s just that I think such reasoning is, you know, actually correct and normative under these conditions.
If you don’t mind mixing the meta-level and the object-level, then I find any reasoning along the lines of “The probability of our contributing to solving FAI is too low, maybe we can have a larger impact by working on synthetic biology defense and hoping a miracle happens elsewhere” much less convincing than the meta-level observation, “That’s a complete Hail Mary pass, if there’s something you think is going to wipe out humanity then just work on that directly as your highest priority.” All the side cleverness, on my view, just adds up to losing the chance that you get by engaging directly with the problem and everything unforeseen that happens from there.
Another way of phrasing this is that if we actually win, I fully expect the counterfactual still-arguing-about-this version of 2013-Carl to say, “But we succeeded through avenue X, while you were then advocating avenue Y, which I was right to say wouldn’t work.” And to this the counterfactual reply of Eliezer will be, “But Carl, if I’d taken your advice back then, I wouldn’t have stayed engaged with the problem long enough to discover and comprehend avenue X and seize that opportunity, and this part of our later conversation was totally foreseeable in advance.” Hypothetical oblivious!Carl then replies, “But the foreseeable probability should still have been very low” or “Maybe you or someone else would’ve tried Y without that detour, if you’d worked on Z earlier” where Z was not actually uniquely suggested as the single best alternative course of action at the time. If there’s a reply that counterfactual non-oblivious Carl can make, I can’t foresee it from here, under those hypothetical circumstances unfolding as I describe (and you shouldn’t really be trying to justify yourself under those hypothetical circumstances, any more than I should be making excuses in advance for what counterfactual Eliezer says after failing, besides “Oops”).
My reasoning here is, from my internal perspective, very crude, because I’m not sure I really actually trust non-crude reasoning. There’s this killer problem that’s going to make all that other stuff pointless. I see a way to make progress on it, on the object level; the next problem up is visible and can be attacked. (Even this wasn’t always true, and I stuck with the problem anyway long enough to get to the point where I could state the tiling problem.) Resources should go to attacking this visible next step on the hardest problem. An exception to this as top priority maximization was CFAR, via “teaching rationality demonstrably channels more resources toward FAI; and CFAR which will later be self-sustaining is just starting up; plus CFAR might be useful for a general saving throw bonus; plus if a rational EA community had existed in 1996 it would have shaved ten years off the timeline and we could easily run into that situation again; plus I’m not sure MIRI will survive without CFAR”. Generalizing, young but hopefully self-sustaining initiatives can be plausibly competitive with MIRI for small numbers of marginal dollars, provided that they’re sufficiently directly linked to FAI down the road. Short of that, it doesn’t really make sense to ignore the big killer problem and hope somebody else handles it later. Not really actually.
If the year was 1960, which would you rather have?
10 smart people trying to build FAI for 20 years, 1960-1980
A billion dollars, a large supporting movement, prediction markets and science courts that make the state of the evidence on AI transparent, and teams working on FAI, brain emulations, cognitive enhancement, and more but starting in 1990 (in expectation closer to AI)
At any given time there are many problems where solutions are very important, but the time isn’t yet right to act on them, rather than on the capabilities to act on them, and also to deal with the individually unexpected problems that come along so regularly. Investment-driven and movement-building-driven discount rates are relevant even for existential risk.
GiveWell has grown in influence much faster than the x-risk community while working on global health, and are now in the process of investigating and pivoting towards higher leverage causes, with global catastrophic risk among the top three under consideration.
I’d rather have both, hence diverting some marginal resources to CFAR until it was launched, then switching back to MIRI. Is there a third thing that MIRI should divert marginal resources to right now?
I have just spent a month in England interacting extensively with the EA movement here (maybe your impressions from the California EA summit differ, I’d be curious to hear). Donors interested in the far future are also considering donations to the following (all of these are from talks with actual people making concrete short-term choices; in addition to donations, people are also considering career choices post-college):
80,000 Hours, CEA and other movement building and capacity-increasing organizations (including CFAR), which also increase non-charity options (e.g. 80k helping people going into scientific funding agencies and political careers where they will be in a position to affect research and policy reactions to technologies relevant to x-risk and other trajectory changes)
AMF/GiveWell charities to keep GiveWell and the EA movement growing while actors like GiveWell, Paul Christiano, Nick Beckstead and others at FHI, investigate the intervention options and cause prioritization, followed by organization-by-organization analysis of the GiveWell variety, laying the groundwork for massive support for the top far future charities and organizations identified by said processes
Finding ways to fund such evaluation with RFMF, e.g. by paying for FHI or CEA hires to work on them
The FHI’s other work
A donor-advised fund investing the returns until such evaluations or more promising opportunities present themselves or are elicited by the fund (possibilities like Drexler’s nanotech panel, extensions of the DAGGRE methods, a Bayesian aggregation algorithm that greatly improves extraction of scientific expert opinion or science courts that could mobilize much more talent and resources to neglected problems with good cases, some key steps in biotech enhancement)
That’s why Peter Hurford posted the OP, because he’s an EA considering all these options, and wants to compare them to MIRI.
That is a sort of discussion my brain puts in a completely different category. Peter and Carl, please always give me a concrete alternative policy option that (allegedly) depends on a debate, if such is available; my brain is then far less likely to label the conversation “annoying useless meta objections that I want to just get over with as fast as possible”.
Can we have a new top-level comment on this?
I edited my top-level comment to include the list and explanation.
Cool, if MIRI keeps going, they might be able to show FAI as top focus with adequate evidence by the time all of this comes together.
Well, in collaboration with FHI. As soon as Bostrom’s Superintelligence is released, we’ll probably be building on and around that to make whatever cases we think are reasonable to make.
Thanks; I found this comment illuminating about your views.
Also, because this.
I read every blog post they put out.
I figure I can use my retirement savings for this.
I thought it came from them being collectively foolish or ignorant regarding Friendliness rather than AGI.
Meh. Sounds like Lean Six Sigma or some other buzzword business process improvement plan.
Luckily, Bostrom is already doing work like Bostrom’s.
Too indirect for my taste.
Not very scary compared to AI. Lots of known methods to combat green goo.
Multiplying small probabilities seems fine to me, whereas I really don’t get “heroic epistemology”.
You seem to be suggesting that “heroic epistemology” and “multiplying small probabilities” both lead to the same conclusion: support MIRI’s work on FAI. But this is the case only if working on FAI has no negative consequences. In that case, “small chance of success” plus “multiplying small probabilities” warrants working on FAI, just as “medium probability of success” and “not multiplying small probabilities” does. But since working on FAI does have negative consequences, namely shortening AI timelines and (in the later stages) possibly directly causing the creation of a UFAI, just allowing multiplication by small probabilities is not sufficient to warrant working on FAI if the probability of success is low.
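The structure of this argument can be made concrete with a toy expected-value sketch. All of the numbers below are hypothetical illustrations chosen only to show how the sign of the expectation can flip; they are not anyone’s actual estimates:

```python
# Toy expected-value sketch of the argument above.
# All probabilities and payoffs are made-up illustrative numbers.

def expected_value(p_success, v_success, p_harm, v_harm):
    """Expected value of pursuing the work, counting both the chance
    of success and the chance of net-negative side effects."""
    return p_success * v_success + p_harm * v_harm

# With no possible downside, even a tiny success probability gives a
# positive expectation, so multiplying small probabilities favors the work:
ev_no_downside = expected_value(1e-4, 1e6, 0.0, 0.0)

# But add a downside (e.g. shortening AI timelines) whose probability
# exceeds the success probability, and the expectation goes negative:
ev_with_downside = expected_value(1e-4, 1e6, 1e-3, -1e6)

print(ev_no_downside)    # 100.0
print(ev_with_downside)  # -900.0
```

So “multiplying small probabilities is allowed” settles the question only in the no-downside case; once the downside term is nonzero, the comparison of the two small probabilities does real work.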
I am really worried that you are justifying your current course of action through a novel epistemology of your own invention, which has not been widely vetted (or even widely understood). Most new ideas are wrong, and I think you ought to treat your own new ideas with deeper suspicion.
I’m a reactionary, not an innovator, dammit! Reacting against this newfangled antiheroic ‘reference class’ claim that says we ought to let the world burn because we don’t have enough of a hero license!
Ahem.
I’m also really unconvinced by the claim that this work could reasonably be expected to have net negative consequences. I’m worried about the dynamics and evidence of GiveDirectly, but I don’t think GD has negative consequences; that would be a huge stretch. It’s possible, maybe, but it’s certainly not the arithmetic expectation. With that said, I worry that this ‘maybe negative’ stuff is impeding EA motivation generally. There is much that is ineffectual to be wary of, and there are missed opportunity costs, but trying to warn people against reverse or negative effects seems pretty perverse for anything that has made it onto GiveWell’s Top 3, or CFAR, or FHI, or MIRI. Info that shortens AI timelines should mostly just not be released publicly, and I don’t see any particularly plausible way for a planet to survive without some equivalent of MIRI doing MIRI’s job, the math of which should be started as early as feasible.
“Reference class” to me is just an intuitive way of thinking about updating on certain types of evidence. It seems like you’re saying that in some cases we ought to use the inside view, or weigh object-level evidence more heavily, but 1) I don’t understand why you are not worried about “inside view” reasoning typically producing overconfidence or why you don’t think it’s likely to produce overconfidence in this case, and 2) according to my inside view, the probability of a team like the kind you’re envisioning solving FAI is low, and a typical MIRI donor or potential donor can’t be said to have much of an inside view on this matter, and has to use “reference class” reasoning. So what is your argument here?
Every AGI researcher is unconvinced by that, about their own work.
CFAR and MIRI were created by you, to help you build FAI. If FHI has endorsed your plan for building FAI (as opposed to endorsing MIRI as an organization that’s a force for good overall, which I’d agree with and I’ve actually provided various forms of support to MIRI because of that), I’m not aware of it. I also think I’ve thought enough about this topic to give some weight to my own judgments, so even if FHI does endorse your plan, I’d want to see their reasoning (which I definitely have not seen) and not just take their word. I note that Givewell does publish its analyses and are not asking people to just trust it.
My model of FAI development says that you have to get most of the way to being able to build an AGI just to be able to start working on many Friendliness-specific problems, and solving those problems would take a long time relative to finishing the rest of the AGI capability work. Unless you’re flying completely below the radar, which is incompatible with your plan for funding via public donations, what is stopping your unpublished results from being stolen or leaked in the mean time? And just gathering 10 to 50 world-class talents to work on FAI is likely to spur competition and speed up AGI progress. The fact that you seem to be overconfident about your chance of success also suggests that you are likely to be overconfident in other areas, and indicates a high risk of accidental UFAI creation (relative to the probability of success, not necessarily high in absolute terms).
Agree, though luckily there are other Friendliness-specific problems that we can start solving right now.
Presumably, security technology similar to what has mostly worked for the Manhattan Project, secret NSA projects, etc. But yeah, it’s a big worry. What did you have in mind about flying completely under the radar? There are versions of an FAI team that could be funded pretty discreetly by just one person.
I listed some in another comment, but they are not the current focus of MIRI research. Instead, MIRI is focusing on FAI-relevant problems that do shorten AI timelines (i.e., working on “get most of the way to being able to build an AGI”), such as decision theory and logical uncertainty.
As I noted in previous comments, the economics of information security seems to greatly favor the offense, so you have to spend much more resources than your attackers in order to maintain secrets.
That’s probably the best bet as far as avoiding having your results stolen, but introduces other problems, such as how to attract talent, and whether you can fund a large enough team that way. (Small teams might increase the chances of accidental UFAI creation, since there would be less people to look out for errors.) And given that Eliezer is probably already on the radar of most AGI researchers, you’d have to find a replacement for him on this “under the radar” team.
I should ask this question now rather than later: Is there a concrete policy alternative being considered by you?
And on one obvious ‘outside view’, they’d be right—it’s a very strange and unusual situation, which took me years to acknowledge, that this one particular class of science research could have perverse results. There’s many attempted good deeds which have no effect, but complete backfires make the news because they’re rare.
(Hey, maybe the priors in favor of good outcomes from the broad reference class of scientific research are so high that we should just ignore the inside view which says that AGI research will have a different result!)
And even AGI research doesn’t end up making it less likely that AGI will be developed, please note—it’s not that perverse in its outcome.
I’m currently in favor of the following:
research on strategies for navigating intelligence explosion (what I called “Singularity Strategies”)
pushing for human intelligence enhancement
pushing for a government to try to take an insurmountable tech lead via large scale intelligence enhancement
research into a subset of FAI-related problems that do not shorten AI timelines (at least as far as we can tell), such as consciousness, normative ethics, metaethics, metaphilosophy
advocacy/PR/academic outreach on the dangers of AGI progress
What about continuing physics research possibly leading to a physics disaster or new superweapons, biotech research leading to biotech disasters, nanotech research leading to nanotech disasters, WBE research leading to value drift and Malthusian outcomes, computing hardware research leading to deliberate or accidental creation of massive simulated suffering (aside from UFAI)? In addition, I thought you believed that faster economic growth made a good outcome less likely, which would imply that most scientific research is bad?
Many AGI researchers seem to think that their research will result in a benevolent AGI, and I’m assuming you agree that their research does make it less likely that such an AGI will be eventually developed.
It seems odd to insist that someone explicitly working on benevolence should consider themselves to be in the same reference class as someone who thinks they just need to take care of the AGI and the benevolence will pretty much take care of itself.
I wasn’t intending to use “AGI researchers” as a reference class to show that Eliezer’s work is likely to have net negative consequences, but to show that people whose work can reasonably be expected to have net negative consequences (of whom AGI researchers is a prominent class) still tend not to believe such claims, and therefore Eliezer’s failure to be convinced is not of much evidential value to others.
The reference class I usually do have in mind when I think of Eliezer is philosophers who think they have the right answer to some philosophical problem (virtually all of whom end up being wrong or at least incomplete even if they are headed in the right direction).
ETA: I’ve written a post that expands on this comment.
Related to heroic epistemology: The Five Cognitive Distortions of People Who Get Stuff Done.
“Cognitively distorted” people should lose. People who get stuff done should have their alternate thinking processes carefully examined to see how the divergence is more rational than the non-divergence.
It’s not clear to me why people who accurately model the world should outperform those who follow less cognitively demanding heuristics. I’ve seen this position stated as a truism during a debate, but have never read an argument for or against it. Would someone be able to link to an argument about following non-robust shortcuts to rationality, or write a short case against that practice?
If the aforementioned analysis of thinking processes finds that the advantage comes from superior allocation of bounded computational resources then that would be an interesting finding and a sufficient explanation. In some cases the alternate heuristics may be worth adopting.
Is the common-sense expectation that non-robust heuristics deliver poor results in a wider subset of possible future environments than robust heuristics not adequate?
The most charitable way I can interpret this is:
“Yeah, the middle point of my probability interval for a happy ending is very low, but the interval is large enough that its upper bound isn’t that low, so it’s worth my time and your money trying to reach a happy ending.”
Am I right?
I don’t. :)
I’m saying I don’t know how to estimate heroic probabilities. I do not know any evenhanded rules which assign ‘you can’t do that’ probability to humanity’s survival which would not, in the hands of the same people thinking the same way, rule out Google or Apple. And maybe those happened to other people, but the same rules would also say that I couldn’t have done the 3-5 other lesser “impossibilities” I’ve done so far. Sure, those were much easier “impossibilities”, but the point is that the sort of people who think you can’t build Friendly AI because I don’t have a good-enough hero license for something so high-status, or because heroic epistemology allegedly doesn’t work in real life, would also claim all those other things couldn’t happen in real life, if asked without benefit of advance knowledge to predict the fate of Steve Wozniak or me personally; that’s what happens when you play the role of “realism”.
Overconfidence (including factual error about success rates) is pervasive in entrepreneurs, both the failures and successes (and the successes often fail on a second try, although they have somewhat better odds). The motivating power of overconfidence doesn’t mean the overconfidence is factually correct or that anyone else should believe it. And the mega-successes tended to look good in expected value, value of information, and the availability of good intermediate outcomes short of mega-success: there were non-crazy overconfident reasons to pursue them. The retreat to “heroic epistemology” rather than reasons is a bad sign relative to those successes, and in any case most of those invoking heroic epistemology style reasoning don’t achieve heroic feats.
Applying the outside-view startup statistics, including data on entrepreneur characteristics like experience and success rates of repeat entrepreneurs, is not magic or prohibitively difficult. Add in the judgments of top VCs to your model.
For individuals or area/firm experts, one can add in hard-to-honestly-signal data (watching out for overconfidence in various ways, using coworkers, etc). That model would have assigned a medium chance to pretty nice success for Apple, and maybe 1-in-100 to 1-in-1000 odds of enormous success, with reasonable expected value for young people willing to take risks and enthused about the field. And the huge Apple success came much later, after Jobs had left and returned and Wozniak was long gone.
Google started with smart people with characteristics predictive of startup success, who went in heavily only after they had an algorithm with high commercial value (which looked impressive to VCs and others). Their success could have been much smaller if their competitors had been more nimble.
And of course you picked them out after the fact, just like you pick out instances of scientists making false predictions rather than true ones in the history of technology. You need to reconcile your story with the experience of the top VCs and serial entrepreneurs, and the degree of selection we see in the characteristics of people at different levels of success (which indicate a major multiplicative role for luck, causing a big chunk of the variation on a log scale).
You have some good feats, and failures too (which give us some outside view info to limit the probabilities for outsiders’ evaluation of your heroic epistemology). But the overall mix is not an outlier of success relative to, e.g. the reference class of other top talent search students with a skew towards verbal ability, unless you treat your choice of problem as such.
Did I say that? No, I did not say that. You should know better than to think I would ever say that. Knowingly make an epistemic error? Say “X is false but I believe it is true”? Since we’re talking heroism anyway, just who the hell do you think I am?
Okay, so suppose we jump back 4 years and I’m saying that maybe I ought to write a Harry Potter fanfiction. And it’s going to be the most popular HP fanfiction on the whole Internet. And Mathematical Olympiad winners will read it and work for us. What does your nonheroic epistemology say? Because I simply don’t believe that (your) nonheroic epistemology gets it right. I don’t think it can discriminate between the possible impossible and the impossible impossible. It just throws up a uniform fog of “The outside view says it is nonvirtuous to try to distinguish within this reference class.”
I thought Quixey was doomed because the idea wasn’t good enough. Michael Vassar said that Quixey would succeed because Tomer Kagan would succeed at anything he tried to do. Michael Vassar was right (a judgment already finalized because Quixey has already gotten further than I thought was possible). This made me update on Michael Vassar’s ability to discriminate Tomer Kagans in advance from within a rather large reference class of people trying to be Tomer Kagan.
That’s what the arguments you’ve given for this have mostly amounted to. You have said “I need to believe this to be motivated and do productive work” in response to questions about the probabilities in the past, while not giving solid reasons for the confidence.
When did you predict that? Early on I did not hear you making such claims, with the tune changing after it became clear that demand for it was good.
4 years ago I did advocate getting Math Olympiad people, and said they could be gotten, and had empirical evidence of that from multiple angles. And I did recognize your writing and fiction were well-received, and had evidence from the reactions to “Staring into the Singularity” and OB/LW. You tried several methods, including the rationality book, distributing written rationality exercises/micromanaging CFAR content, and the fanfiction. Most of them wasted time and resources without producing results, and one succeeded.
And there is a larger context, that in addition to the successes you are highlighting the path includes: Flare, LOGI and associated research, pre-Vassar SI management issues, open-source singularity, commercial software, trying to create non-FAI before nanowars in 2015.
Tomer is indeed pretty great, but I have heard Michael say things like that about a number of people and projects over the years. Most did not become like Quixey. And what’s the analogy here? That people with good ability to predict success in scientific research have indicated you will succeed taking into account how the world and the space of computer science and AI progress must be for that? That Michael has?
This does not sound like something I would ever say. Ever, even at age 16. Your memory conflicts with mine. Is there any way to check?
As a LessWrong reader, I notice that I am confused because this does not sound like something you would say, but I’m not sure I could explain the difference between this and “heroic epistemology.”
EDIT: for the benefit of other readers following the conversation, Eliezer gives a description of heroic epistemology here.
For the record, I don’t recall ever hearing you say something like this in my presence or online, and if somebody had told me in person that you had said this, I think I would’ve raised a skeptical eyebrow and said “Really? That doesn’t sound like something Eliezer would say. Have you read The Sequences?”
But also, I remain confused about the normative content of “heroic epistemology.”
Ask Anna about it, she was present on both occasions, at the tail end of the Singularity Summit workshop discussion in New York City, and at the roundtable meeting at the office with Anna, Paul, Luke, and Louie.
In a related vein, arguments like this are arguments that someone could do A, but not so much that you will do A (and B and C and...). My impression is of too many arguments like the former and not enough of the latter. If you can remedy that, it would be great, but it is a fact about the responses I have seen.
Eliezer emailed me to ask me about it (per Carl’s request, above); I emailed him back with the email below, which Eliezer requested I paste into the LW thread. Pasting:
From my internal perspective, the truth-as-I-experience-it is that I’m annoyed when people raise the topic because it’s all wasted motion, the question sets up a trap that forces you into appearing arrogant, and I honestly think that “Screw all this, I’m just going to go ahead and do it and you can debate afterward what the probabilities were” is a perfectly reasonable response.
From the perspective of folks choosing between supporting multiple lines of AI risk reduction effort, of which MIRI is only one, such probability estimates are not wasted effort.
Though your point about appearing arrogant is well taken. It’s unfortunate that it isn’t socially okay to publicly estimate a high probability of success, or to publicly claim one’s own exceptionalism, when one’s impressions point that way. It places a barrier toward honest conversation here.
I suspect this annoyance is easily misinterpreted, independent of its actual cause. Most humans respond with annoyance when their plans are criticized. Also, in situations where A has power over B, and where B then shares concerns or criticisms about A’s plans, and where A responds with annoyance or with avoidance of such conversation… B is apt to respond (as I did) by being a bit hesitant to bring the topic up, and by also wondering if A is being defensive.
I’m not saying I was correct here. I’m also not sure what the fix is. But it might be worth setting a 1-minute timer and brainstorming or something.
If you were anyone else, this is ordinarily the point where I tell you that I’m just going to ignore all this and go ahead and do it, and then afterward you can explain why it was either predictable in retrospect or a fluke, according to your taste. Since it’s you: What’s the first next goal you think I can’t achieve, strongly enough that if I do it, you give up on non-heroic epistemology?
I’m familiar with this move. But you make it before failing too, so its evidentiary weight is limited, and insufficient for undertakings with low enough prior probability from all the other evidence besides the move.
I don’t buy the framing. The update would be mainly about you and the problem in question, not the applicability of statistics to reality.
Two developments in AI as big as Pearl’s causal networks (as judged by Norvig types) by a small MIRI team would be a limited subset of the problems to be solved by a project trying to build AGI with a different and very safe architecture before the rest of the world, and wouldn’t address the question of the probability that such is needed in the counterfactual. But it would cause me to stop complaining, and would powerfully support the model that MIRI can be more productive than the rest of the AI field when currently-available objective indicators put it at a small portion of the quality-adjusted capacity.
If we want a predictor for success that’s a lot better than the vast majority of quite successful entrepreneurs and pathbreaking researchers, making numerous major basic science discoveries and putting them together in a way that saves the world, then we need some evidence to distinguish the team and explain why it will make greater scientific contributions than any other ever with high reliability in a limited time.
A lot of intermediate outcomes would multiply my credence in and thus valuation of the “dozen people race ahead of the rest of the world in AI” scenario, but just being as productive as von Neumann or Turing or Pearl or Einstein would not result in high probability of FAI success, so the evidence has to be substantial.
Sure, you try, sometimes you lose, sometimes you win. On anti-heroic epistemology (non-virtuous to attempt to discriminate within an outside view) there shouldn’t be any impossible successes by anyone you know personally after you met them. They should only happen to other people selected post-facto by the media, or to people who you met because of their previous success.
We disagree about how to use statistics in order to get really actually correct answers. Having such a low estimate of my rationality that you think that I know what correct statistics are, and am refusing to use them, is not good news from an Aumann perspective and fails the ideological Turing Test. In any case, surely if my predictions are correct you should update your belief about good frameworks (see the reasoning used in the Pascal’s Muggle post) - to do otherwise and go on insisting that your framework was nonetheless correct would be oblivious.
...should not have been disclosed to the general world, since proof well short of this should suffice for sufficient funding (Bayes nets were huge), though they might be disclosed to some particular Norvig type on a trusted oversight committee if there were some kind of reason for the risk. Major breakthroughs on the F side of FAI are not likely to be regarded as being as exciting as AGI-useful work like Bayes nets, though they may be equally mathematically impressive or mathematically difficult. Is there some kind of validation which you think MIRI should not be able to achieve on non-heroic premises, such that the results should be disclosed to the general world?
EDIT: Reading through the rest of the comment more carefully, I’m not sure we estimate the same order of magnitude of work for what it takes to build FAI under mildly good background settings of hidden variables. The reason why I don’t think the mainstream can build FAI isn’t that FAI is intrinsically huge a la the Cyc hypothesis. The mainstream is pretty good at building huge straightforward things. I just expect them to run afoul of one of the many instakill gotchas because they’re one or two orders of magnitude underneath the finite level of caring required.
EDIT 2: Also, is there a level short of 2 gigantic breakthroughs which causes you to question non-heroic epistemology? The condition is sufficient, but is it necessary? Do you start to doubt the framework after one giant breakthrough (leaving aside the translation question for now)? If not, what probability would you assign to that, on your framework? Standard Bayesian Judo applies—if you would, as I see it, play the role of the skeptic, then you must either be overly-credulous-for-the-role that we can do heroic things like one giant breakthrough, or else give up your skepticism at an earlier signal than the second. For you cannot say that something is strongly prohibited on your model and yet also refuse to update much if it happens, and this applies to every event which might lie along the way. (Evenhanded application: ’Tis why I updated on Quixey instead of saying “Ah, but blah”; Quixey getting this far just wasn’t supposed to happen on my previous background theory, and shouldn’t have happened even if Vassar had praised ten people to me instead of two.)
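The updating principle invoked in this exchange (you cannot call an event strongly prohibited by your model and then decline to update much when it happens) is just Bayes’ rule. A minimal sketch, with numbers invented purely for illustration and not meant as anyone’s actual estimates:

```python
# Bayes' rule: a model that assigns low probability to an event must
# lose credence when that event is observed. All numbers below are
# illustrative toy values, not real probability estimates.

def posterior(prior: float, p_event_given_model: float,
              p_event_given_alt: float) -> float:
    """P(model | event) in a two-hypothesis space, via Bayes' rule."""
    joint_model = prior * p_event_given_model
    joint_alt = (1 - prior) * p_event_given_alt
    return joint_model / (joint_model + joint_alt)

# Suppose a "non-heroic" model says a major breakthrough is very
# unlikely (2%), while a rival model says it is fairly likely (40%).
prior = 0.9  # strong initial credence in the non-heroic model
after_one = posterior(prior, 0.02, 0.40)
print(after_one)  # credence in the non-heroic model drops sharply
```

The point of the “Bayesian Judo” move is just this asymmetry: the more strongly a model prohibits an event in advance, the larger the hit it must take if the event occurs.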
I don’t understand why you say this. Given Carl’s IQ and social circle (didn’t he used to work for a hedge fund run by Peter Thiel?) why would it be very surprising that someone he personally knows achieves your current level of success after he meets them?
Carl referenced “Staring Into the Singularity” as an early indicator of your extraordinary verbal abilities (which explains much if not all of your subsequent successes). It suggests that’s how you initially attracted his attention. The same is certainly true for me. I distinctly recall saying to myself “I should definitely keep track of this guy” when I read that, back in the extropian days. Is that enough for you to count as “people who you met because of their previous success”?
In any case, almost everyone who meets you now would count you as such. What arguments can you give to them that “heroic epistemology” is normative (and hence they are justified in donating to MIRI)?
To state my overall position on the topic being discussed, I think according to “non-heroic epistemology”, after someone achieves an “impossible success”, you update towards them being able to achieve further successes of roughly the same difficulty and in related fields that use similar skills, but the posterior probabilities of them solving much more difficult problems or in fields that use very different skills remain low (higher relative to the prior, but still low in an absolute sense). Given my understanding of the distribution of cognitive abilities in humans, I don’t see why I would ever “give up” this epistemology, unless you achieved a level of success that made me suspect that you’re an alien avatar or something.
Yes, no matter how many impossible things you do, the next person you meet thinks that they only heard of you because of them, ergo selection bias. This is an interesting question purely on a philosophical level—it seems to me to have some of the flavor of quantum suicide experiments where you can’t communicate your evidence. In principle this shouldn’t happen without quantum suicide for logically omniscient entities who already know the exact fraction of people with various characteristics, i.e., agree on exact priors, but I think it might start happening again to people who are logically unsure about which framework they should use.
To avoid talking past one another: I agree that one can and should update on evidence beyond the most solid empirical reference classes in predicting success. If you mean to say that a majority of the variation on a log scale in success (e.g. in wealth or scientific productivity) can be accounted for with properties of individuals and their circumstances, beyond dumb luck, then we can agree on that. Some of those characteristics are more easily observed, while others are harder to discern or almost unmeasurable from a distance, so that track records may be our best way to discern them.
That is to say, repeated successes should not be explained by luck, but by updating estimates of hard-to-observe characteristics and world model.
The distribution of successes and failures you have demonstrated is not “impossible” or driving a massive likelihood ratio given knowledge about your cognitive and verbal ability, behavioral evidence of initiative and personality, developed writing skill (discernible through inspection and data about its reception), and philosophical inclinations. Using measurable features, and some earlier behavioral or track record data one can generate reference classes with quite high levels of lifetime success, e.g. by slicing and dicing cohorts like this one. Updating on further successes delivers further improvements.
But updating on hidden characteristics does not suffer exponential penalties like chance explanations, and there is a lot of distance to cover in hidden characteristics before a 10% probability of MIRI-derived FAI (or some other causal channel) averting existential catastrophe that would have occurred absent MIRI looks reasonable.
Now large repeated updates about hidden characteristics still indicate serious model problems and should lead us to be very skeptical of those models. However, I don’t see such very large surprising updates thus far.
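The contrast drawn here, that chance explanations of repeated success suffer an exponential penalty while hidden-characteristic explanations do not, can be sketched numerically. The probabilities below are toy values chosen only to show the shape of the argument:

```python
# Toy contrast between "luck" and "hidden ability" explanations of a
# run of successes. All numbers are illustrative, not real estimates.

n_successes = 5

# Under the luck hypothesis each success is an independent 10% shot,
# so the likelihood of the whole run shrinks exponentially with n.
p_luck_each = 0.10
likelihood_luck = p_luck_each ** n_successes  # 1e-5

# Under a hidden-ability hypothesis the same run is unsurprising,
# so its likelihood stays high regardless of n.
p_able_each = 0.90
likelihood_able = p_able_each ** n_successes  # ~0.59

# Even a small prior on high ability dominates after a few successes.
prior_able = 0.01
posterior_able = (prior_able * likelihood_able) / (
    prior_able * likelihood_able + (1 - prior_able) * likelihood_luck)
print(round(posterior_able, 3))  # prints 0.998
```

This is why repeated successes shift the explanation toward hard-to-observe characteristics rather than luck, while still leaving open how far those characteristics generalize to much harder problems.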
If difficulty is homogenous (at least as far as one can discern in advance), then we can use these data straightforwardly, but a lot of probability will be peeled off relative to “Tomer must win.” And generalizing to much higher difficulty is still dubious for the reasons discussed above.
This is not what I meant. I didn’t claim you would explicitly endorse a contradiction formally. But nonetheless, the impression I got was of questions about probability met with troubling responses like talking about the state of mind you need for work and wanting to not think in terms of probabilities of success for your own work. That seems a bad signal because of the absence of good responses, and the suggestion that the estimates may not be the result of very much thought, or may be unduly affected by their emotional valence, without ever saying “p and not p.”
As I said elsewhere a 10% probability of counterfactually saving the world is far above the threshold for action. One won’t get to high confidence in that low prior claim without extraordinary evidence, but the value of pursuing it increases continuously with intermediate levels of evidence. Some examples would be successful AI researchers coming to workshops and regularly saying that the quality and productivity of the research group and process was orders of magnitude more productive, the results very solid, etc. This is one of the reasons I like the workshop path, because it exposes the thesis to empirical feedback.
Although as we have discussed with AI folk, there are also smart AI people who would like to find nice clean powerful algorithms with huge practical utility without significant additional work.
Yes, we do still have disagreements about many of the factual questions that feed into a probability estimate, and if I adopted your view on all of those except MIRI productivity there would be much less of a gap. There are many distinct issues going into the estimation of a probability of your success, from AGI difficulty, to FAI difficulty, to the competence of regular AI people and governance institutions, the productivity of a small MIRI team, the productivity of the rest of the world, signs of AI being close, reactions to those signs, and others.
There are a number of connections between these variables, but even accounting for that your opinions are systematically firmly in the direction of greater personal impact relative to the analyses of others, and the clustering seems tighter than is typical (others seem to vary more, sometimes evaluating different subissues as pointing in different directions). This shows up in attempts to work through the issues for estimation, as at that meeting with Paul et al.
One can apply a bias theory to myself, Paul Christiano, Nick Bostrom, and the FHI surveys of AI experts, saying that we are biased towards normalcy, respectability and conservatism. But I would still question the coincidence of so many substantially-independent variables landing in the same direction, and uncertainty over the pieces disproportionately hurts the hypothesis that MIRI has, say, a 10% probability of averting a counterfactual existential catastrophe.
And it is possible that you have become a superb predictor of such variables in the last 10 years (setting aside earlier poor predictions), and I could and would update on good technological and geopolitical prediction in DAGGRE or the like.
Thanks for talking this out, and let me reiterate that in my expectation your and MIRI’s existence (relative to the counterfactual in which it never existed and you become a science fiction writer) has been a good thing and reduced my expectation for existential risk.
Of course I expect you to say that, since to say otherwise given your previous statements is equivalent to being openly incoherent and I do not regard you so lowly. But I don’t yet believe that you would actually have accepted or predicted those successes ante facto, vs. claiming ante facto that those successes were unlikely and that trying was overconfident. Which is why I repeat my question: What is the least impossible thing I could do next, where anything up to that is permitted by your model so it’s equivalent to affirming that you think I might be able to do it, and anything beyond that was prohibited by your model so it’s time to notice your confusion? I mean, if you think I can make one major AI breakthrough but not two, that’s already a lot of confidence in me… is that really what your outside view would say about me?
Please distinguish between the disputed reality and your personal memory, unless you’re defining the above so broadly (and uncharitably!) that my ‘wasted motion’ FB post counts as an instance.
Without significant work? I don’t think I can do that. Why would you think I thought I could do that?
If enough people agreed on that and DAGGRE could be done with relatively low effort on my part, I would do so, though I think I’d want at least some people committing in writing to large donations given success because it would be a large time commitment and I’m prior-skeptical that people know or are honest about their own reasons for disagreement; and I would expect the next batch of pessimists to write off the DAGGRE results (i.e., claim it already compatible with my known properties) so there’d be no long-term benefit. Still, 8 out of 8 on 80K’s “Look how bad your common sense is!” test, plus I recall getting 9 out of 10 questions correct the last time I was asked for 90% probabilities on a CFAR calibration test, so it’s possible I’ve already outrun the reference class of people who are bad at this.
Though if it’s mostly geopolitical questions where the correct output is “I know I don’t know much about this” modulo some surface scans of which other experts are talking sense, I wouldn’t necessarily expect to outperform the better groups that have already read up on cognitive rationality and done a few calibration exercises.
So, if von Neumann came out with similar FAI claims, but couldn’t present compelling arguments to his peers (if not to exact agreement, perhaps within an order of magnitude), I wouldn’t believe him. So showing that, e.g., your math problem-solving ability is greater than my point estimate wouldn’t be very relevant. Shocking achievements would lead me to upgrade my estimate of your potential contribution going forward (although most of the work in an FAI team would be done by others in any case), resolving uncertainty about ability, but that would not be enough as such; what would matter is the effect on my estimates of your predictive model.
I would make predictions on evaluations of MIRI workshop research outputs by a properly constructed jury of AI people. If the MIRI workshops were many times more productive than comparably or better credentialed AI people according to independent expert judges (blinded to the extent possible) I would say my model was badly wrong, but I don’t think you would predict a win on that.
To avoid “too much work to do/prep for” and “disagreement about far future consequences of mundane predicted intermediates” you could give me a list of things that you or MIRI plan to attempt over the next 1, 3, and 5 years and I could pick one (with some effort to make it more precise).
Yes, I have seen you writing about the 80k quiz on LW and 80k and elsewhere, it’s good (although as you mention, test-taking skills went far on it). I predict that if we take an unbiased sample of people with similarly high cognitive test scores, extensive exposure to machine learning, and good career success (drawn from academia and tech/quant finance, say), and look at the top scorers on the 80k quiz and similar, their estimates for MIRI success will be quite a bit closer to mine than to yours. Do you disagree? Otherwise, I would want to see drastic outperformance relative to such a group on a higher-ceiling version (although this would be confounded by advance notice and the opportunity to study/prepare).
DAGGRE is going into the area of technology, not just geopolitics. Unfortunately it is mostly short term stuff, not long-term basic science, or subtle properties of future tech, so the generalization is imperfect. Also, would you predict exceptional success in predicting short-medium term technological developments?
The question is not what convinces you that I can do FAI within the framework of your antiheroic epistemology. The question is what first and earliest shows that your antiheroic epistemology is yielding bad predictions. Is this a terrible question to ask for some reason? You’ve substituted an alternate question a couple of times now.
From my perspective, you just asked how bad other people are at predicting such developments. The answer is that I don’t know. Certainly many bloggers are terrible at it. I don’t suppose you can give a quick example of a DAGGRE question?
Which I said in the very same paragraph.
I already gave the example of independent judges evaluating MIRI workshop output, among others. If we make the details precise, I can set the threshold on the measure. Or we can take any number of other metrics with approximately continuous outputs where I can draw a line. But it takes work to define a metric precise enough to be solid, and I don’t want to waste my time generating more and more additional examples or making them ultra-precise without feedback on what you will actually stake a claim on.
I can’t determine what’s next without knowledge of what you’ll do or try.
http://blog.daggre.org/tag/prediction-market/
To clear up the ambiguity, does this mean you agree that I can do anything short of what von Neumann did, or that you don’t think it’s possible to get as far as independent judges favorably evaluating MIRI output, or is there some other standard you have in mind? I’m trying to get something clearly falsifiable, but right now I can’t figure out the intended event due to sheer linguistic ambiguity.
I also think that evaluation by academics is a terrible test for things that don’t come with blatant overwhelming unmistakable undeniable-even-to-humans evidence—e.g. this standard would fail MWI, molecular nanotechnology, cryonics, and would have recently failed ‘high-carb diets are not necessarily good for you’. I don’t particularly expect this standard to be met before the end of the world, and it wouldn’t be necessary to meet it either.
As I said in my other comment, I would be quite surprised if your individual mathematical and AI contributions reach the levels of the best in their fields, as you are stronger verbally than mathematically, and discuss in more detail what I would find surprising and not there.
I recently talked to Drexler about nanotechnology in Oxford. Nanotechnology:
- Is way behind Drexler’s schedule, and even accounting for there being far less funding and focused research than he expected, the timeline skeptics get significant vindication
- Was said by the NAS panel to be possible, with no decisive physical or chemical arguments against (and discussion of some uncertainties which would not much change the overall picture, in any case), and arguments against tend to be or turn into timeline skepticism and skepticism about the utility of research
- Has not been the subject of a more detailed report or expert judgment test than the National Academy of Sciences one (which said it’s possible) because Drexler was not on the ball and never tried. He is currently working with the FHI to get a panel of independent eminent physicists and chemists to work it over, and expects them to be convinced.
Also, while it seems to me that Michael should have said this about many people, I have not actually heard him say this about many people, to me, except Alyssa Vance.
This seems to be usually accounted for by value of information, you should do some unproven things primarily in order to figure out if something like that is possible (or why not, in more detail), before you know it to be possible. If something does turn out to be possible, you just keep on doing it, so that the primary motivation changes without the activity itself changing.
(One characteristic of doing something for its value of information as opposed to its expected utility seems to be the expectation of having to drop it when it’s not working out. If something has high expected utility a priori, continuing to do it despite it not working won’t be as damaging (a priori), even though there is no reason to act this way.)
Not sure I understood this—are you saying that the expected damage caused by continuing to do it despite it not working is less just because the probability that it won’t work is less?