2017 AI Safety Literature Review and Charity Comparison
Summary: I review a significant amount of 2017 research related to AI Safety and offer some comments about where I am going to donate this year. Cross-posted from here upon request.
Contents
Introduction
The Machine Intelligence Research Institute (MIRI)
The Future of Humanity Institute (FHI)
Global Catastrophic Risks Institute (GCRI)
The Center for the Study of Existential Risk (CSER)
AI Impacts
Center for Human-Compatible AI (CFHCA)
Other related organisations
Related Work by other parties
Other major developments this year
Conclusion
Disclosures
Bibliography
Introduction
Like last year, I’ve attempted to review the research that has been produced by various organisations working on AI safety, to help potential donors gain a better understanding of the landscape. This is a similar role to that which GiveWell performs for global health charities, and somewhat similar to a securities analyst with regards to possible investments. It appears that once again no-one else has attempted to do this, to my knowledge, so I’ve once again undertaken the task. While I’ve been able to work significantly more efficiently on this than last year, I have unfortunately been very busy with my day job, which has dramatically reduced the amount of time I’ve been able to dedicate.
My aim is basically to judge the output of each organisation in 2017 and compare it to their budget. This should give a sense for the organisations’ average cost-effectiveness. Then we can consider factors that might increase or decrease the marginal cost-effectiveness going forward. We focus on organisations, not researchers.
Judging organisations on their historical output is naturally going to favour more mature organisations. A new startup, whose value all lies in the future, will be disadvantaged. However, I think that this is correct. The newer the organisation, the more funding should come from people with close knowledge. As organisations mature, and have more easily verifiable signals of quality, their funding sources can transition to larger pools of less expert money. This is how it works for startups turning into public companies and I think the same model applies here.
This judgement involves analysing a large number of papers relating to Xrisk that were produced during 2017. Hopefully the year-to-year volatility of output is sufficiently low that this is a reasonable metric. I also attempted to include papers from December 2016, to take into account the fact that I’m missing the last month’s worth of output from 2017, but I can’t be sure I did this successfully.
This article focuses on AI risk work. If you think other causes are important too, your priorities might differ. This particularly affects GCRI and CSER, who both do a lot of work on other issues.
We focus virtually exclusively on papers, rather than outreach or other activities. This is partly because papers are much easier to measure (while there has been a large increase in interest in AI safety over the last year, it’s hard to work out who to credit for this), and partly because I think progress has to come from persuading AI researchers, which I think comes through technical outreach and publishing good work, not popular/political work.
My impression is that policy on technical subjects (as opposed to issues that attract strong views from the general population) is generally made by the government and civil servants in consultation with, and being lobbied by, outside experts and interests. Without expert (e.g. top ML researchers at Google, CMU & Baidu) consensus, no useful policy will be enacted. Pushing directly for policy seems if anything likely to hinder expert consensus. Attempts to directly influence the government to regulate AI research seem very adversarial, and risk being pattern-matched to ignorant opposition to GM foods or nuclear power. We don’t want the ‘us-vs-them’ situation that has occurred with climate change to happen here. AI researchers who are dismissive of safety law, regarding it as an imposition and encumbrance to be endured or evaded, will probably be harder to convince of the need to voluntarily be extra-safe—especially as the regulations may actually be totally ineffective. The only case I can think of where scientists are relatively happy about punitive safety regulations, nuclear power, is one where many of those initially concerned were scientists themselves. Given this, I actually think policy outreach to the general population is probably negative in expectation.
The good news on outreach this year is we haven’t had any truly terrible publicity that I can remember, though I urge organisations to remember that the personal activities of their employees, especially senior ones, reflect on the organisations themselves, so they should take care not to act/speak in ways that are offensive to those outside their bubble, and to avoid hiring crazy people.
Part of my motivation for writing this is to help more people become informed about the AI safety landscape so they can contribute better with both direct work and donations. With regard to donations, at present Nick Beckstead, in his role as both Fund Manager of the Long-Term Future Fund and officer with the Open Philanthropy Project, is probably the most important financier of this work. He is also probably significantly more informed on the subject than me, but I think it’s important that the vitality of the field doesn’t depend on a single person, even if that person is awesome.
The Machine Intelligence Research Institute (MIRI)
MIRI is the largest pure-play AI existential risk group. Based in Berkeley, it focuses on mathematics research that is unlikely to be produced by academics, trying to build the foundations for the development of safe AIs.
Their agent foundations work is basically trying to develop the correct way of thinking about agents and learning/decision making by spotting areas where our current models fail and seeking to improve them. Much of their work this year seems to involve trying to address self-reference in some way—how can we design, or even just model, agents that are smart enough to think about themselves? This work is technical, abstract, and requires a considerable belief in their long-term vision, as it is rarely locally applicable, so it is hard to independently judge its quality.
In 2016 they announced they were somewhat pivoting towards work that tied in more closely to the ML literature, a move I thought was a mistake. However, looking at their published research or their 2017 review page, in practice this seems to have been less of a change of direction than I had thought, as most of their work appears to remain highly differentiated and irreplaceable agent foundations type work—it seems unlikely that anyone not motivated by AI safety would produce this work. Even within those concerned about friendly AI, few not at MIRI would produce this work.
Critch’s Toward Negotiable Reinforcement Learning: Shifting Priorities in Pareto Optimal Sequential Decision-Making (elsewhere titled ‘Servant of Many Masters’) is a neat paper. Basically it identifies the Pareto-efficient outcome if you have two agents with different beliefs who want to agree on a utility function for an AI, in a generalisation of Harsanyi’s Cardinal welfare, individualistic ethics, and interpersonal comparisons of utility. The key assumption is that both want to use their current beliefs when they calculate the expected value of the deal to themselves, and the (surprising to me) conclusion is that over time the AI will have to weigh more and more heavily the values of the negotiator whose beliefs were more accurate. While I don’t think this is necessarily Critch’s interpretation, I take this as something of a reductio of the assumption. Surely if I were negotiating over a utility function, I would want the agent to learn about the world and use that knowledge to better promote my values … not to learn about the world, decide I was a moron with a bad world model, and ignore me thereafter? If I think the AI is/will be smarter than me, I should be happy for it to do things I’m unaware will benefit me, and avoid doing things I falsely believe will help me. On the other hand, if the parties are well-informed nation states rather than individuals, the prospect of ‘getting one over’ the other might be helpful for avoiding arms races?
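To make the flavour of that conclusion concrete, here is a toy sketch of my own (not from the paper, and with made-up numbers): two principals start with equal weight, and the agent effectively re-weights them after each observation by how well their stated beliefs predicted it, so the better forecaster’s utility function comes to dominate.

```python
import random

random.seed(0)

# Two principals give the AI different beliefs about a coin's bias.
# Principal A thinks P(heads) = 0.8; principal B thinks P(heads) = 0.5.
# The true bias is 0.8, so A is the better forecaster.
beliefs = {"A": 0.8, "B": 0.5}
true_p = 0.8

# Start at the equal-weight Pareto point.
weights = {"A": 1.0, "B": 1.0}

for _ in range(50):
    heads = random.random() < true_p
    for name, p in beliefs.items():
        # Re-weight each principal by the likelihood their stated
        # beliefs assigned to the observation just seen.
        weights[name] *= p if heads else (1 - p)
    total = sum(weights.values())
    weights = {k: v / total for k, v in weights.items()}  # normalise

print(weights)  # A's weight should end up near 1, B's near 0
```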
Kosoy’s Optimal polynomial-time estimators addresses a similar topic to the Logical Induction work—assigning ‘probabilities’ to logical/mathematical/deductive statements under computational limitations—but with a quite different approach to solving it. The work seems impressive but I didn’t really understand it. Inside his framework he can prove that various results from probability theory also apply to logical statements, which seems like what we’d want. (Note that technically this paper came out in December 2016, and so is included in this year rather than last year’s.)
Carey’s article, Incorrigibility in the CIRL Framework, is a response to Milli et al.’s Should Robots be Obedient and Hadfield-Menell’s The Off-Switch Game. Carey basically argues it’s not necessarily the case that CIRL agents will be ‘automatically’ corrigible if the AI’s beliefs about value are very wrong, for example due to incorrect parameterisation or assigning a zero prior to something that turns out to be the case. The discussion section has some interesting arguments, for example pointing out that an algorithm designed to shut itself off unless it had a track record of perfectly predicting what humans would want might still fail if its ontology was insufficient, so it couldn’t even tell that it was disagreeing with the humans during training. I agree that value complexity and fragility might mean it’s very likely that any AI’s value model will be partially (and hence, for an AGI, catastrophically) mis-parameterised. However, I’m not sure how much the examples that take up much of the paper add to this argument. Milli’s argument only holds when the AI can learn the parameters, and given that this paper assumes the humans choose the wrong action by accident less than 1% of the time, it seems that the AI should treat a shutdown command as a very large amount of evidence… instead the AI seems to simply ignore it?
Some of MIRI’s publications this year seem to mainly be better explanations of previous work. For example, Garrabrant et al’s A Formal Approach to the Problem of Logical Non-Omniscience seems to be basically an easier to understand version of last year’s Logical Induction. Likewise Yudkowsky and Soares’s Functional Decision Theory: A New Theory of Instrumental Rationality seems to be basically a new exposition of classic MIRI/LW decision theory work—see for example Soares et al’s Toward Idealized Decision Theory. Similarly, I didn’t feel like there was much new in Soares et al’s Cheating Death in Damascus. Making things easier to understand is useful—and last year’s Logical Induction paper was a little dense—but it’s clearly not as impressive as inventing new things.
When I asked for top achievements for 2017, MIRI pointed me towards a lot of work they’d posted on agentfoundations.org as being one of their major achievements for the year, especially this, this and this, which pose and then solve a problem about how to find game-theoretic agents that can stably model each other, formulating it as a topological fixed-point problem. There is also a lot of other work on agentfoundations that seems interesting, but I’m not entirely sure how to think about giving credit for these. These seem more like ‘work in progress’ than finished work—for most organisations I am only giving credit for the latter. MIRI could with some justification respond that the standard academic process is very inefficient, and part of their reason for existence is to do things that universities cannot. However, even if you de-prioritise peer review, I still think it is important to write things up into papers. Otherwise it is extremely hard for outsiders to evaluate—bad both for potential funders and for people wishing to enter the field. Unfortunately it is possible that, if they continue on this route, MIRI might produce a lot of valuable work that is increasingly illegible from the outside. So overall I consider these as evidence that MIRI is continuing to actually do research, but will wait until they’re ArXived to actually review them. If you disagree with this approach, MIRI is going to look much more productive, and their research possibly accelerating, in 2017 vs 2016. If you instead only look at published papers, 2017 appears to be something of a ‘down year’ after 2016.
Last year I was not keen to see that Eliezer was spending a lot of time producing content on Arbital as part of his job at MIRI, as there was a clear conflict of interest—he was a significant shareholder in Arbital, and additionally I expected Arbital to fail. Now that Arbital does seem to have indeed failed, I’m pleased he seems to be spending less time on it, but confused why he is spending any time at all on it—though some of this seems to be cross-posted from elsewhere.
Eliezer’s book Inadequate Equilibria, however, does seem to be high quality—basically another sequence—though only relevant inasmuch as AI safety might be one of many applications of the subject of the book. I also encourage readers to read this excellent article by Greg Lewis (FHI) on the other side.
I also enjoyed There’s No Fire Alarm for Artificial General Intelligence, which, although accessible to the layman, I think provides a convincing case that, even when AGI is imminent, there would (or at least might) be no clear signal that this was the case, and his Socratic security dialogues on the mindset required to develop a secure AI.
I was sorry to hear Jessica Taylor left MIRI, as I thought she did good work.
MIRI spent roughly $1.9m in 2017, and aim to rapidly increase this to $3.5m in 2019, to fund new researchers and their new engineering team.
The Open Philanthropy Project awarded MIRI a $3.75m grant (over 3 years) earlier this year, largely because one reviewer was impressed with their work on Logical Induction. You may recall this was a significant part of why I endorsed MIRI last year. However, as this review is focused on work in the last twelve months, they don’t get credit for the same work two years running! OPP have said they plan to fund roughly half of MIRI’s budget. On the positive side, one might argue this was essentially a 1:1 match on donations to MIRI—but there are clearly game-theoretic problems here. Additionally, if you had faith in OpenPhil’s process, you might consider this a positive signal of MIRI quality. On the other hand, if you think MIRI’s marginal cost-effectiveness is diminishing over the multi-million dollar range, this might reduce your estimate of the cost-effectiveness of the marginal dollar.
There is also $1m of somewhat plausibly counterfactually valid donation matching available for MIRI (but not other AI Xrisk organisations).
Finally, I will note that MIRI have been very generous with their time in helping me understand what they are doing.
The Future of Humanity Institute (FHI)
Oxford’s FHI requested not to be included in this analysis, so I won’t be making any comment on whether or not they are a good place to fund. Had they not declined (and depending on their funding situation) they would have been a strong candidate. This was disappointing to me, because they seem to have produced an impressive list of publications this year, including a lot of collaborations. I’ll briefly note a few pieces of research they published this year, but regret not being able to give them better coverage.
Saunders et al. published Trial without Error: Towards Safe Reinforcement Learning via Human Intervention, a nice paper where they attempt to make a Reinforcement Learner that can ‘safely’ learn by training a catastrophe-recognition algorithm to oversee the training. It’s a cute idea, and a nice use of the OpenAI Atari suite, though I was most impressed with the fact that they concluded that their approach would not scale (i.e. would not work). It’s not often researchers publish negative results!
Honourable mention also goes to the very cool (but aren’t all his papers?) Sandberg et al.’s That is not dead which can eternal lie: the aestivation hypothesis for resolving Fermi’s paradox, which is relevant inasmuch as it suggests that the Fermi Paradox is not actually evidence against AI as an existential risk.
FHI’s Brundage Bot apparently reads every ML paper ever written.
Global Catastrophic Risks Institute (GCRI)
The Global Catastrophic Risks Institute is run by Seth Baum and Tony Barrett. They have produced work on a variety of existential risks, including non-AI risks. Some of this work seems quite valuable, especially Denkenberger’s Feeding Everyone No Matter What on ensuring food supply in the event of disaster, and is probably of interest to the sort of person who would read this document. However, it is off-topic for us here. Within AI they do a lot of work on the strategic landscape, and are very prolific.
Baum’s Survey of Artificial General Intelligence Projects for Ethics, Risk, and Policy attempts to analyse all existing AGI research projects. This is a huge project and I laud him for it. I don’t know how much here is news to people who are very plugged in, but to me at least it was very informative. The one criticism I would have is it could do more to try to differentiate on capacity/credibility—e.g. my impression is Deepmind is dramatically more capable than many of the smaller organisations listed—but that is clearly a very difficult ask. It’s hard for me to judge the accuracy, but I didn’t notice any mistakes (beyond being surprised that AIXI has an ‘unspecified’ for safety engagement, given the amount of AI safety papers coming out of ANU.)
Baum’s Social Choice Ethics in Artificial Intelligence argues that value-learning type approaches to AI ethics (like CEV) contain many degrees of freedom for the programmers to finesse it to pick their values, making them no better than the programmers simply choosing an ethical system directly. The programmers can choose whose values are used for learning, how they are measured, and how they are aggregated. Overall I’m not fully convinced—for example, pace the argument on page 3, a Law of Large Numbers argument could support averaging many views to get at the true ethics even if we had no way of independently verifying the true ethics. And there is some irony that, for all the paper’s concern with bias risk, the left-wing views of the author come through strongly. But despite these quibbles I liked the paper, especially for the discussion of who has standing—something that seems like it will need a philosophical solution, rather than an ML one.
Barrett’s Value of Global Catastrophic Risk (GCR) Information: Cost-Effectiveness-Based Approach for GCR Reduction covers a lot of familiar ground, and then attempts to do some Monte Carlo cost-benefit analysis on a small number of interventions to help address nuclear war and comet impact. After putting a lot of thought into setting up the machinery, it would have been good to see analysis of a wider range of risks!
Baum & Barrett published Global Catastrophes: The Most Extreme Risks, which seems to be essentially a reasonably well argued general introduction to the subject of existential risks. Hopefully people who bought the book for other reasons will read it and become convinced.
Baum & Barrett’s Towards an Integrated Assessment of Global Catastrophic Risk is a similar introductory piece on catastrophic risks, but the venue—a colloquium on catastrophic risks—seems less useful, as people reading it are more likely to already be concerned about the subject, and I don’t think it spends enough time on AI risk per se to convince those who were already worried about Xrisk but not AI Xrisk.
Last year I was (and still am) impressed by their paper On the Promotion of Safe and Socially Beneficial Artificial Intelligence, which made insightful, convincing and actionable criticisms of ‘AI arms race’ language. I was less convinced by this year’s Reconciliation Between Factions Focused on Near-Term and Long-Term Artificial Intelligence, which argues for a re-alignment away from near-term AI worries vs long-term AI worries towards AI worriers vs non-worriers. However, I’m not sure why anyone would agree to this—long-term worriers don’t currently spend much time arguing against short-term worries (even if you thought that AI discrimination arguments were Orwellian, why bother arguing about it?), and convincing short-term worriers to stop criticising long-term worries seems approximately as hard as simply convincing them to become long-term worriers.
GCRI spent approximately $117k in 2017, which is shockingly low considering their productivity. This was lower than 2016; apparently their grants from the US Dept. of Homeland Security came to an end.
The Center for the Study of Existential Risk (CSER)
CSER is an existential risk focused group located in Cambridge. Like GCRI they do work on a variety of issues, notably including Rees’ work on infrastructure resilience.
Last year I criticised them for not having produced any online research over several years; they now have a separate page that does list some but maybe not all of their research.
Liu, a CSER researcher, wrote The Sure-Thing principle and P2 and was second author on Gaifman & Liu’s A simpler and more realistic subjective decision theory, both on the mathematical foundations of Bayesian decision theory, which is a valuable topic for AI safety in general. Strangely, neither paper mentions CSER as a funder or affiliation.
Liu and Price’s Heart of DARCness argues that agents do not have credences for what they will do while deciding whether to do it—their confidence is temporarily undefined. I was not convinced—even while someone is deciding whether she’s 75% confident or 50% confident, presumably there are some odds that determine which side in a bet she’d take if forced to choose? I’m also not sure of the direct link to AI safety.
They’ve also convened and attended workshops on AI and decision theory, notably the AI & Society Symposium in Japan, but in general I am wary of giving organisations credit for these, as they are too hard for the outside observer to judge, and ideally workshops lead to papers—in which case we can judge those.
CSER also did a significant amount of outreach, including presenting to the House of Lords, and apparently have expertise in Chinese outreach (multiple native Mandarin speakers), which could be important, given China’s strength in AI research but cultural separation from the West.
They are undertaking a novel publicity effort that I won’t name as I’m not sure it’s public yet. In general I think most paths to success involve consensus-building among mainstream ML researchers, and ‘popular’ efforts risk harming our credibility, so I am not optimistic here.
Their annual budget is around $750,000, with I estimate a bit less than half going on AI risk. Apparently they need to raise funds to continue existing once their current grants run out in 2019.
AI Impacts
AI Impacts is a small group that does high-level strategy work, especially on AI timelines, somewhat associated with MIRI.
They seem to have produced significantly more this year than last year. The main achievement is When will AI exceed Human Performance? Evidence from AI Experts, which gathered the opinions of hundreds of AI researchers on AI timelines questions. There were some pretty relevant takeaways, like that most researchers find the AI Catastrophic Risk argument somewhat plausible, but doubt there is anything that can usefully be done in the short term, or that Asian researchers think human-level AI is significantly closer than Americans do. I think the value-prop here is twofold: firstly, providing a source of timeline estimates for when we make decisions that hinge on how long we have, and secondly, proving that concern about AI risk is a respectable, mainstream position. It was apparently one of the most discussed papers of 2017.
On a similar note, they also have data on improvements in a number of AI-related benchmarks, like computing costs or algorithmic progress.
John Salvatier (member of AI Impacts at the time) was also second author on Agent-Agnostic Human-in-the-Loop Reinforcement Learning, along with Evans (FHI, 4th author), which attempts to design an interface for reinforcement learning that abstracts away from the agent, so you could easily change the underlying agent.
AI Impacts’ budget is tiny compared to most of the other organisations listed here; around $60k at present. Incremental funds would apparently be spent on hiring more part-time researchers.
Center for Human-Compatible AI (CFHCA)
The Center for Human-Compatible AI, founded by Stuart Russell in Berkeley, launched in August 2016. As they are not looking for more funding at the moment I will only briefly survey some of their work on cooperative inverse reinforcement learning.
Hadfield-Menell et al’s The Off-Switch Game is a nice paper that produces and formalises the (at least now I’ve read it) very intuitive result that a value-learning AI might be corrigible (at least in some instances) because it takes the fact that a human pressed the off-switch as evidence that this is the best thing to do.
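As a toy illustration of that intuition (my own sketch, not the paper’s actual model, which among other things also considers noisy or irrational humans): if the robot is uncertain about the utility U of its proposed action, and believes a rational human will only wave the action through when U is in fact positive, then deferring to the human is worth E[max(U, 0)], which can never be worse than acting unilaterally or switching itself off.

```python
import random

random.seed(1)

# The robot's belief about the utility U of its proposed action:
# uniform on [-1, 2], so positive in expectation but possibly harmful.
samples = [random.uniform(-1, 2) for _ in range(100_000)]

act_now = sum(samples) / len(samples)  # E[U]
switch_off = 0.0                       # do nothing
# A rational human only lets the action proceed when U > 0,
# so deferring is worth E[max(U, 0)].
defer = sum(max(u, 0.0) for u in samples) / len(samples)

print(f"act now: {act_now:.3f}, switch off: {switch_off:.3f}, defer: {defer:.3f}")
# Deferring comes out on top, so the robot prefers to keep the
# off-switch available to the human rather than disable it.
```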
Milli et al’s Should Robots be Obedient is in the same vein as Hadfield-Menell et al’s Cooperative Inverse Reinforcement Learning (last year) on learning values from humans, specifically touching on whether such agents would be willing to obey a command to ‘turn off’, as per Soares’s paper on Corrigibility. She does some interesting analysis about the trade-off between obedience and results in cases where humans are fallible.
In both cases I thought the papers were thoughtful and had good analysis. However, I don’t think either is convincing in showing that corrigibility comes ‘naturally’ - at least not the strength of corrigibility we need.
I encourage them to keep their website more up-to-date.
Overall I think their research is good and their team promising. However, apparently they have enough funding for now, so I won’t be donating this year. If this changed and they requested incremental capital I could certainly imagine funding them in future years.
Other related organisations
The Center for Applied Rationality (CFAR) works on trying to improve human rationality, especially with the aim of helping with AI Xrisk efforts.
The Future of Life Institute (FLI) ran a huge grant-making program to try to seed the field of AI safety research. There definitely seem to be a lot more academics working on the problem now, but it’s hard to tell how much to attribute to FLI.
Eighty Thousand Hours (80K) provide career advice, with AI safety being one of their key cause areas.
Related Work by other parties
Deep Reinforcement Learning from Human Preferences was possibly my favourite paper of the year, which perhaps shouldn’t come as a surprise, given that two of the authors (Christiano and Amodei from OpenAI) were authors on last year’s Concrete Problems in AI Safety. It applies ideas on bootstrapping that Christiano has been discussing for a while—getting humans to train an AI which then trains another AI etc. The model performs significantly better than I would have expected, and as ever I’m pleased to see OpenAI–DeepMind collaboration.
Christiano continues to produce very interesting content on his blog, like this on Corrigibility. When I first read his articles about how to bootstrap safety through iterative training procedures, my reaction was that, while this seemed an interesting idea, it didn’t seem to have much in common with mainstream ML. However, there do seem to be a bunch of practical papers about imitation learning now. I’m not sure if this was always the case, and I was just ignorant, or if they have become more prominent in the last year. Either way, I have updated towards considering this approach to be a promising one for integrating safety into mainstream ML work. He has also written a nice blog post explaining how AlphaZero works, and arguing that this supports his enhancement ideas.
It was also nice to see ~95 papers that were addressing Amodei et al’s call in last year’s Concrete Problems.
Menda et al’s DropoutDAgger paper on safe exploration seems to fit in this category. Basically they come up with a form of imitation learning where the AI being trained can explore a bit, but isn’t allowed to stray too far from the expert policy—though I’m not sure why they always have the learner explore in the direction it thinks is best, rather than assigning some weight to its uncertainty of outcome, explore-exploit-style. I’m not sure how much credit Amodei et al can get for inspiring this though, as it seems to be (to a significant degree) an extension of Zhang and Cho’s Query-Efficient Imitation Learning for End-to-End Autonomous Driving.
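For concreteness, here is a minimal sketch of the general ‘stay close to the expert’ pattern described above. Note this is my own illustration: the actual paper, as its name suggests, gates on a dropout-based uncertainty estimate for the novice policy rather than the simple action-distance threshold used here.

```python
import random

def gated_action(novice_action, expert_action, threshold=0.2):
    """Let the novice act only while it stays close to the expert;
    otherwise fall back to the expert action (an illustrative stand-in
    for DropoutDAgger's uncertainty-based gate)."""
    if abs(novice_action - expert_action) <= threshold:
        return novice_action   # close enough: let the learner explore
    return expert_action       # too far from the expert: override

# Hypothetical one-dimensional control problem.
def expert(state):
    return 0.5 * state

def novice(state):
    return 0.5 * state + random.uniform(-0.5, 0.5)  # imperfect imitation

state = 1.0
for _ in range(5):
    action = gated_action(novice(state), expert(state))
    state -= action            # toy dynamics
    print(round(state, 3))
```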
However, I don’t want to give too much credit for work that improves ‘local’ safety that doesn’t also address the big problems in AI safety, because this work probably accelerates unsafe human-level AI. There are many papers in this category, but for obvious reasons I won’t call them out.
Gan’s Self-Regulating Artificial General Intelligence contains some nice economic formalism around AIs seizing power from humans, and raises the interesting argument that if you need specialist AIs to achieve things, the first human-level AIs might not exhibit takeoff behaviour because they would be unable to sufficiently trust the power-seizing agents they would need to create. I’m sceptical that this assumption about the need for specialised AIs holds—surely even if you need to make separate AI agents for different tasks, rather than integrating them, it would suffice to give them specialised capabilities but the same goals. Regardless, the paper does suggest the interesting possibility that humanity might make an AI which is intelligent enough to realise it cannot solve the alignment problem to safely self-improve… and hence progress stops there—though of course this would not be something to rely on.
In terms of predicting AI timelines, another piece I found interesting was Gupta et al.’s Revisiting the Unreasonable Effectiveness of Data, which argued that, for vision tasks at least, performance improved logarithmically in sample size.
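To spell out what ‘logarithmically’ means in practice: each additional factor of ten in data buys roughly a constant increment of performance. A quick sketch of such a fit, with entirely made-up accuracy numbers purely to illustrate the functional form:

```python
import math

# Hypothetical (made-up) accuracies at increasing dataset sizes.
data = [(10_000, 0.60), (100_000, 0.68), (1_000_000, 0.75), (10_000_000, 0.83)]

# Fit accuracy ~= a + b * log10(n) by simple least squares.
xs = [math.log10(n) for n, _ in data]
ys = [acc for _, acc in data]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x

print(f"each 10x more data adds roughly {b:.3f} accuracy")  # ~0.076 here
```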
The Foresight Institute published a white paper on the general subject of AI policy and risk.
Stanford’s One Hundred Year Study on Artificial Intelligence produced an AI Index report, which is basically a report on progress in the field up to 2016. Interestingly various metrics they tracked, summarised in their ‘Vibrancy’ metric, suggest that the field actually regressed in 2016, though my experience with similar data in the financial world leaves me rather sceptical of such methodology. Unfortunately the report dedicated only a single word to the subject of AI safety.
On a lighter note, the esteemed G.K. Chesterton returned from beyond the grave to eviscerate an AI risk doubter, and a group of researchers (some FHI) proved that it is impossible to create a machine larger than a human, so that’s a relief.
Other major developments this year
Google’s Deepmind produced AlphaZero, which learnt how to beat the best AIs (and hence also the best humans) at Go, Chess and Shogi with just a few hours of self-play.
Creation of the EA funds, including the Long-Term Future Fund, run by Nick Beckstead, which has made one smallish grant related to AI Safety and conserved the other 96%.
The Open Philanthropy Project funded both MIRI and OpenAI (acquiring a board seat in the process with the latter).
Nvidia (who make GPUs used for ML) saw their share price approximately double, after quadrupling last year.
Hillary Clinton was possibly concerned about AI risk? But unfortunately Putin seems to have less helpful concerns about an AI arms race… namely ensuring that he wins it. And China announced a national plan for AI with Chinese characteristics—but bear in mind they have failed at these before, like their push into semiconductors, though companies like Baidu do seem to be doing impressive research.
There were some papers suggesting the replication crisis may be coming to ML?
Conclusion
In some ways this has been a great year. My impression is that the cause of AI safety has become increasingly mainstream, with a lot of researchers unaffiliated with the above organisations working at least tangentially on it.
However, it’s tough from the point of view of an external donor. Some of the organisations doing the best work are well funded. Others (MIRI) seem to be doing a lot of good work but (perhaps necessarily) it is significantly harder for outsiders to judge than last year, as there doesn’t seem to be a really heavy-hitting paper like there was last year. I see MIRI’s work as being a long-shot bet that their specific view of the strategic landscape is correct, but given this they’re basically irreplaceable. GCRI and CSER’s work is more mainstream in this regard, but GCRI’s productivity is especially noteworthy, given the order-of-magnitude difference in budget size.
As I have once again failed to reduce charity selection to a science, I’ve instead attempted to subjectively weigh the productivity of the different organisations against the resources they used to generate that output, and donate accordingly.
My constant wish is to promote a lively intellect and independent decision-making among my readers; hopefully my laying out the facts as I see them above will prove helpful to some readers. Here is my eventual decision, rot13’d so you can come to your own conclusions first if you wish:
Fvtavsvpnag qbangvbaf gb gur Znpuvar Vagryyvtrapr Erfrnepu Vafgvghgr naq gur Tybony Pngnfgebcuvp Evfxf Vafgvghgr. N zhpu fznyyre bar gb NV Vzcnpgf.
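(If you would rather not decode it by hand, Python’s built-in rot13 codec will do it; paste the full string from the line above in place of the truncated one here.)

```python
import codecs

ciphertext = "Fvtavsvpnag qbangvbaf gb gur ..."  # paste the full rot13 line above
print(codecs.decode(ciphertext, "rot_13"))
```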
However I wish to emphasise that all the above organisations seem to be doing good work on the most important issue facing mankind. It is the nature of making decisions under scarcity that we must prioritise some over others, and I hope that all organisations will understand that this necessarily involves negative comparisons at times.
Thanks for reading this far; hopefully you found it useful. Someone suggested that, instead of doing this annually, I should instead make a blog where I provide some analysis of AI-risk related events as they occur. Presumably there would still be an annual giving-season writeup like this one. If you’d find this useful, please let me know.
Disclosures
I was a Summer Fellow at MIRI back when it was SIAI, volunteered very briefly at GWWC (part of CEA) and once applied for a job at FHI. I am personal friends with people at MIRI, FHI, CSER, CFHCA and AI Impacts but not GCRI (so if you’re worried about bias you should overweight them… though it also means I have less direct knowledge). However I have no financial ties beyond being a donor and have never been romantically involved with anyone who has ever been at any of the organisations.
I shared a draft of the relevant sections of this document with representatives of MIRI, CSER, GCRI and AI Impacts. I’m very grateful to Alex Flint and Jess Riedel for helping review a draft of this document. Any remaining inadequacies and mistakes are my own.
2017 AI Safety Literature Review and Charity Comparison
Summary: I review a significant amount of 2017 research related to AI Safety and offer some comments about where I am going to donate this year. Cross-posted from here upon request.
Contents
Contents
Introduction
The Machine Intelligence Research Institute (MIRI)
The Future of Humanity Institute (FHI)
Global Catastrophic Risks Institute (GCRI)
The Center for the Study of Existential Risk (CSER)
AI Impacts
Center for Human-Compatible AI (CFHCA)
Other related organisations
Related Work by other parties
Other major developments this year
Conclusion
Disclosures
Bibliography
Introduction
Like last year, I’ve attempted to review the research that has been produced by various organisations working on AI safety, to help potential donors gain a better understanding of the landscape. This is a similar role to that which GiveWell performs for global health charities, and somewhat similar to an securities analyst with regards to possible investments. It appears that once again no-one else has attempted to do this, to my knowledge, so I’ve once again undertaken the task. While I’ve been able to work significantly more efficiently on this than last year, I have been unfortunately very busy with my day job, which has dramarically reduced the amount of time I’ve been able to dedicate.
My aim is basically to judge the output of each organisation in 2017 and compare it to their budget. This should give a sense for the organisations’ average cost-effectiveness. Then we can consider factors that might increase or decrease the marginal cost-effectiveness going forward. We focus on organisations, not researchers.
Judging organisations on their historical output is naturally going to favour more mature organisations. A new startup, whose value all lies in the future, will be disadvantaged. However, I think that this is correct. The newer the organisation, the more funding should come from people with close knowledge. As organisations mature, and have more easily verifiable signals of quality, their funding sources can transition to larger pools of less expert money. This is how it works for startups turning into public companies and I think the same model applies here.
This judgement involves analysing a large number papers relating to Xrisk that were produced during 2017. Hopefully the year-to-year volatility of output is sufficiently low that this is a reasonable metric. I also attempted to include papers during December 2016, to take into account the fact that I’m missing the last month’s worth of output from 2017, but I can’t be sure I did this successfully.
This article focuses on AI risk work. If you think other causes are important too, your priorities might differ. This particularly affects GCRI and CSER, who both do a lot of work on other issues.
We focus virtually exclusively on papers, rather than outreach or other activities. This is party because they are much easier to measure; while there has been a large increase in interest in AI safety over the last year, it’s hard to work out who to credit for this, and partly because I think progress has to come by persuading AI researchers, which I think comes through technical outreach and publishing good work, not popular/political work.
My impression is that policy on technical subjects (as opposed to issues that attract strong views from the general population) is generally made by the government and civil servants in consultation with, and being lobbied by, outside experts and interests. Without expert (e.g. top ML researchers at Google, CMU & Baidu) consensus, no useful policy will be enacted. Pushing directly for policy seems if anything likely to hinder expert consensus. Attempts to directly influence the government to regulate AI research seem very adversarial, and risk being pattern-matched to ignorant opposition to GM foods or nuclear power. We don’t want the ‘us-vs-them’ situation, that has occurred with climate change, to happen here. AI researchers who are dismissive of safety law, regarding it as an imposition and encumbrance to be endured or evaded, will probably be harder to convince of the need to voluntarily be extra-safe—especially as the regulations may actually be totally ineffective. The only case I can think of where scientists are relatively happy about punitive safety regulations, nuclear power, is one where many of those initially concerned were scientists themselves. Given this, I actually think policy outreach to the general population is probably negative in expectation.
The good news on outreach this year is we haven’t had any truly terrible publicity that I can remember, though I urge organisations to remember that the personal activities of their employees, especially senior ones, reflect on the organisations themselves, so they should take care not to act/speak in ways that are offensive to those outside their bubble, and to avoid hiring crazy people.
Part of my motivation for writing this is to help more people become informed about the AI safety landscape so they can contribute better with both direct work and donations. With regard donations, at present Nick Beckstead, in his role as both Fund Manager of the Long-Term Future Fund and officer with the Open Philanthropy Project, is probably the most important financer of this work. He is also probably significantly more informed on the subject than me, but I think it’s important that the vitality of the field doesn’t depend on a single person, even if that person is awesome.
The Machine Intelligence Research Institute (MIRI)
MIRI is the largest pure-play AI existential risk group. Based in Berkeley, it focuses on mathematics research that is unlikely to be produced by academics, trying to build the foundations for the development of safe AIs.
Their agent foundations work is basically trying to develop the correct way of thinking about agents and learning/decision making by spotting areas where our current models fail and seeking to improve them. Much of their work this year seems to involve trying to address self-reference in some way—how can we design, or even just model, agents that are smart enough to think about themselves? This work is technical, abstract, and requires a considerable belief in their long-term vision, as it is rarely locally applicable, so hard to independently judge the quality.
In 2016 they announced they were somewhat pivoting towards work that tied in closer to the ML literature, a move I thought was a mistake. However, looking at their published research or their 2017 review page, in practice this seems to have been less of a change of direction than I had thought, as most of their work appears to remain on highly differentiated and unreplaceable agent foundations type work—it seems unlikely that anyone not motivated by AI safety would produce this work. Even within those concerned about friendly AI, few not at MIRI would produce this work.
Critch’s Toward Negotiable Reinforcement Learning: Shifting Priorities in Pareto Optimal Sequential Decision-Making (elsewhere titled ‘Servant of Many Masters’) is a neat paper. Basically it identifies the pareto-efficient outcome if you have two agents with different beliefs who want to agree on a utility function for an AI, in a generalisation of Harsanyi’s Cardinal welfare, individualistic ethics, and interpersonal comparisons of utility. The key assumption is both want to use their current beliefs when they calculate the expected value of the deal to themselves, and the (surprising to me) conclusion is that over time the AI will have to weigh more and more heavily the values of the negotiator whose beliefs were more accurate. While I don’t think this is necessarily Critch’s interpretation, I take this as something of a reductio of the assumption. Surely if I was negotiating over a utility function, I would want the agent to learn about the world and use that knowledge to better promote my values … not to learn about the world, decide I was a moron with a bad world model, and ignore me thereafter? If I think the AI is/will be smarter than me, I should be happy for it to do things I’m unaware will benefit me, and avoid doing things I falsely believe will help me. On the other hand, if the parties are well-informed nation states rather than individuals, the prospect of ‘getting one over’ the other might be helpful for avoiding arms races?
Kosoy’s Optimal polynomial-time estimators addresses a similar topic to the Logical Induction work—assigning ‘probabilities’ to logical/mathematical/deductive statements under computational limitations—but with a quite different approach to solving it. The work seems impressive but I didn’t really understand it. Inside his framework he can prove that various results from probability theory also apply to logical statements, which seems like what we’d want. (Note that technically this paper came out in December 2016, and so is included in this year rather than last year’s.)
Carey’s article, Incorrigibility in the CIRL Framework, is a response to Milli et al.’s Should Robots be Obedient and Hadfield-Menel’s The Off-Switch Game. Carey basically argues it’s not necessarily the case that the CIRLs will be ‘automatically’ corigible if the AI’s beliefs about value are very wrong, for example due to incorrect parameterisation or assigning a zero prior to something that turns out to be the case. The discussion section has some interesting arguments, for example pointing out that an algorithm designed to shut itself off unless it had a track record of perfectly predicting what humans would want might still fail if its ontology was insufficient, so it couldn’t even tell that it was disagreeing with the humans during training. I agree that value complexity and fragility might mean it’s very likely that any AI’s value model will be partially (and hence, for an AGI, catastrophically) mis-parameterised. However, I’m not sure how much the examples that take up much of the paper add to this argument. Milli’s argument only holds when the AI can learn the parameters, and given that this paper assumes the humans choose the wrong action by accident less than 1% of the time, it seems that the AI should assign a very large amount of evidence to a shutdown command… instead the AI seems to simply ignore it?
Some of MIRI’s publications this year seem to mainly be better explanations of previous work. For example, Garrabrant et al’s A Formal Approach to the Problem of Logical Non-Omniscience seems to be basically an easier to understand version of last year’s Logical Induction. Likewise Yudkowsky and Soares’s Functional Decision Theory: A New Theory of Instrumental Rationality seems to be basically new exposition of classic MIRI/LW decision theory work—see for example Soares et al’s Toward Idealized Decision Theory. Similarly, I didn’t feel like there was much new in Soares et al’s Cheating Death in Damascus. Making things easier to understand is useful—and last year’s Logical Induction paper was a little dense—but it’s clearly not as impressive as inventing new things.
When I asked for top achievements for 2017, MIRI pointed me towards a lot of work they’d posted on agentfoundations.org as being one of their major achievements for the year, especially this, this and this, which pose and then solve a problem about how to find game-theoretic agents that can stably model each other, formulated it as a topological fixed point problem. There is also a lot of other work on agentfoundations that seems interesting, I’m not entirely sure how to think about giving credit for these. These seem more like ‘work in progress’ than finished work—for most organisations I am only giving credit for the latter. MIRI could with some justification respond that the standard academic process is very inefficient, and part of their reason for existence is to do things that universities cannot. However, even if you de-prioritise peer review, I still think it is important to write things up into papers. Otherwise it is extremely hard for outsiders to evaluate—bad both for potential funders and for people wishing to enter the field. Unfortunately it is possible that, if they continue on this route, MIRI might produce a lot of valuable work that is increasingly illegible from the outside. So overall I think I consider these as evidence that MIRI is continuing to actually do research, but will wait until they’re ArXived to actually review them. If you disagree with this approach, MIRI is going to look much more productive, and their research possibility accelerating in 2017 vs 2016. If you instead only look at published papers, 2017 appears to be something of a ‘down year’ after 2016.
Last year I was not keen to see that Eliezer was spending a lot of time producing content on Arbital as part of his job at MIRI, as there was a clear conflict of interest—he was a significant shareholder in Arbital, and additionally I expected Arbital to fail. Now that Arbital does seem to have indeed failed, I’m pleased he seems to be spending less time on it, but confused why he is spending any time at all on it—though some of this seems to be cross-posted from elsewhere.
Eliezer’s book Inadequate Equilibria, however, does seem to be high quality—basically another sequence—though only relevant inasmuch as AI safety might be one of many applications of the subject of the book. I also encourage readers to also read this excellent article by Greg Lewis (FHI) on the other side.
I also enjoyed There’s No Fire Alarm for Artificial General Intelligence, which although accessible to the layman I think provided a convincing case that, even when AGI is imminent, there would (/might be) no signal that this was the case, and his socratic security dialogs on the mindset required to develop a secure AI.
I was sorry to hear Jessica Taylor left MIRI, as I thought she did good work.
MIRI spent roughly $1.9m in 2017, and aim to rapidly increase this to $3.5m in 2019, to fund new researchers and their new engineering team.
The Open Philanthropy Project awarded MIRI a $3.75m grant (over 3 years) earlier this year, largely because one reviewer was impressed with their work on Logical Induction. You may recall this was a significant part of why I endorsed MIRI last year. However, as this review is focused on work in the last twelve months, they don’t get credit for the same work two years running! OPP have said they plan to fund roughly half of MIRI’s budget. On the positive side, one might argue this was essentially a 1:1 match on donations to MIRI—but there are clearly game-theoretic problems here. Additionally, if you had faith in OpenPhil’s process, you might consider this a positive signal of MIRI quality. On the other hand, if you think MIRI’s marginal cost-effectiveness is diminishing over the multi-million dollar range, this might reduce your estimate of the cost-effectiveness of the marginal dollar.
There is also $1m of somewhat plausibly counterfactually valid donation matching available for MIRI (but not other AI Xrisk organisations).
Finally, I will note that MIRI are have been very generous with their time in helping me understand what they are doing.
The Future of Humanity Institute (FHI)
Oxford’s FHI requested not to be included in this analysis, so I won’t be making any comment on whether or not they are a good place to fund. Had they not declined (and depending on their funding situation) they would have been a strong candidate. This was disappointing to me, because they seem to have produced an impressive list of publications this year, including a lot of collaborations. I’ll briefly note two some pieces of research they published this year, but regret not being able to give them better coverage.
Saunders et al. published Trial without Error: Towards Safe Reinforcement Learning via Human Intervention, a nice paper where they attempt to make a Reinforcement Learner that can ‘safely’ learn by training a catastrophe-recognition algorithm to oversee the training. It’s a cute idea, and a nice use of the OpenAI Atari suite, though I was most impressed with the fact that they concluded that their approach would not scale (i.e. would not work). It’s not often researchers publish negative results!
Honourable mention also goes to the very cool (but aren’t all his papers?) Sandberg et al. That is not dead which can eternal lie: the aestivation hypothesis for resolving Fermi’s paradox, which is relevant inasmuch as it suggests that the Fermi Paradox is not actually evidence against AI as an existential risk.
FHI’s Brundage Bot apparently reads every ML paper ever written.
Global Catastrophic Risks Institute (GCRI)
The Global Catastrophic Risks Institute is run by Seth Baum and Tony Barrett. They have produced work on a variety of existential risks, including non-AI risks. Some of this work seems quite valuable, especially Denkenberger’s Feeding Everyone No Matter What on ensuring food supply in the event of disaster, and is probably probably of interest to the sort of person who would read this document. However, they are off-topic for us here. Within AI they do a lot of work on the strategic landscape, and are very prolific.
Baum’s Survey of Artificial General Intelligence Projects for Ethics, Risk, and Policy attempts to analyse all existing AGI research projects. This is a huge project and I laud him for it. I don’t know how much here is news to people who are very plugged in, but to me at least it was very informative. The one criticism I would have is it could do more to try to differentiate on capacity/credibility—e.g. my impression is Deepmind is dramatically more capable than many of the smaller organisations listed—but that is clearly a very difficult ask. It’s hard for me to judge the accuracy, but I didn’t notice any mistakes (beyond being surprised that AIXI has an ‘unspecified’ for safety engagement, given the amount of AI safety papers coming out of ANU.)
Baum’s Social Choice Ethics in Artificial Intelligence argues that value-learning type approaches to AI ethics (like CEV ) contain many degrees of freedom for the programmers to finesse it to pick their values, making them no better than the programmers simply choosing an ethical system directly. The programmers can choose whose values are used for learning, how they are measured, and how they are aggregated. Overall I’m not fully convinced—for example, pace the argument on page 3, a Law of Large Numbers argument could support averaging many views to get at the true ethics even if we had no way of independently verifying the true ethics. And there is some irony that, for all the paper’s concern with bias risk, the left-wing views of the author come through strongly. But despite these I liked the paper, especially for the discussion of who has standing—something that seems like it will need a philosophical solution, rather than a ML one.
Barrett’s Value of Global Catastrophic Risk (GCR) Information: Cost-Effectiveness-Based Approach for GCR Reduction covers a lot of familiar ground, and then attempts to do some monte carlo cost-benefit analysis on the a small number of interventions to help address nuclear war and comet impact. After putting a lot of thought into setting up the machinery, it would have been good to see analysis of a wider range of risks!
Baum & Barrett published Global Catastrophes: The Most Extreme Risks, which seems to be essentially a reasonably well argued general introduction to the subject of existential risks. Hopefully people who bought the book for other reasons will read it and become convinced.
Baum & Barrett’s Towards an Integrated Assessment of Global Catastrophic Risk is a similar introductory piece on catastrophic risks, but the venue—a colloquium on catastrophic risks—seems less useful, as people reading it are more likely to already be concerned about the subject, and I don’t think it spends enough time on AI risk per se to convince those who were already worried about Xrisk but not AI Xrisk.
Last year I was (and still am) impressed by their paper On the Promotion of Safe and Socially Beneficial Artificial Intelligence, which made insightful, convincing and actionable criticisms of ‘AI arms race’ language. I was less convinced by this year’s Reconciliation Between Factions Focused on Near-Term and Long-Term Artificial Intelligence, which argues for a re-alignment away from near-term AI worries vs long-term AI worries towards AI worriers vs non-worriers. However, I’m not sure why anyone would agree to this—long-term worriers don’t currently spend much time arguing against short-term worries (even if you thought that AI discrimination arguments were orwellian, why bother arguing about it?), and convincing short-term worriers to stop criticise long-term worries seems approximately as hard as simply convincing them to become long-term worriers.
GCRI spent approximately $117k in 2017, which is shockingly low considering their productivity. This was lower than 2016; apparently their grants from the US Dept. of Homeland Security came to an end.
The Center for the Study of Existential Risk (CSER)
CSER is an existential risk focused group located in Cambridge. Like GCRI they do work on a variety of issues, notably including Rees’ work on infrastructure resilience.
Last year I criticised them for not having produced any online research over several years; they now have a separate page that does list some but maybe not all of their research.
Liu, a CSER researcher, wrote The Sure-Thing principle and P2 and was second author on Gaifman & Liu’s A simpler and more realistic subjective decision theory, both on the mathematical foundations of bayesian decision theory, which is a valuable topic for AI safety in general. Strangely neither paper mentioned CSER as a financial supporter of the paper or affiliation.
Liu and Price’s Heart of DARCness argues that agents do not have credences for what they will do while deciding whether to do it—their confidence is temporarily undefined. I was not convinced—even someone is deciding whether she’s 75% confident or 50% confident, presumably there are some odds that determine which side in a bet she’d take if forced to choose? I’m also not sure of the direct link to AI safety.
They’ve also convened and attended workshops on AI and decision theory, notably the AI & Society Symposium in Japan, but in general I am wary of giving organisations credit for these, as they are too hard for the outside observer to judge, and ideally workshops lead to produce papers—in which case we can judge those.
CSER also did a significant amount of outreach, including presenting to the House of Lords, and apparently have expertise in Chinese outreach (multiple native mandarin speakers), which could be important, given China’s AI research but cultural separation from the west.
They are undertaking a novel publicity effort that I won’t name as I’m not sure it’s public yet. In general I think most paths to success involve consensus-building among mainstream ML researchers, and ‘popular’ efforts risk harming our credibility, so I am not optimistic here.
Their annual budget is around $750,000, of which I estimate a bit less than half goes on AI risk. Apparently they need to raise funds to continue existing once their current grants run out in 2019.
AI Impacts
AI Impacts is a small group that does high-level strategy work, especially on AI timelines, somewhat associated with MIRI.
They seem to have produced significantly more this year than last year. The main achievement is When Will AI Exceed Human Performance? Evidence from AI Experts, which gathered the opinions of hundreds of AI researchers on AI-timeline questions. There were some pretty relevant takeaways, such as that most researchers find the AI catastrophic-risk argument somewhat plausible but doubt there is anything that can usefully be done in the short term, and that Asian researchers think human-level AI is significantly closer than American researchers do. I think the value proposition here is twofold: firstly, providing a source of timeline estimates for when we make decisions that hinge on how long we have; and secondly, showing that concern about AI risk is a respectable, mainstream position. It was apparently one of the most discussed papers of 2017.
On a similar note they also have data on improvements in a number of AI-related benchmarks, like computing costs or algorithmic progress.
John Salvatier (member of AI Impacts at the time) was also second author on Agent-Agnostic Human-in-the-Loop Reinforcement Learning, along with Evans (FHI, 4th author), which attempts to design an interface for reinforcement learning that abstracts away from the agent, so you could easily change the underlying agent.
AI Impacts’ budget is tiny compared to most of the other organisations listed here; around $60k at present. Incremental funds would apparently be spent on hiring more part-time researchers.
Center for Human-Compatible AI (CFHCA)
The Center for Human-Compatible AI, founded by Stuart Russell in Berkeley, launched in August 2016. As they are not looking for more funding at the moment, I will only briefly survey some of their work on cooperative inverse reinforcement learning.
Hadfield-Menell et al.’s The Off-Switch Game is a nice paper that produces and formalises the (at least now I’ve read it) very intuitive result that a value-learning AI might be corrigible (at least in some instances) because it takes the fact that a human pressed the off-switch as evidence that shutting down is the best thing to do.
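To make that mechanism concrete, here is a minimal numerical sketch in Python of the intuition as I understand it (a toy model of my own, with made-up numbers and a Gaussian belief, not the authors’ formalism): a robot that is uncertain about the value of its action, and believes a rational human will press the off-switch exactly when the action is bad, can only gain by deferring.

import numpy as np

rng = np.random.default_rng(0)
# The robot's belief over the utility of its proposed action (illustrative prior).
utilities = rng.normal(loc=0.2, scale=1.0, size=100_000)

# Option 1: act immediately, ignoring the off-switch.
ev_act = utilities.mean()

# Option 2: defer to a rational human, who permits the action iff U > 0
# and presses the off-switch (utility 0) otherwise.
ev_defer = np.maximum(utilities, 0).mean()

print(f"E[U | act now] = {ev_act:.3f}")
print(f"E[U | defer]   = {ev_defer:.3f}")  # always at least as large as acting now

As I understand it, much of the paper is about when this stops holding, for instance as the robot becomes confident in its own estimate or models the human as noisy; the sketch only captures the baseline intuition.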
Milli et al.’s Should Robots be Obedient is in the same vein as Hadfield-Menell et al.’s Cooperative Inverse Reinforcement Learning (covered last year) on learning values from humans, specifically touching on whether such agents would be willing to obey a command to ‘turn off’, as per Soares’s paper on Corrigibility. She does some interesting analysis of the trade-off between obedience and results in cases where humans are fallible.
In both cases I thought the papers were thoughtful and had good analysis. However, I don’t think either is convincing in showing that corrigibility comes ‘naturally’, at least not the strength of corrigibility we need.
I encourage them to keep their website more up-to-date.
Overall I think their research is good and their team promising. However, apparently they have enough funding for now, so I won’t be donating this year. If this changed and they requested incremental capital I could certainly imagine funding them in future years.
Other related organisations
The Center for Applied Rationality (CFAR) works on trying to improve human rationality, especially with the aim of helping with AI Xrisk efforts.
The Future of Life Institute (FLI) ran a huge grant-making program to try to seed the field of AI safety research. There definitely seem to be a lot more academics working on the problem now, but it’s hard to tell how much to attribute to FLI.
Eighty Thousand Hours (80K) provide career advice, with AI safety being one of their key cause areas.
Related Work by other parties
Deep Reinforcement Learning from Human Preferences was possibly my favourite paper of the year, which perhaps shouldn’t come as a surprise, given that two of the authors (Christiano and Amodei from OpenAI) were authors on last year’s Concrete Problems in AI Safety. It applies ideas on bootstrapping that Christiano has been discussing for a while: getting humans to train an AI, which then trains another AI, and so on. The model performs significantly better than I would have expected, and as ever I’m pleased to see OpenAI and DeepMind collaborating.
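For readers who haven’t seen the paper, the core reward-learning step is roughly as follows. This is a heavily stripped-down sketch of my own (illustrative network sizes, variable names and dummy data, not the authors’ code): a reward model is fitted so that the trajectory segment the human preferred gets the higher predicted return, and an agent is then trained with ordinary RL against that learned reward.

import torch
import torch.nn as nn

obs_dim = 8  # illustrative observation size
reward_model = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

def preference_loss(segment_a, segment_b, human_prefers_a):
    # segment_*: (timesteps, obs_dim) tensors; return = summed predicted reward.
    return_a = reward_model(segment_a).sum()
    return_b = reward_model(segment_b).sum()
    logits = torch.stack([return_a, return_b]).unsqueeze(0)
    target = torch.tensor([0 if human_prefers_a else 1])
    # Logistic (Bradley-Terry style) model over which segment was preferred.
    return nn.functional.cross_entropy(logits, target)

# Dummy rollout segments standing in for real environment trajectories.
seg_a, seg_b = torch.randn(50, obs_dim), torch.randn(50, obs_dim)
loss = preference_loss(seg_a, seg_b, human_prefers_a=True)
opt.zero_grad(); loss.backward(); opt.step()
# An RL agent (e.g. policy gradient) is then trained on reward_model's output.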
Christiano continues to produce very interesting content on his blog, like this piece on Corrigibility. When I first read his articles about bootstrapping safety through iterative training procedures, my reaction was that, while this seemed an interesting idea, it didn’t seem to have much in common with mainstream ML. However, there do seem to be a bunch of practical papers about imitation learning now. I’m not sure if this was always the case and I was just ignorant, or if they have become more prominent in the last year. Either way, I have updated towards considering this approach a promising one for integrating safety into mainstream ML work. He has also written a nice blog post explaining how AlphaZero works, and arguing that this supports his capability-amplification ideas.
It was also nice to see ~95 papers addressing Amodei et al.’s call in last year’s Concrete Problems.
Menda et al’s DropoutDAgger paper on safe exploration seems to fit in this category. Basically they come up with a form of imitation learning where the AI being trained can explore a bit, but isn’t allowed to stray too far from the expert policy—though I’m not sure why they always have the learner explore in the direction it thinks is best, rather than assigning some weight to its uncertainty of outcome, explore-exploit-style. I’m not sure how much credit Amodei et al can get for inspiring this though, as it seems to be (to a significant degree) an extension of Zhang and Cho’s Query-Efficient Imitation Learning for End-to-End Autonomous Driving.
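My rough reconstruction of the DropoutDAgger decision rule, with made-up threshold and array shapes (so treat this as a sketch of the idea rather than the paper’s algorithm): the novice policy is sampled several times with dropout left on to get an ensemble of proposed actions; if those samples stay close to the expert’s action the novice acts, otherwise the expert does, and either way the expert’s action is recorded as the training label, DAgger-style.

import numpy as np

def choose_action(novice_samples, expert_action, threshold=0.5):
    """novice_samples: (n_dropout_samples, action_dim) array of novice actions."""
    mean_action = novice_samples.mean(axis=0)
    deviation = np.linalg.norm(mean_action - expert_action)
    if deviation < threshold:
        return mean_action, expert_action   # novice acts; expert still labels
    return expert_action, expert_action     # too uncertain: fall back to expert

# Toy usage with random stand-ins for the two policies' outputs.
samples = np.random.normal(0.0, 0.1, size=(20, 2))
expert = np.array([0.05, -0.02])
action_taken, label = choose_action(samples, expert)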
However, I don’t want to give too much credit for work that improves ‘local’ safety that doesn’t also address the big problems in AI safety, because this work probably accelerates unsafe human-level AI. There are many papers in this category, but for obvious reasons I won’t call them out.
Gans’s Self-Regulating Artificial General Intelligence contains some nice economic formalism around AIs seizing power from humans, and raises the interesting argument that, if you need specialist AIs to achieve things, the first human-level AIs might not exhibit takeoff behaviour because they would be unable to sufficiently trust the power-seizing agents they would need to create. I’m sceptical that this assumption about the need for specialised AIs holds: surely even if you need to make separate AI agents for different tasks, rather than integrating them, it would suffice to give them specialised capabilities but the same goals. Regardless, the paper does suggest the interesting possibility that humanity might make an AI which is intelligent enough to realise it cannot solve the alignment problem needed to safely self-improve, and hence progress stops there, though of course this would not be something to rely on.
MacFie’s Plausibility and Probability in Deductive Reasoning also addresses the issue of how to assign probabilities to logical statements, in a similar vein to much MIRI research.
Vamplew et al’s Human-aligned artificial intelligence is a multiobjective problem argues that we should consider a broader class of functions than linear sums when combining utility functions.
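A toy illustration of why this might matter (my own numbers, not the paper’s): a linear sum is happy to trade one objective all the way down to zero if another compensates, whereas a conservative non-linear aggregator such as the minimum is not.

import numpy as np

options = {
    "balanced": np.array([0.6, 0.6]),   # (task performance, safety proxy)
    "lopsided": np.array([1.4, 0.0]),   # great performance, safety ignored
}

best_by_sum = max(options, key=lambda k: options[k].sum())  # picks "lopsided"
best_by_min = max(options, key=lambda k: options[k].min())  # picks "balanced"
print(best_by_sum, best_by_min)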
Google Deepmind continue to churn out impressive research, some of which seems relevant to the problem, like Sunehag et al.’s Value-Decomposition Networks For Cooperative Multi-Agent Learning and Danihelka et al.’s Comparison of Maximum Likelihood and GAN-based training of Real NVPs on avoiding overfitting.
In terms of predicting AI timelines, another piece I found interesting was Sun et al.’s Revisiting the Unreasonable Effectiveness of Data, which argued that, for vision tasks at least, performance improves logarithmically with sample size.
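To illustrate what ‘logarithmic in sample size’ means in practice (purely hypothetical coefficients of my own, not the paper’s fit): each tenfold increase in data buys roughly the same fixed increment in accuracy, which makes brute-force data collection a fairly slow route to large gains.

import numpy as np

a, b = 0.05, 0.40  # hypothetical fit: accuracy = a * log10(n) + b
for n in [1e6, 1e7, 1e8, 1e9]:
    print(f"n = {n:.0e}  predicted accuracy = {a * np.log10(n) + b:.2f}")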
The Foresight Institute published a white paper on the general subject of AI policy and risk.
Stanford’s One Hundred Year Study on Artificial Intelligence produced an AI Index report, which is basically a report on progress in the field up to 2016. Interestingly, the various metrics they tracked, summarised in their ‘Vibrancy’ metric, suggest that the field actually regressed in 2016, though my experience with similar composite metrics in the financial world leaves me rather sceptical of such methodology. Unfortunately the report dedicated only a single word to the subject of AI safety.
On a lighter note, the esteemed G.K. Chesterton returned from beyond the grave to eviscerate an AI risk doubter, and a group of researchers (some FHI) proved that it is impossible to create a machine larger than a human, so that’s a relief.
Other major developments this year
Google’s Deepmind produced AlphaZero, which learnt how to beat the best AIs (and hence also the best humans) at Go, Chess and Shogi with just a few hours of self-play.
Creation of the EA Funds, including the Long-Term Future Fund, run by Nick Beckstead, which has made one smallish grant related to AI Safety and conserved the other 96%.
The Open Philanthropy Project funded both MIRI and OpenAI (acquiring a board seat in the process with the latter).
Nvidia (who make GPUs used for ML) saw their share price approximately double, after quadrupling last year.
Hillary Clinton was possibly concerned about AI risk? But unfortunately Putin seems to have less helpful concerns about an AI arms race, namely ensuring that he wins it. And China announced a national plan for AI with Chinese characteristics; but bear in mind they have failed at these before, like their push into semiconductors, though companies like Baidu do seem to be doing impressive research.
There were some papers suggesting the replication crisis may be coming to ML?
Conclusion
In some ways this has been a great year. My impression is that the cause of AI safety has become increasingly mainstream, with a lot of researchers unaffiliated with the above organisations working at least tangentially on it.
However, it’s tough from the point of view of an external donor. Some of the organisations doing the best work are well funded. Others (MIRI) seem to be doing a lot of good work, but (perhaps necessarily) it is significantly harder for outsiders to judge than last year, as there doesn’t seem to be a really heavy-hitting paper like there was last year. I see MIRI’s work as being a long-shot bet that their specific view of the strategic landscape is correct, but given this they’re basically irreplaceable. GCRI and CSER’s work is more mainstream in this regard, but GCRI’s productivity is especially noteworthy, given the order-of-magnitude difference in budget sizes.
As I have once again failed to reduce charity selection to a science, I’ve instead attempted to subjectively weigh the productivity of the different organisations against the resources they used to generate that output, and donate accordingly.
My constant wish is to promote a lively intellect and independent decision-making among my readers; hopefully laying out the facts as I see them above will prove helpful to some. Here is my eventual decision, rot13’d so you can come to your own conclusions first if you wish:
Fvtavsvpnag qbangvbaf gb gur Znpuvar Vagryyvtrapr Erfrnepu Vafgvghgr naq gur Tybony Pngnfgebcuvp Evfxf Vafgvghgr. N zhpu fznyyre bar gb NV Vzcnpgf.
However, I wish to emphasise that all the above organisations seem to be doing good work on the most important issue facing mankind. It is the nature of making decisions under scarcity that we must prioritise some over others, and I hope that all organisations will understand that this necessarily involves negative comparisons at times.
Thanks for reading this far; hopefully you found it useful. Someone suggested that, instead of doing this annually, I should instead make a blog where I provide some analysis of AI-risk related events as they occur. Presumably there would still be an annual giving-season writeup like this one. If you’d find this useful, please let me know.
Disclosures
I was a Summer Fellow at MIRI back when it was SIAI, volunteered very briefly at GWWC (part of CEA) and once applied for a job at FHI. I am personal friends with people at MIRI, FHI, CSER, CFHCA and AI Impacts but not GCRI (so if you’re worried about bias you should overweight them… though it also means I have less direct knowledge). However I have no financial ties beyond being a donor and have never been romantically involved with anyone who has ever been at any of the organisations.
I shared a draft of the relevant sections of this document with representatives of MIRI, CSER, GCRI and AI Impacts. I’m very grateful to Alex Flint and Jess Riedel for helping review a draft of this document. Any remaining inadequacies and mistakes are my own.
Edited 2017-12-21: Spelling mistakes, corrected Amodei’s affiliation.
Edited 2017-12-24: Minor correction to CSER numbers.
Bibliography
Adam D. Cobb, Andrew Markham, Stephen J. Roberts; Learning from lions: inferring the utility of agents from their trajectories; https://arxiv.org/abs/1709.02357
Alexei Andreev; What’s up with Arbital; http://lesswrong.com/r/discussion/lw/otq/whats_up_with_arbital/
Allison Duettmann; Artificial General Intelligence: Timeframes & Policy White Paper; https://foresight.org/publications/AGI-Timeframes&PolicyWhitePaper.pdf
Anders Sandberg, Stuart Armstrong, Milan Cirkovic; That is not dead which can eternal lie: the aestivation hypothesis for resolving Fermi’s paradox; https://arxiv.org/pdf/1705.03394.pdf
Andrew Critch, Stuart Russell; Servant of Many Masters: Shifting priorities in Pareto-optimal sequential decision-making; https://arxiv.org/abs/1711.00363
Andrew Critch; Toward Negotiable Reinforcement Learning: Shifting Priorities in Pareto Optimal Sequential Decision-Making; https://arxiv.org/abs/1701.01302
Andrew MacFie; Plausibility and Probability in Deductive Reasoning; https://arxiv.org/pdf/1708.09032.pdf
Assaf Arbelle, Tammy Riklin Raviv; Microscopy Cell Segmentation via Adversarial Neural Networks; https://arxiv.org/abs/1709.05860
Ben Garfinkel, Miles Brundage, Daniel Filan, Carrick Flynn, Jelena Luketina, Michael Page, Anders Sandberg, Andrew Snyder-Beattie, and Max Tegmark; On the Impossibility of Supersized Machines; https://arxiv.org/pdf/1703.10987.pdf
Chelsea Finn, Tianhe Yu, Tianhao Zhang, Pieter Abbeel, Sergey Levine; One-Shot Visual Imitation Learning via Meta-Learning; https://arxiv.org/abs/1709.04905
Chen Sun, Abhinav Shrivastava, Saurabh Singh, Abhinav Gupta; Revisiting Unreasonable Effectiveness of Data in Deep Learning Era; https://arxiv.org/pdf/1707.02968.pdf
Chih-Hong Cheng, Frederik Diehl, Yassine Hamza, Gereon Hinz, Georg Nuhrenberg, Markus Rickert, Harald Ruess, Michael Truong-Le; Neural Networks for Safety-Critical Applications—Challenges, Experiments and Perspectives; https://arxiv.org/pdf/1709.00911.pdf
Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané; Concrete Problems in AI Safety; https://arxiv.org/abs/1606.06565
David Abel, John Salvatier, Andreas Stuhlmüller, Owain Evans; Agent-Agnostic Human-in-the-Loop Reinforcement Learning; https://arxiv.org/abs/1701.04079
Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell; The Off-Switch Game; https://arxiv.org/pdf/1611.08219.pdf
Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell; Cooperative Inverse Reinforcement Learning; https://arxiv.org/abs/1606.03137
Eliezer Yudkowsky and Nate Soares; Functional Decision Theory: A New Theory of Instrumental Rationality; https://arxiv.org/abs/1710.05060
Eliezer Yudkowsky; A reply to Francois Chollet on intelligence explosion; https://intelligence.org/2017/12/06/chollet/
Eliezer Yudkowsky; Coherent Extrapolated Volition; https://intelligence.org/files/CEV.pdf
Eliezer Yudkowsky; Inadequate Equilibria; https://www.amazon.com/dp/B076Z64CPG
Eliezer Yudkowsky; There’s No Fire Alarm for Artificial General Intelligence; https://intelligence.org/2017/10/13/fire-alarm/
Filipe Rodrigues, Francisco Pereira; Deep learning from crowds; https://arxiv.org/abs/1709.01779
Greg Lewis; In Defense of Epistemic Modesty; http://effective-altruism.com/ea/1g7/in_defence_of_epistemic_modesty/
Haim Gaifman and Yang Liu; A simpler and more realistic subjective decision theory; https://link.springer.com/article/10.1007%2Fs11229-017-1594-6
Harsanyi; Cardinal welfare, individualistic ethics, and interpersonal comparisons of utility; http://www.springer.com/us/book/9789027711861
Ivo Danihelka, Balaji Lakshminarayanan, Benigno Uria, Daan Wierstra, Peter Dayan; Comparison of Maximum Likelihood and GAN-based training of Real NVPs; https://arxiv.org/pdf/1705.05263.pdf
Jiakai Zhang, Kyunghyun Cho; Query-Efficient Imitation Learning for End-to-End Autonomous Driving; https://arxiv.org/abs/1605.06450
Joshua Gans; Self-Regulating Artificial General Intelligence; https://arxiv.org/pdf/1711.04309.pdf
Katja Grace, John Salvatier, Allan Dafoe, Baobao Zhang, Owain Evans; When will AI exceed Human Performance? Evidence from AI Experts; https://arxiv.org/abs/1705.08807
Kavosh Asadi, Cameron Allen, Melrose Roderick, Abdel-rahman Mohamed, George Konidaris, Michael Littman; Mean Actor Critic; https://arxiv.org/abs/1709.00503
Kunal Menda, Katherine Driggs-Campbell, Mykel J. Kochenderfer; DropoutDAgger: A Bayesian Approach to Safe Imitation Learning; https://arxiv.org/abs/1709.06166
Mario Lucic, Karol Kurach, Marcin Michalski, Sylvain Gelly, Olivier Bousquet; Are GANs Created Equal? A Large-Scale Study; https://arxiv.org/abs/1711.10337
Martin Rees; “Black Sky” Infrastructure and Societal Resilience Workshop; https://www.cser.ac.uk/media/uploads/files/Black-Sky-Workshop-at-the-Royal-Society-Jan.-20171.pdf
Miles Brundage; Brundage Bot; https://twitter.com/BrundageBot
Minghai Qin, Chao Sun, Dejan Vucinic; Robustness of Neural Networks against Storage Media Errors; https://arxiv.org/abs/1709.06173
Myself; 2017 AI Risk Literature Review and Charity Evaluation; http://effective-altruism.com/ea/14w/2017_ai_risk_literature_review_and_charity/
Nate Soares and Benja Fallenstein; Towards Idealized Decision Theory; https://arxiv.org/pdf/1507.01986.pdf
Nate Soares and Benjamin Levinstein; Cheating Death in Damascus; https://intelligence.org/files/DeathInDamascus.pdf
Nate Soares, Benja Fallenstein, Eliezer Yudkowsky, Stuart Armstrong; Corrigibility; https://intelligence.org/files/Corrigibility.pdf
Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei; Deep Reinforcement Learning from Human Preferences; https://arxiv.org/abs/1706.03741
Paul Christiano; AlphaGo Zero and capability amplification; https://ai-alignment.com/alphago-zero-and-capability-amplification-ede767bb8446
Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, David Meger; Deep Reinforcement Learning that Matters; https://arxiv.org/abs/1709.06560
Peter Stone, Rodney Brooks, Erik Brynjolfsson, Ryan Calo, Oren Etzioni, Greg Hager, Julia Hirschberg, Shivaram Kalyanakrishnan, Ece Kamar, Sarit Kraus, Kevin Leyton-Brown, David Parkes, William Press, AnnaLee Saxenian, Julie Shah, Milind Tambe, Astro Teller; One Hundred Year Study on Artificial Intelligence; https://ai100.stanford.edu/
Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, Thore Graepel; Value-Decomposition Networks For Cooperative Multi-Agent Learning; https://arxiv.org/pdf/1706.05296.pdf
Peter Vamplew, Richard Dazeley, Cameron Foale, Sally Firmin, Jane Mummery; Human-aligned artificial intelligence is a multiobjective problem; https://link.springer.com/article/10.1007/s10676-017-9440-6
Ryan Carey; Incorrigibility in the CIRL Framework; https://arxiv.org/abs/1709.06275
Samuel Yeom, Matt Fredrikson, Somesh Jha; The Unintended Consequences of Overfitting: Training Data Inference Attacks; https://arxiv.org/abs/1709.01604
Scott Alexander; G.K. Chesterton on AI Risk; http://slatestarcodex.com/2017/04/01/g-k-chesterton-on-ai-risk/
Scott Garrabrant, Tsvi Benson-Tilsen, Andrew Critch, Nate Soares, Jessica Taylor; A Formal Approach to the Problem of Logical Non-Omniscience; https://arxiv.org/abs/1707.08747
Scott Garrabrant, Tsvi Benson-Tilsen, Andrew Critch, Nate Soares, Jessica Taylor; Logical Induction; http://arxiv.org/abs/1609.03543
Seth Baum and Tony Barrett; Global Catastrophes: The Most Extreme Risks; http://sethbaum.com/ac/2018_Extreme.pdf
Seth Baum and Tony Barrett; Towards an Integrated Assessment of Global Catastrophic Risk; https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3046816
Seth Baum; On the Promotion of Safe and Socially Beneficial Artificial Intelligence; https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2816323
Seth Baum; Reconciliation Between Factions Focused on Near-Term and Long-Term Artificial Intelligence; https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2976444
Seth Baum; Social Choice Ethics in Artificial Intelligence; https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3046725
Seth Baum; Survey of Artificial General Intelligence Projects for Ethics, Risk, and Policy; https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3070741
Smitha Milli, Dylan Hadfield-Menell, Anca Dragan, Stuart Russell; Should Robots be Obedient; https://arxiv.org/pdf/1705.09990.pdf
Tony Barrett; Value of Global Catastrophic Risk (GCR) Information: Cost-Effectiveness-Based Approach for GCR Reduction; https://www.dropbox.com/s/7a7eh2law7tbvk0/2017-barrett.pdf?dl=0
Vadim Kosoy; Optimal Polynomial-Time Estimators: A Bayesian Notion of Approximation Algorithm; https://arxiv.org/abs/1608.04112
Victor Shih, David C Jangraw, Paul Sajda, Sameer Saproo; Towards personalized human AI interaction—adapting the behavior of AI agents using neural signatures of subjective interest; https://arxiv.org/abs/1709.04574
William Saunders, Girish Sastry, Andreas Stuhlmueller, Owain Evans; Trial without Error: Towards Safe Reinforcement Learning via Human Intervention; https://arxiv.org/abs/1707.05173
Xiongzhao Wang, Varuna De Silva, Ahmet Kondoz; Agent-based Learning for Driving Policy Learning in Connected and Autonomous Vehicles; https://arxiv.org/abs/1709.04622
Yang Liu and Huw Price; Heart of DARCness; http://yliu.net/wp-content/uploads/darcness.pdf
Yang Liu; The Sure-Thing principle and P2; http://www.academia.edu/33992500/The_Sure-thing_Principle_and_P2
Yunpeng Pan, Ching-An Cheng, Kamil Saigol, Keuntaek Lee, Xinyan Yan, Evangelos Theodorou, Byron Boots; Agile Off-Road Autonomous Driving Using End-to-End Deep Imitation Learning; https://arxiv.org/abs/1709.07174