Safety Culture and the Marginal Effect of a Dollar
We spent an evening at last week’s Rationality Minicamp brainstorming strategies for reducing existential risk from Unfriendly AI, and for estimating their marginal benefit-per-dollar. To summarize the issue briefly, there is a lot of research into artificial general intelligence (AGI) going on, but very few AI researchers take safety seriously; if someone succeeds in making an AGI, but they don’t take safety seriously or they aren’t careful enough, then it might become very powerful very quickly and be a threat to humanity. The best way to prevent this from happening is to promote a safety culture—that is, to convince as many artificial intelligence researchers as possible to think about safety so that if they make a breakthrough, they won’t do something stupid.
We came up with a concrete (albeit greatly oversimplified) model which suggests that the marginal reduction in existential risk per dollar, when pursuing this strategy, is extremely high. The model is this: assume that if an AI is created, it’s because one researcher, chosen at random from the pool of all researchers, has the key insight; and humanity survives if and only if that researcher is careful and takes safety seriously. In this model, the goal is to convince as many researchers as possible to take safety seriously. So the question is: how many researchers can we convince, per dollar? Some people are very easy to convince—some blog posts are enough. Those people are convinced already. Some people are very hard to convince—they won’t take safety seriously unless someone who really cares about it will be their friend for years. In between, there are a lot of people who are currently unconvinced, but would be convinced if there were lots of good research papers about safety in machine learning and computer science journals, by lots of different authors.
Right now, those articles don’t exist; we need to write them. And it turns out that neither the Singularity Institute nor any other organization has the resources—staff, expertise, and money to hire grad students—to produce very much research or to substantially alter the research culture. We are very far from the realm of diminishing returns. Let’s make this model quantitative.
Let A be the probability that an AI will be created; let R be the fraction of researchers that would be convinced to take safety seriously if there were 100 good papers about it in the right journals; and let C be the cost of one really good research paper. Then the marginal reduction in existential risk per dollar is A*R/(100*C). The total cost of a grad student-year (including recruiting, management and other expenses) is about $100k; take that as the cost of one such paper. Estimate a 10% current AI risk, and estimate that 30% of researchers currently don’t take safety seriously but would be convinced. That gives us a marginal existential risk reduction per dollar of 0.1*0.3/(100*100k) = 3*10^-9. Counting only the ~7 billion people alive today, and not any of the people who will be born in the future, this comes to a little over twenty expected lives saved per dollar.
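The arithmetic in this paragraph can be checked with a short script. This is only a sketch of the toy model as stated: the function and variable names are made up for illustration, and it assumes that one grad student-year (about $100k) buys one good paper, which the text implies but does not state outright.

```python
def risk_reduction_per_dollar(p_ai_risk, convincible_fraction,
                              papers_needed, cost_per_paper):
    """Marginal existential risk reduction per dollar: A*R / (papers * C)."""
    return p_ai_risk * convincible_fraction / (papers_needed * cost_per_paper)


# Inputs taken from the text; the names themselves are illustrative.
A = 0.10             # estimated current AI risk
R = 0.30             # fraction of researchers who would be convinced
PAPERS = 100         # number of good papers needed
C = 100_000          # dollars per paper (assumed: one grad student-year each)
POPULATION = 7e9     # people alive today, ignoring future generations

per_dollar = risk_reduction_per_dollar(A, R, PAPERS, C)
lives_per_dollar = per_dollar * POPULATION

print(f"risk reduction per dollar: {per_dollar:.1e}")        # 3.0e-09
print(f"expected lives per dollar: {lives_per_dollar:.0f}")  # 21
```

Because every input enters as a simple product or quotient, any correction factor applied to A, R, or C scales the final figure by the same factor, which makes it easy to see how large a penalty would be needed to change the conclusion.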
That’s huge. Enormous. So enormous that I’m instantly suspicious of the model, actually, so let’s take note of some of the things it leaves out. First, the “one researcher at random determines the fate of humanity” part glosses over the fact that research is done in groups; but it’s not clear whether adding in this detail should make us adjust the estimate up or down. It ignores all the time we have between now and the creation of the first AI, during which a safety culture might arise without intervention; but it’s also easier to influence the culture now, while the field is still young, rather than later. In order for promoting AI research safety to not be an extraordinarily good deal for philanthropists, there would have to be at least an additional 10^3 penalty somewhere, and I can’t find one.
As a result of this calculation, I will be thinking and writing about AI safety, attempting to convince others of its importance, and, in the moderately probable event that I become very rich, donating money to the SIAI so that they can pay others to do the same.
It worries me a tad that nobody in the discussion group corrected what I consider to be the obvious basic inaccuracy of the model.
Success on FAI is not a magical result of a researcher caring about safety. The researcher who would have otherwise first created AGI does not gain the power to create FAI just by being concerned about it. They would have to develop a stably self-improving AI which learned an understandable goal system which actually did what they wanted. This could be a completely different set of design technologies than what would have gone into something unstable that improved itself by ad-hoc methods well enough to go FOOM and end the game. The researcher who would have otherwise created AGI might not be good enough to do this. The best you might be able to convince them to do would be to retire from the game. It’s a lot harder to convince someone to abandon the incredibly good idea they’re enthusiastic about, and start over from scratch or leave the game, than to persuade people to be “concerned about safety”, which is really cheap (you just put on a look of grave concern).
If I thought all you had to do to win was to convince the otherwise-first creator of AGI to “take safety seriously”, this problem would be tremendously easier and I would be approaching it in a very different way. I’d be putting practically all of my efforts into PR and academia, not trying to assemble a team to solve basic FAI problems over however-many years and then afterward build FAI. A free win just for convincing someone to take something seriously? Hot damn, that’d be one easy planet to save; there’d be no point in pursuing any other avenue until you’d totally exhausted that one.
As it stands, though, you’re faced with (a) the much harder sell of convincing AGI people that they will destroy the world and that being concerned is not enough to save them, that they have to tackle much harder problems than they wanted to face on a problem that seems to them hard-enough-already; and (b) if you do convince the AGI person who otherwise would’ve destroyed the world, to join the good guys on a different problem or retire, you don’t win. The game isn’t won there. It’s just a question of how long it takes the next AGI person in line to destroy the world. If you convinced them? Number three. You keep dealing through the deck until you turn up the ace of spades, unless the people working on the ace of hearts can solve their more difficult problem before that happens.
All academic persuasion does is buy time, and not very much of that—the return on effort invested seems to be pretty low.
The main advantage of convincing mainstream AI people that FAI is a problem worth worrying about appears to be not that you will have mainstream AI people thinking twice before they build their AGI, but that you will then have mainstream AI people working on FAI. More people working on a given problem seems to make it massively more likely that the problem will be solved.
If there are rigorous arguments that FAI is worth worrying about, and that there are interesting questions about which people could be doing useful incremental research, then convincing people who work in universities to start doing this research has to be such a massive win that it would take something pretty huge to outweigh it—there are a lot of very clever people working in universities, massively more than will ever work at SingInst, and they already have a huge network in place to give them money to think about the things they find interesting.
Indeed, all of this was discussed at the time, and these complexities do indeed make the model produce an overestimate. However, I really don’t think the difference is whole orders of magnitude, and this
is definitely wrong. While there is a great deal more that needs to be figured out in order for an AI to be friendly, much of it is research that academia could do, too, if only they thought it was worthwhile.
I plan to write an article about just what “being safety conscious” would mean, but it’s not “spending a few extra days on safety features before flipping the switch”, it’s more like handing the whole project over to friendliness experts and taking advantage of whatever friendliness research has been done up to that point. Those experts and that research need to exist, but I don’t think those differences are on the margin of current existential risk reduction spending, since the limiting resource there isn’t money.
After reading Eliezer’s comment and yours, I now think the “30%” figure for “being safety conscious” needs unpacking. In particular I think there’s a tendency to picture the most safety conscious of the converts, and say the entire 30% looks like that, even though (for me at least) the intuitions which let 30% be plausible are based on researchers intellectually believing safety consciousness is very important rather than researchers taking actions as if safety consciousness is very important.
GuySrinivasan’s comment seems to suggest that the estimated marginal effect of a dollar could be at least 2 orders of magnitude smaller if additional considerations are taken into account. See “down by a factor of 10” and “10%-50% that being safety conscious works”.
I agree with you that we’re stuck in the (arguably unpleasant) position of having to actually go ahead with the FAI as a project; still, academic persuasion might get you funds and some of the best brains for your project.
Safety-speed tradeoffs, the systematic bias in “one randomly selected researcher,” and AGI vs FAI difficulty were discussed at the time.
I think you can infer from GuySrinivasan’s comment that they did (unfortunately the evidence is presented in an overly cryptic way).
This comment seems to argue that “trying to assemble a team to solve basic FAI problems over however-many years and then afterward build FAI” is the real goal here and that “convincing someone to take something seriously” is barely worth thinking about. However, it certainly seems to me that convincing people to take the problem seriously is a productive (and perhaps an essential) first step toward assembling a team.
However, reading the subtext in that comment, it certainly appears that the real fear expressed here is that if safety consciousness should become endemic in the AGI community, there is a real risk that someone else might produce FAI before Eliezer.
That makes no sense. If safety consciousness means that the AGI community is likely to produce FAI before Eliezer, then without safety consciousness, the AGI community is even more likely to produce UFAI before Eliezer produces FAI. Either way, Eliezer gets scooped; but in the second case, we’re very dead.
That’s hardly charitable.
I read that judgement as somewhat uncharitable itself. In particular it is not reasonable to judge Perplexed’s reading of the subtext independently of what that reading is.
I disagree strongly, on several points.
That anyone would attempt to implement FAI with any definition similar to that of SIAI seems highly unlikely, regardless of safety concern.
That Eliezer would be upset if someone got it right before he did seems obviously absurd.
That there’s a fear of safety consciousness being too good, rather than safety consciousness being a farce put on for grant applications and PR purposes, makes no sense.
Finally, the tone of your post, and almost any other post you have regarding the topic of FAI, provokes responses in me which seem out of proportion to what they should be.
As explicit reasoning, yes, that would be absurd. But we are all primates, and the thought of being overshadowed feels bad on a subconscious level, even if that feeling disagrees with conscious beliefs. “I will do it right and anyone else who tries will do it wrong” is an unlikely thing to believe, but a likely thing to alieve.
I don’t think the argument that people can be smart enough to create an AGI that can take over the universe in a matter of hours, yet dumb enough not to recognize the dangers posed by such an AGI, is very strong. To fortify that argument you would either have to show that the people working for SIAI are vastly more intelligent than most AGI researchers, in which case they would be more likely to build the first AGI, or that the creation of an AGI capable of explosive recursive self-improvement demands much less intelligence and insight than is necessary to recognize risks from AI.
I deem that to be very unlikely as well. But given the scope of the project, and human nature, it should be taken into account. So should the possibility that it’s a giant scam. Because if that is the case, even at a probability as low as 0.1%, valuable resources would be wasted that could be used to mitigate other existential risks, or would be used by someone who follows selfish motives.
It is very easy to dispel any such doubts: all he would have to do is publish some technical paper that manages to survive peer review, thereby substantiating his claims and proving that he is qualified.
You say that as if it’s worse than other ways that money could go to waste.
(There are good game-theoretic reasons to act sort of as if you thought that, but they should be made explicit, and explicit consideration of them probably wasn’t what motivated your statement.)
You seem to have the idea that this is all about Eliezer Yudkowsky. In actual fact, he wasn’t at the meeting where we came up with the model I described in this article, he’s influential but doesn’t control SIAI, and the existential risk issue is bigger than SIAI and a lot bigger than any one person. Most of the people involved think AI risk is important based on their own reasoning, not based on trusting Eliezer. Personally, I don’t really care whether he’s qualified, because I consider myself qualified enough to judge his arguments (or anonymous arguments) directly. What may be throwing you off is that he’s extremely visible—he’s the public face of SIAI to a lot of people—because he’s a prolific writer, and because he optimizes his writing to get lots of people to read it.
Journals are actually very bad for getting read by non-specialists, and Eliezer’s specialized his writing skill for presenting to smart laymen, rather than academics. Nevertheless, other authors have written and published papers about AI risk. The issue at hand right now is getting into prestigious machine learning and computer science journals, rather than philosophy journals, so that the right specialists will read them. That’s much more difficult, because their editors think of them as having narrow topics that don’t include philosophy or futurism.
As someone who is still acquiring a basic education I have to rely on some amount of intuition and trust in peer-review. Here I give a lot of weight to actual, provable success, recognition, and substantial evidence in the form of a real world demonstration of intelligence and skill.
The Less Wrong sequences and upvotes by unknown and anonymous strangers are not enough to prove the expertise and intelligence that I consider necessary to lend enough support to such extraordinary ideas as the possibility of risks from artificial general intelligences undergoing explosive recursive self-improvement. At least not enough to disregard other risks that have been deemed important by a world-wide network of professionals with a track record of previous achievements.
I do not intend to be derogatory, but who are you, and why would I trust your judgement or that of other people on Less Wrong? This is a blog on the Internet created by an organisation with a few papers that lack a lot of mathematical rigor and technical details.
What is bothering me is that I haven’t seen much evidence that he is qualified and intelligent enough to just believe him. People don’t even believe Roger Penrose when he is making up extraordinary claims outside his realm of expertise. And Roger Penrose achieved a lot more than Eliezer Yudkowsky and has demonstrated his superior intellect.
Rightly so.
Penrose has made a much bigger fool of himself in public—if that is what you mean.
IMO, a Yudkowsky is worth 10 Penroses—at least.
Machine intelligence will be an enormous deal. As to experts who think global warming is the important issue of the day—they are not doing that through genuine concern about the future. That is all about funding, and marketing, not the welfare of the planet. It is easier to put together grant applications in that area. Easier to build a popular movement. Easier to win votes. Environmentalist concerns signal greenness, which is linked by many to goodness. Showing that you even care for trees, whales and the whole planet, shows you have a big heart.
The fact that global warming is actually irrelevant fluff is a side issue.
I give global warming as my number one example in my Bad Causes video.
One of the major messages which I think you should be picking up from the sequences is that it takes more than just intelligence to consistently separate good ideas from bad ones.
I have some scepticism about that point as well. We do have some relevant history relating to engineering disasters. There have been lots of engineering projects in history, and we know roughly how many people died in accidents, and as a result of negligence, or were screwed over in other ways.
Engineers do sometimes fail. The Titanic. The Tacoma Narrows Bridge Collapse.
Then there’s all the people killed by cars and in coal mines. Society wants the benefits, and individuals pay the price. This effect seems much more significant than accidents to me—in terms of number of deaths.
However, I think that engineers have a reasonable record. In a historical engineering project with lives at stake, one would certainly not expect failure—or claim that failure is “the default case”. The case goes the other way: high technology and machines cause humans to thrive.
Of course, reference class forecasting has limitations, but we should at least attempt to learn from this type of data.
I don’t consider this a response to my point. My point was that “concern for safety” is not well correlated with “ability to perform safely”. It’s very likely that many or all AGI researchers are aware of “risks” regarding the outcomes of their research. However, I consider it very unlikely that they will think deeply enough about the topic to come up with, or even start on, solutions such as Friendliness.
Why do these discussions constantly come down to the same people debating the same points? Because, as you said, there are no published technical papers such as those promised by last year’s donation drive. SIAI is operating internally and not revising their public information. Do you believe their thoughts have failed to change, on any detail, in the time since initial publication?
If I had an extraordinary idea related to a field of expertise that I am not part of, I would humbly request some of the experts to review it, before claiming that I know something that they don’t know, if I don’t even know if my idea makes sense.
Has this happened? All I know about are derogatory comments about mainstream AGI research, the academia and peer-review in general.
In case it has happened, it seems that the idea was not received positively. Does that mean that the idea is bogus? No. Does that mean that you should be particularly confident in your idea? No. It means that you should reassess it and gather or wait for more evidence before telling everyone that the world is going to end, creating a whole movement around it, asking for money, and advising people to neglect any other ideas because everyone else is below your epistemic level.
Because nobody other than a school dropout like me cares to take a critical look at those points, points that haven’t been addressed enough to generate the slightest academic interest.
Did you see the coverage in recent versions of “AI: A Modern Approach”? Peter Norvig is an actual expert in artificial intelligence. The End of The World As We Know It even gets a mention!
Cool, I admit I have been wrong there and herewith retract that point.
Just to be clear: this model was drafted by a couple of mini-camp participants, not by the workshop as a whole, and isn’t advocated by the Singularity Institute. For example, when I do my own back-of-the-envelopes I don’t expect nearly a 30% increase in existential safety from convincing 30% of AI researchers that risk matters. Among other things, this is partly because there’s a distance between “realize risk matters” and “successfully avoid creating UFAI” (much less “create FAI”), since sanity and know-how also play roles in AI design; and partly because there are more players than just AI researchers.
Still, it is good to get explicit models out there where they can be critiqued—I just want to avoid folks having the impression that this is SingInst’s model, or that it was taught at minicamp.
I agree that there is a lot of room for more and better academic work on this topic to reduce existential risk (including other channels like more academic research into AI safety strategies, influence on other actors like large corporations and governments, etc), but as I said at the minicamp, I think the assumptions of this model systematically lead to overestimates of effectiveness of this channel (EDIT: and would lead to overestimates of other strategies as well, including the “FAI team in a basement” strategy as I mention in my comment below).
One of the primary reasons for concern about AI risk is the likelihood of tradeoffs between safety and speed of development. Commercial or military competition makes it plausible that quite extensive tradeoffs along these lines will be made, so that reckless (or self-deceived) projects are more likely to succeed first than more cautious ones. So the “random selection” assumption disproportionately favors safety.
The assumption that safety-conscious researchers always succeed in making any AI they produce safe is also fairly heroic and a substantial upward bias. There may be some cheap and simple safety measures that any safety-conscious AI project can take without significant sacrifice, but we shouldn’t assign high probability to the problem being that easy. Also, if it turns out safety is so easy, why wouldn’t any group sophisticated enough to build AI start to take such precautions once it became apparent they were making real progress?
As folks discussed at the time this idea was first presented, if ‘concern for safety’ means halting a project with high risk to pursue a lower-risk design, then unless almost all researchers are affected this just leads to a modest expected delay until someone unconcerned succeeds.
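One way to see why halting only helps much when almost everyone is affected is a rough sketch under a strong simplifying assumption that is not in the original discussion: suppose each successive project to reach the threshold is led by a safety-concerned researcher (who halts) independently with probability p. The number of halts before an unconcerned project succeeds is then geometric, with mean p/(1-p). The function names below are hypothetical.

```python
import random


def expected_halts(p_concerned):
    """Mean number of projects that halt before an unconcerned one succeeds.

    Assumes (for illustration only) that each project in line is concerned
    independently with probability p_concerned; the count is geometric.
    """
    if not 0 <= p_concerned < 1:
        raise ValueError("p_concerned must be in [0, 1)")
    return p_concerned / (1 - p_concerned)


def simulate_halts(p_concerned, trials=20_000, seed=0):
    """Monte Carlo check of expected_halts under the same assumption."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Keep drawing projects until an unconcerned one turns up.
        while rng.random() < p_concerned:
            total += 1
    return total / trials


for p in (0.3, 0.9, 0.99):
    print(p, round(expected_halts(p), 2))
```

Under this assumption, convincing 30% of researchers buys less than half of one halted project on average, while 90% buys nine and 99% buys ninety-nine. How much calendar time each halt is worth is a separate question that this sketch does not address.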
What makes the SIAI team, that will be assembled, any different?
I think many of the same assumptions also lead to overestimates of the success odds of an SIAI team in creating safe AI. In general, some features that I would think conduce to safety and could differ across scenarios include:
Internal institutions and social epistemology of a project that makes it possible to slow down, or even double back, upon discovering a powerful but overly risky design, rather than automatically barreling ahead because of social inertia or releasing the data so that others do the same
The relative role of different inputs, like researchers of different ability levels, abundant computing hardware, neuroscience data, etc, in designing AI; with some patterns of input favoring higher understanding by designers of the likely behavior of their systems
Dispersion of project success, i.e. the longer a period after finding the basis of a design in which one can expect other projects not to reach the same point; the history of nuclear weapons suggests that this can be modestly large (nukes were developed by the first five powers in 1945, 1949, 1952, 1960, 1964) under some development scenarios, although near-simultaneous development is also common in science and technology
The type of AI technology: whole brain emulation looks like it could be relatively less difficult to control initially by solving social coordination problems, without developing new technology, while de novo AGI architectures may vary hugely in the difficulty of specifying decision algorithms with needed precision
Some shifts along these dimensions do seem plausible given sufficient resources and priority for safety (and suggest, to me, that there is a large spectrum of safety investments to be made beyond simply caring about it).
Another factor to consider: the permeability of the team, that is, how much they are likely to leak information to the outside world.
However if the teams are completely impermeable then it becomes hard for external entities to evaluate the other factors for evaluating the project.
Does SIAI have procedures/structures in place to shift funding between the internal team and more promising external teams if they happen to arise?
Most potential funding exists in the donor cloud, which can reallocate resources easily enough; SIAI does not have large reserves or an endowment that would be encumbered by the nonprofit status. Ensuring that the donor cloud is sophisticated and well-informed contributes to that flexibility, but I’m not sure what other procedures you were thinking about. Formal criteria to identify more promising outside work to recommend?
I think that might help. In this matter it all seems to be about trust.
People doing outside work have to trust that SIAI will look at their work and may be supportive. Without formal guidelines, they might suspect that their work will be judged subjectively and negatively due to potential conflict of interest due to funding.
SIAI also needs to be trusted not to leak information from other projects as it evaluates them; having a formal, vetted, well-known evaluation team might help with that.
The Donor cloud needs to trust SIAI to look at work and make a good decision about it, not just based on monkey instincts. Formal criteria might help instill that trust.
SIAI doesn’t need all this now as there aren’t any projects that need evaluating. However it is something to think about for the future.
I don’t think the SIAI has much experience writing code, or programming machine learning applications.
Superficially, that makes them less likely to know what they are doing, and more likely to make mistakes and screw up.
Eliezer’s FAI team currently consists of 2 people: himself and Marcello Herreshoff. Whatever its probability of success, most would seem to come from actually recruiting enough high-powered folk for a team. Certainly he thinks so, thus his focus on Overcoming Bias and then the rationality book as a tool to recruit a credible team.
Sure, ceteris paribus, although coding errors seem less likely than architectural screwups to result in catastrophic harm rather than the AI not working.
It’s hard for me to imagine 100 good papers on the subject of AI safety (as opposed to say, FAI design). Once you have 10 good papers with variations of “AGI is dangerous, please be careful!”, what can you say in the 11th one that you haven’t already said? Also, 100 papers all carrying the same basic message, all funded by the same organization… that seems a bit surreal.
ETA: Sorry, I’m being overly skeptical and nitpicking. On reflection I think something like this probably is a good idea and should be pursued (unless money is a constraint and someone can come up with better use for it).
ETA2: If someone has done serious thinking about the feasibility of convincing a substantial fraction of AGI researchers about the need for safety, by “publishing X good quality papers”, could they please explain their thoughts in more detail? (My mind keeps changing about whether this is feasible or not.)
There’s a lot to say at one layer remove—things like stability analyses of particular strategies for implementing goal systems, general safety measures such as fake network interfaces, friendliness analyses of hypothetical programs, and so on. A paper can impart the idea that safety is important, without being directly about safety. (In fact, there’s some reason to suspect that articles one layer removed may be better than articles that are directly about safety).
This seems right. One additional thing to note, however, is that while it looks quite likely that good papers lead to improvements at the margin, high-publicity bad work can harm a developing field’s prospects and reputation, and thus outsiders’ desire to affiliate with it. Robin Hanson emphasizes this point a lot.
Carl, are you saying that the non-SIAI-affiliated qualified academics among us should attempt to get high-publicity, bad papers published advocating anything-goes GAI design, without regard for safety?
No, for many reasons, including the following:
Such things are very likely to backfire, and moreso than they seem; we live in a world of substantial transparency, and dirty laundry gets found
Being the kind of people who would do such things would have bad effects and sabotage friendly cooperation with the very AI folk whose cooperation is so important
There is already a lot of stuff along these lines
Folk actually in a position to do such things would better use their limited time, reputation, and commitment on other projects
My impression is that the bridges are mostly burned there. For years, the SIAI has been campaigning against other projects, in the hope of denying them mindshare and funding.
We have Yudkowsky saying: “And if Novamente should ever cross the finish line, we all die.” and saying he will try to make various other AI projects “look merely stupid”.
I expect the SIAI looks to most others in the field like a secretive competing organisation, who likes to use negative marketing techniques. Implying that your rivals will destroy the world is an old marketing trick that takes us back to the Daisy Ad. This is not necessarily the kind of organisation one would want to affiliate with.
What is the status of the academic papers from the 2010 Singularity Research Challenge?
In general, there seems to have been substantial planning fallacy on the ease of getting skilled people to make progress on them via the Visiting Fellows program and other means. Versions of many of them have eventually come into being (as discussed below) but with great delays. And it seems that delivery of the planned reporting infrastructure failed badly. With respect to the individual papers:
Containing superintelligence led to this paper which was accepted for a subsequently-cancelled conference and is now seeking a venue, as well as (I believe) an accepted Singularity Hypothesis chapter by Daniel Dewey.
The WBE-AGI one has lagged, but is now a submission to the JCS special issue on Chalmers’ Singularity paper (by myself and Anders Sandberg), with presentations of the content at FHI, San Diego State University, and the AGI-11 workshop on the future of AI.
Collective Action Problems and AI Risk led to another Singularity Hypothesis submission.
AI risk philanthropy was taken on by an external author who never delivered, and subsequently had to be transferred to a different person who hasn’t finished it yet.
There is an incarnation of the Singularity FAQ, and lukeprog, along with Anna Salamon, have custody of the landing pages project, with an academic one in place (although they are trading off against the minicamp/bootcamp timewise).
The Coherence of Human Goals led to this paper, at AGI-10.
The Visitors Grants were used.
The two papers at the top, submissions for the cancelled Minds and Machines issue funded before the Challenge, went into limbo after the cancellation.
Software Minds and Endogenous Growth led to a paper at ECAP-2010 which is under continued development but is not a journal article yet.
And then there have been various other non-Challenge papers, like Bostrom and Yudkowsky’s joint piece on AI ethics, my piece with Bostrom on inference from evolution to AI difficulty, etc.
Note that that one wasn’t actually funded. The ECAP paper is online here.
Right, it didn’t get earmarked donations, only two papers were specifically funded in the challenge grant. In general, mostly people weren’t interested in funding specific projects, and the challenge primarily went to general funds.
A human alone can’t build a superintelligence. So, companies and other organisations are what we should mostly be concerned with. Targeting the engineering talent with the message is probably the wrong approach; you mostly want the managers and directors, since they are more likely to be the ones who will decide what the machine wants.
I think the low-hanging fruit in getting corporations to behave better is reputation systems, which I discuss here. Merely telling corporations that what they are doing is risky seems unlikely to be very effective: corporations are just not that risk-averse.
Does anyone know of a historical example of a concerted effort to convince people in an academic discipline to pay attention to something, by funding a bunch of papers on or related to the topic?
If so, how well did it work?
I believe the tobacco companies tried this (and maybe they still do). How much difference it made I don’t know.
Were they trying to get people to pay attention to something that was neglected before? I thought they were just trying to sow confusion around the smoking-illness connection, which was already being studied.
As part of their efforts to kick up dirt around the smoking-illness link, they did fund some research to try building up fringe hypotheses (as opposed to knocking down mainstream hypotheses). They gave Hans Eysenck money to research the link between personality traits and cancer (with smoking as a possible mediator).
Surely the most existential-risk-reduction-per-buck at this point is not “thinking and writing about AI safety”, but thinking up more strategies like it in order to possibly find even better ones? Shouldn’t SIAI (or perhaps FHI, depending on the comparative advantage between them) fund and publish some sort of systematic search-and-comparison of existential risk reduction strategies in order to have high confidence that the strategies it ends up pursuing are the optimal ones?
ETA: To be more constructive, has anyone done a similar analysis for “pushing for world-wide safety regulations on AI research” or “spending money directly on building FAI”?
The number one point of comparison for safety regulations is the cryptography export regulations. I am pretty sceptical about something similar being attempted for machine intelligence. It is possible to imagine the export of smart robots to “bad” countries being banned—for fear that they will reverse-engineer their secrets—but not easy to imagine that anyone will bother. Machine intelligence will ultimately be more useful than cryptography was. It seems pretty difficult to imagine an effective ban. So far, I haven’t seen any serious proposals to do that.
Governments seem likely to continue promoting this kind of thing, not banning it.
Summary:
If the first researcher with the key insight into general AI is really “safety conscious” we don’t automatically get friendly AI first. That’s a 10x reduction in marginal value from the original model.
Being “safety conscious” correctly is really hard and most of the 30% won’t be safety conscious in the way we want, even though they “know” they should. That’s another 30x reduction in marginal value from the original model.
One big penalty that was discussed is the likelihood of another researcher having the key insight before the first researcher can leverage the insight into friendly AI. Throwing some crazy numbers down (aka a concrete albeit greatly simplified model), call it a 1% “no one would possibly think of this before FAI”, 10% “50% someone else will think of this in time to beat me if they’re unfriendly”, 89% “this is an idea whose time has come, we save a couple years on the first friendly researcher, a year on the next two, and months on the rest” and call it some fraction of 50 years. That gives something like 0.02 + 0.065 + 0.048 = 0.133, or down by a factor of roughly 10.
(Edited) Another factor is whether being “safety conscious” about your key insight actually ends up gaining us anything. e.g. telling a collaborator you thought was okay but wasn’t loses some of the gains. I haven’t thought through this but wouldn’t be viscerally upset if someone said anywhere from 10%-50% that being safety conscious works. (Edited) After reading Eliezer’s comment, I think I was confusing two things (and maybe others are). There’s a spectrum of safety consciousness, and I don’t think all of those 30% of researchers convinced by 100 papers get to the 10%-50% level of “safety consciousness from them will work”. Maybe 2% get to 50%, 10% get to 10%, and 88% get to 2% or worse aka 1%. That brings this factor down to 3%.
(Edited: this is a non-issue) There’s also the possibility of very negative consequences to buying up the bright grad students (I assume we need the bright ones to get good papers produced in the right journals). I don’t know if this is actually any concern at all to those with at least some intuition into the matter—I have no such intuition. (Edited) This came from my thought: “if it was generally well-known that a relatively small group of people was trying to buy up 30% of the AI research, might that cause a social backlash?” which is just flat-out wrong, we’re trying to write 100 papers to convince 30% of the community, not actually buy 30% of the research. :)
In the other direction there’s the possibility that “100 good papers” leads to “30% convinced” leads to “balloon upwards to 80% due to networking and no-longer-non-mainstream effects”. (Edited: if this happens, it gives us a factor of about 1.5, so its total contribution is pretty small unless it’s very likely)
Oh, and there’s the expected time until FAI as compared to GAI… if FAI is too much longer, we only get a benefit from the 1% piece of that model which would make it down by an extremely unstable (1% plus or minus 1% ;)) factor of 50. (Edited) Let’s put some crazy numbers down and say FAI being so much friggin’ harder than GAI is 25%, plus 10% FAI is actually just impossible, for a 35% chance we only get the benefits from the 1% (plus or minus 1%) piece of the “delaying general AI” model. My other intuitions were coming from FAI being really hard but not a century harder than GAI. This takes my original 0.02 + 0.065 + 0.048 = 0.133 down to 0.35 × 0.02 + 0.65 × 0.133 = 0.0935, which is still about a 10x factor.
Anyone with better intuitions/experience/(gasp)knowledge want to redo those numbers or note why one of the models is terribly broken or brainstorm other yet-unmentioned factors?
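For anyone who wants to redo those numbers, here is a minimal Python sketch of the arithmetic in the summary above. The variable names are my own invention, and the inputs are just the “crazy numbers” from the comment, not estimates anyone has defended; swap in your own to see how the factors move.

```python
# Term values are copied from the comment above; all names are illustrative.

# "Delaying general AI" model: contributions of the 1%, 10% and 89% scenarios.
delay_terms = [0.02, 0.065, 0.048]
delay_factor = sum(delay_terms)

# Chance that FAI is much harder than GAI (25%) or impossible (10%),
# in which case only the 1% scenario's contribution survives.
p_only_first_term = 0.25 + 0.10
adjusted_delay_factor = (
    p_only_first_term * delay_terms[0]
    + (1 - p_only_first_term) * delay_factor
)

# Spectrum of safety consciousness among convinced researchers:
# (share of convinced researchers, chance their safety consciousness works).
spectrum = [(0.02, 0.50), (0.10, 0.10), (0.88, 0.01)]
safety_factor = sum(share * chance for share, chance in spectrum)

print(f"delay factor: {delay_factor:.3f}")
print(f"adjusted delay factor: {adjusted_delay_factor:.3f}")
print(f"safety-consciousness factor: {safety_factor:.3f}")
```

With the inputs as given, the three factors come out near 0.133, 0.093 and 0.029, matching the figures in the summary.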
I don’t understand this sentence. Please explain.
What negative consequences?
Edited. I would guess that “being safety conscious” isn’t enough to guarantee good effects, and we only get some fraction of the benefit that an Ideal Safety Conscious Researcher would give.
The negative consequences I was thinking of are in retrospect based on a silly error.
Thanks for pointing those out!
After reading through the post and all the comments I think the most important moral is that a simple quantitative model thought up by very smart people in a context emphasizing rationality and examined and found lacking in significant sources of error (to the point that one of these smart people is willing to post it to Less Wrong main) can still ultimately be off by many orders of magnitude.
(Not to say that drafting a simple quantitative model isn’t a great starting point, but instead that when interpreting such models one should assume that the margin of error is really really big, especially when pondering implications of the model, especially especially when pondering implications for decision policies.)
It is challenging to know what will help:
Maybe pointing at machine intelligence and shouting “DANGER!” and “WEAPON!” will just attract the attention of the military.
Maybe getting the safety-conscious teams to slow down will mean a greater chance of the unscrupulous teams getting there first.
This is one of my concerns about the SIAI. They seem to be enthusiastic about caution—but excessive caution in this area seems likely to increase the chances of an undesirable outcome—via the mechanism in the link—so they may be having a particularly negative impact.
The “key insight” model seems deeply flawed. We know that the technical side of the problem involves performing inductive inference—which is a close cousin of stream compression. So, progress is very likely to look like progress with stream compression. Some low-hanging fruit—and then gradually diminishing returns. Rather like digging a big hole in the ground.
Here’s Bob Mottram making much the same point as I just made:
How confident should we be that general AI involves solely hard work on existing problems like performing inductive inference? I agree that if there are no more Key Insights, and instead just a bunch of insights that some researcher will eventually have, then most of the gains from the proposal can’t be realized. Next steps: somehow estimate the probability that there are 0, 1, or several Key Insights remaining before general AI is “just” a matter of tons of hard research/experimentation, and estimate the gains from the 100-paper-strategy for the scenarios in which there are 0 or several Key Insights remaining.
I didn’t really claim that. There’s also the whole issue of what utility function to use—and some other things as well—tree pruning strategies, for instance. Just that inductive inference is the key technology for the technical side of the problem—the part not to do with values.
Much has been written about the link between induction and intelligence: Hutter. Mahoney. Me.
“Estimate a 10% current AI risk”… wait, where did that come from? You say “Let A be the probability that an AI will be created”, but actually your A is the probability that an AI will be created which then goes on to wipe out humanity unless precautions are taken, but which will also fail to wipe out humanity if the proper precautions are taken.
Your estimate for that is a whopping 10%? Without any sort of substantiating argument??
… Let’s say I claim 0.000001% is a much more reasonable figure for this: what would be your rationale supporting that your estimate is more plausible than mine? Using my estimate, it suddenly becomes much more worthwhile in terms of lives saved per dollar to just build wells in Africa.
(Addendum: in fact, I would argue that utility is not to be measured in lives-saved-per-dollar, or else you would need to invest in increasing fertility in Africa so you can then go on to save more lives by building wells. Instead your utility should be a stable and happy Africa (Africa because that’s the most unhappy continent right now, so your payoff will tend to be greatest if you invest in Africa), for which end the rational thing to do will be to invest in birth control rather than wells. But that’s a different story.)
Marginal taking-of-safety-seriously, as Eliezer points out, doesn’t look good enough: you just delay the inevitable a little bit, if even that. On the other hand, establishing a widely-accepted consensus that AGI is as dangerous as A-bombs that blow up the whole universe might influence the field in more systematic ways (although it’s unclear how, and achieving this goal doesn’t look plausible).
If AGI is a long way away, then seeding a safety message to current and future grad students could influence the directions they take, and turn the field in the direction of higher safety.
If AGI comes soon, then influencing people is much less useful, I agree.
Is there a body of knowledge about controlling self-modifying programs which could be used as a stepping stone to explaining what would be involved in FAI?
People like me wrote self-modifying machine code programs back in the 1980s—but self-modification quickly went out of fashion. For one thing, you couldn’t run from read-only storage. For another, it made your code difficult to maintain and debug.
Self-modifying code never really came back into fashion. We do have programs writing other programs, though: refactoring, compilers, code-generating wizards and genetic programming.
Until people figure out how to create reliable self-modifying programs that have modest goals, I’m not going to worry about self-improving AI of any sort being likely any time soon.
Perhaps the rational question is: How far are we from useful self-modifying programs?
Self-modifying programs seem like a bit of a red herring. Most likely groups of synthetic agents will become capable of improving the design of machine minds before individual machines can do that. So, you would then have a self-improving ecosystem of synthetic intelligent agents.
This probably helps with the wirehead problem, and with any Godel-like problems associated with a machine trying to understand its entire mind.
Today, companies that work on editing/refactoring/lint etc tools are already using their own software to build the next generation of programming tools. There are still humans in the loop—but the march of automation is working on that gradually.
I agree that a multi-agent systems perspective is the most fruitful way of looking at the problem. And I agree that coalitions are far less susceptible to the pathologies that can arise with mono-maniacal goal systems. A coalition of agents is rational in a different, softer way than is a single unified agent. For example, it might split its charitable contributions among charities. Does that weaker kind of rationality mean that coalitions should be denigrated? I think not.
To answer Nancy’s question, there is a huge and growing body of knowledge about controlling multi-agent systems. Unfortunately, so far as I know, little of it deals with the scenario in which the agents are busily constructing more agents.
That does happen quite a bit in genetic and memetic algorithms—and artificial life systems.
I checked with the Gates Foundation. 7549 grants and counting!
It seems as though relatively united agents can split their charitable contributions too.
A note, though… if I had a billion dollars and decided just to give it to whoever GiveWell recommended as their top-rated international charities, due to most charities’ difficulty in converting significant extra funds into the same level of effect, I would end up giving 1+10+50+1+0.3+5=67.3 million to 6 different charities and then become confused at what to do with my 932.7 million dollars.
I know the Gates Foundation does look like a coalition of agents rather than a single agent, but it doesn’t look like a coalition of 7549+ agents. I’d guess at most about a dozen and probably fewer Large Components.
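The room-for-more-funding arithmetic above can be sketched as a capped allocation. This is only an illustration: the function name and the idea of filling charities in order are my assumptions, and the capacities (in millions of dollars) are simply the figures quoted in the comment.

```python
def allocate(budget, capacities):
    """Give each charity up to its capacity, in order; return (grants, leftover)."""
    grants = []
    remaining = budget
    for capacity in capacities:
        grant = min(capacity, remaining)
        grants.append(grant)
        remaining -= grant
    return grants, remaining


# Capacities in millions of dollars, copied from the comment above.
capacities = [1, 10, 50, 1, 0.3, 5]
grants, leftover = allocate(1000, capacities)
total_granted = sum(grants)

print(f"granted: {total_granted:.1f}M, left over: {leftover:.1f}M")
```

Once every charity is capped, any extra budget lands entirely in the leftover, which is the confusion the comment describes.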
Their fact sheet says 24 billion dollars.
Is maintaining sufficient individuality likely to be a problem for the synthetic agents?
Only if they are built to want individuality. We will probably start off with collective systems, because if you have one agent, it is easy to make another one the same, whereas it is not easy to make an agent with a brain twice as big (unless you are trivially adding memory or something). So collective systems are easier to get off the ground with, and they are the ones we are likely to build first.
You can see this in most data centres—they typically contain thousands of small machines, loosely linked together.
Maybe they will ultimately find ways to plug their brains into each other and more comprehensively merge together—but that seems a bit further down the line.
I was concerned that synthetic agents might become so similar to each other that the advantages of different points of view would get lost. You brought up the possibility that they might start out very similar to each other.
If they started out similar, such agents could still come to differ culturally. So, one might be a hardware expert, another might be a programmer, and another might be a tester, as a result of exposure to different environments.
However, today we build computers of various sizes, optimised for various different applications—so probably more like that.
There’s a limit to how similar people can be made to each other, but if there are efforts to optimize all the testers (for example), it could be a problem.
Well, I doubt machines being too similar to each other will cause too many problems. The main case where that does cause problems is with resistance to pathogens—and let’s hope we do a good job of designing most of those out of existence. Apart from that, being similar is usually a major plus point. It facilitates mass production, streamlined and simplified support, etc.
Yes. As tim points out below, the main thing that programmers are taught is “self-modifying programs are almost always more trouble than they’re worth—don’t do it.”
My hunch is that self-modifying AI is far more likely to crash than it is to go FOOM, and that non-self-modifying AI (or AI that self-modifies in very limited ways) may do fairly well by comparison.
My understanding was that the CEV approach is a meta-level approach to stable self improvement, aiming to design code that outputs what we would want an FAI’s code to look like (or something like this). I could certainly be wrong of course, and I have very little to go on here, as the Knowability of FAI and CEV are both more vague than I would like (since, of course, the problems are still way open) and several years old, so I have to piece the picture together indirectly.
If that interpretation is correct it seems (and I stress that I might be totally off base with this) that stable recursive self-improvement over time is not the biggest conceptual concern, but rather the biggest conceptual difficulty is determining how to derive a coherent goal set from a bunch of Bayesian utility maximizers equipped with each individual person’s utility function (and how to extract each person’s utility function), or something like that. A stable self-improving code would then (hopefully) be extrapolated by the resulting CEV, which is actually the initial dynamic.
My comment wasn’t directed towards CEV at all—CEV sounds like a sensible working definition of “friendly enough”, and I agree that it’s probably computationally hard.
I was suggesting that any program, AI or no, that is coded to rewrite critical parts of itself in substantial ways is likely to go “splat”, not “FOOM”—to degenerate into something that doesn’t work at all.
This sounds like decision theory stuff that Eliezer and others are trying to figure out.
Just one paper (AI safety or FAI design)...I will be very impressed. I will donate a minimum of $10 ($20 for a technical paper on FAI design) per peer-reviewed research paper per journal to the SIAI.
I doubt I’ll have to donate even once within the next 50 years. But I would be happy to be proven wrong.
There are some of those in the works, but note that the Future of Humanity Institute converts funds into research papers on these topics as well (Nick Bostrom is working on an academic book now which pretty comprehensively summarizes the work of folk around SIAI).
FHI accepts donations, and estimates a cost of about $200k (USD, although currency swings may have changed this number) per 2 year postdoc, including travel, share of overhead and administrative costs, conferences, journal fees, etc. As part of Oxford, they have comparative advantage in hiring academics and lending prestige to the work. You can look at their research record on their website and assess things that way.
Converts funds, or converts marginal funds?
I’ve been meaning to start the SIAI vs FHI conversation here in its own thread for some time, if people don’t think it falls afoul of Common Interest of Many Causes.
Marginal funds. FHI is funding-limited in its number of positions there. The marginal hires do not average Bostrom-level productivity (it’s hard to get academics to pursue a research agenda other than one they were already working on), but you can look at the last several hires and average across them.
I don’t know who counts as the last several hires, but while I’m sure everyone at FHI does fine work, only Bostrom and Sandberg seem to be doing research related to AI risks. Also Hanson, I suppose, to the extent that he counts as working at FHI. I don’t dispute that some marginal funds would on expectation go to research on these topics, but surely it would be a lot less than half.
Much of the dispersion is caused by the lack of unrestricted funds (and lack of future funding guarantees). Since we don’t have enough funding from private philanthropists, we have to chase academic funding pots, and that then forces us to do some work that is less relevant to the important problems we would rather be working on. It would be unfortunate if potential private funders then looked at the fact that we’ve done some less-relevant work as a reason not to give.
Thank you for weighing in! Your point sounds valid. After taking it into account, if you considered marginal dollars donated to FHI without explicit earmarking, what is your estimate for the fraction of such dollars that end up causing a dollar’s worth of research into topics that would be seen as highly relevant by someone with roughly SIAI-typical estimates for the future?
A high fraction. “A dollar’s worth of research” is not a well-defined quantity; that is, the worth of the research produced by a dollar varies a lot depending on whom the dollar is given to. I like to think FHI is good at converting dollars into research. The kind of research I’d prefer to do with unrestricted funds at the moment probably coincides pretty well with what a person with SIAI-typical estimates would prefer, though what can be researched also depends on the capabilities and interests of the research staff one can recruit. (There are various tradeoffs here, e.g. hiring a weaker researcher who has a long record of working in this area, or taking a chance on a slightly stronger researcher and risking that she will do irrelevant work? Headhunting somebody who is already actively contributing to the area, or attempting to involve a new mind who would otherwise not have contributed? etc.)
There are also indirect effects, which might lead to the fraction being larger than one—for example, if discussions, conferences, and various kinds of influence encourage external researchers to enter the field. FHI does some of that, as does the SIAI.
Thanks. When I said “a dollar’s worth of research”, I had in mind the estimate Carl mentioned of $200k per 2-year postdoc. I guess that doesn’t affect the fraction question.
The details depend on how you count the methodology/general existential risks stuff, e.g. the “probing the improbable” paper by Ord, Sandberg, and Hillerbrand. Also note that many of Bostrom’s and Sandberg’s publications, including the catastrophic risks book, and events like the Winter Intelligence Conference benefit from help by other FHI staff. Still, some hires have definitely done essentially no existential risk-relevant work. My guess is something like 1 Sandberg or Ord equivalent per 2-3 hires (with differential attrition leading to accumulation of the good).
Also, given earmarked funding they can create positions specifically for machine intelligence issues, the results of which are easier to track (the output of that person).
But presumably that would only be a consideration if FHI received very large amounts of such earmarked funding?
$200k USD for one postdoc. One could save up for that with a donor-advised fund alone or with others, or use something like kickstarter.com.
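Putting the two figures quoted in this thread together gives a rough cost per highly relevant researcher. This is a back-of-the-envelope sketch, not an FHI estimate: the constant and function names are made up for illustration, and it assumes the “1 Sandberg or Ord equivalent per 2-3 hires” guess applies to marginal unrestricted funding.

```python
COST_PER_POSTDOC_USD = 200_000  # per 2-year postdoc, the estimate quoted above
HIRES_PER_EQUIVALENT = (2, 3)   # "1 Sandberg or Ord equivalent per 2-3 hires"


def cost_per_equivalent(cost_per_hire, hires_needed):
    """Expected spend to obtain one highly relevant researcher for two years."""
    return cost_per_hire * hires_needed


low, high = (
    cost_per_equivalent(COST_PER_POSTDOC_USD, n) for n in HIRES_PER_EQUIVALENT
)
print(f"roughly ${low:,} to ${high:,} per equivalent researcher (2 years)")
```

Earmarked funding for a machine-intelligence position would sidestep the 2-3x multiplier, at the cost of needing the full amount up front.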
Comments like this are evidence that focus on getting papers into journals is important, relative to the amount of effort currently going into it.
And every time someone doesn’t make a comment like this, it’s evidence that such a focus is unimportant, so what makes you think it comes out one way rather than the other on net?
LessWrong seems significantly more likely than normal to produce vocal dissent (“I wouldn’t find this useful”) rather than silence. That said, LessWrong is probably also not the majority of AI researchers, who are the actual target audience, so using ourselves as a “test market” is probably flawed on a few levels...
Does this one count?
It has had some peer review—and should be in the AGI-11 Conference Proceedings.
I contest this use of the term “safety”. If your goal is for humanity to survive, say that your goal is for humanity to survive. Not to “promote safety”.
“Safety” means avoiding certain bad outcomes. By using the word “safety”, you’re trying to sneak past us the assumption “humans remaining the dominant lifeform = good, humans not remaining dominant = bad”.
The argument should be over what humans have that is valuable, and how we can contribute that to the future. Not over how humans can survive.
What is value? What things are valuable, and what are not?
Everything that we know about value, everything that we can know, is encoded within the current state of humanity.
As long as that knowledge remains, there is hope for the Best Possible Future. It may be a future that includes no humans, but it will be a future based on that knowledge.
If that knowledge is destroyed, or it loses power since it is no longer riding inside the dominant life form, then the future will be, morally, as chaos—as likely to eat babies as to love them.
To figure out how we can contribute to the future, what should replace us, and so on, takes time. Time we do not have if we do not focus on safety first.
Well, our distant descendants, whether uploads or cyborgs or other life-forms, could be considered part of “generalized humanity”, as long as they retain what humans have that is valuable.
And regardless, we certainly want current humanity (that is, all the people alive now) to survive, in the sense of not being killed by the AI.
My point being, it’s not necessarily right to take “the survival of humanity” to mean that we have to retain this physical form, and I don’t think the OP was using the words in that sense.
Agreed. People seem to get hold of the idea that humans are good, and machines are bad, and then get into an us vs them mindset. Surely all the best possible futures involve an engineered world, where the agony of being a meat brained human who was cobbled together by natural selection is mostly a distant memory.
But we have to keep the humans around until humans are capable of engineering that world carefully and without screwing it up. If we don’t engineer it, who will?
Right. There are pretty good instrumental reasons for all the parties concerned to do that. Humans may also be useful for a while for rebooting the system—if there is a major setback. They have successfully booted things up once already. Other backup systems are likely to be less well tested.
Here’s where I’d stick in the 10^-3 penalty. It’s reasonable to assume that taking safety seriously will keep you safe from accidental leaks of toxic chemicals, deadly viruses, etc. because these are well-understood phenomena that pose a single, predictable risk. If you can keep the muriatic acid off your skin, it won’t burn you. If you can keep the swine flu out of your lungs, it won’t infect you.
A truly general AI, though, almost by definition, would be able to think up countless ways of overpowering you. It’s very unlikely that you could adequately guard against all of those ways, and the AI only needs to succeed once to cause an existential risk. Thus, it’s not enough that the AI be ‘securely’ boxed; the AI also has to be provably friendly. And that means we have to figure out what ‘friendly’ even means. And while there are certainly researchers out there who can be convinced to invest in a bit of safety, proving that an AI is friendly requires way more resources than just putting on a pair of gloves or working in an airtight room.
It isn’t likely to be you vs the superintelligence, though. People keep imagining that—and then wringing their hands. The restraints on intelligent agents while they are being developed and tested are likely to consist of a prison built by the last generation of intelligent agents, featuring them as guards.
You focus on visibly HAL-like or Skynet-like AI—the sort of thing that AI researchers produce as demos. However, we have large, smart, durable, existing entities (businesses and other computer+human teams) that are continuously getting smarter (and entrenching themselves deeper into our society) by automating their existing business practices.
I don’t advocate trying to stop business automation, or humans organizing themselves into better and better teams; I think that would be throwing the baby out with the bathwater. However, I do think “business as usual” or “the default future” is the threat that existential risks people should be imagining.
The vast majority of writing about these issues has a story of terrorists or scientists (who are wizards meddling with things man was not meant to know) accidentally creating paperclip-making machines. That isn’t thinking, that’s straight out of folklore; e.g. Why the Sea is Salt.
Automation leads to a world where humans vote for government welfare for themselves. Governments then seem likely to compete with each other to attract corporations with low tax regimes, and get rid of their human burdens. This scenario is similar to the early parts of Manna. It leads to a world where humans are functionally redundant—though they may persist as a kind of parasitic organic layer on top of the machine world.
Meanwhile, many humans seem likely to be memetically hijacked, potentially leading to fertility and population declines. That may be a slow process, though.
Well, only around here. Other folk are looking at the effects of automation. Here’s my overview:
http://alife.co.uk/essays/will_machines_take_our_jobs/
1) You assign no probability to the AI being “Unfriendly”, as you put it. In particular you assume that AI = AI that kills everyone. For all I know this could be 0 and I am certainly of the opinion that it is very low.
2) The idea that the number of papers counts but the quality doesn’t (except in that they are “good”) is ridiculous—not only could 1 excellent paper be worth 1000 “good” ones, the “1000” good ones may not even be written if the excellent one comes first.
IMHO the only way to assess the risk of “unfriendly” AI is to build an AI (carefully) and ask it :)