Students asked to defend AGI danger update in favor of AGI riskiness
From Geoff Anders of Leverage Research:
In the Spring semester of 2011, I decided to see how effectively I could communicate the idea of a threat from AGI to my undergraduate classes. I spent three sessions on this for each of my two classes. My goal was to convince my students that all of us are going to be killed by an artificial intelligence. My strategy was to induce the students to come up with the ideas themselves. I gave out a survey before and after. An analysis of the survey responses indicates that the students underwent a statistically significant shift in their reported attitudes. After the three sessions, students reported believing that AGI would have a larger impact[1] and also a worse impact[2] than they originally reported believing.
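For readers wondering what a "statistically significant shift" in before/after survey responses looks like mechanically, here is a minimal sketch of a paired comparison. The ratings below are invented for illustration; the study's actual items, scale, and analysis are in the linked PDF.

```python
# Sketch of a paired before/after test on invented survey ratings
# (e.g., "expected impact of AGI" on a 1-10 scale). Not real data.
from statistics import mean, stdev
from math import sqrt

before = [4, 5, 3, 6, 5, 4, 7, 5]   # invented pre-session ratings
after  = [6, 7, 5, 7, 6, 6, 8, 7]   # invented post-session ratings

# Each student is their own control: test whether the per-student
# change differs from zero.
diffs = [a - b for a, b in zip(after, before)]
t = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))  # paired t statistic

print("mean change:", mean(diffs))
print("t statistic:", round(t, 2))
```

A large t on a one-sample test of the differences is what "statistically significant shift" cashes out to here; with real data one would also report the p-value and degrees of freedom.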
Not a surprising result, perhaps, but the details of how Geoff taught AGI danger and the reactions of his students are quite interesting.
This is less than surprising. I can’t think of any threat already existing in the minds of some undergraduates whose perceived seriousness a competent, believing professor requiring attendance couldn’t, on average, increase. Control groups are needed.
What would you want to do with the control groups? Teach them that AGI won’t destroy the world? Not teach them anything in particular about AI? Teach them that invading aliens will destroy the world, or that the biblical End Times are near? Any of these would yield useful information. Which one(s) do you favor?
I was specifically thinking of these exact two conditions, which is why I said “groups”, for they are different in kind. The aliens example is even better than the supernatural end times one.
I thought of but rejected “Teach them that AGI won’t destroy the world?” when I couldn’t think of how to implement that neutrally. How would one do that?
True. Most arguments against the AGI-apocalypse scenario are responses to arguments for it; it would be difficult to present only one side of the question.
This seems like an incredibly biased presentation, with the author never realizing the depth of the bias. Then again, he writes “My goal was to convince my students that all of us are going to be killed by an artificial intelligence,” not to probe the validity of the point, so his bottom line was already written.
He says “I presented a neutral summary” after judiciously guiding the students through one-sided claims and refutations about AGI (it can never play chess -- it plays chess), not any of the claims that have not (yet) been refuted, then spicing it up with the Terminator quotes.
He says “At all points in the discussion I did my best to appear neutral and to not reveal my views.” right after scaring them with a bomb in a trashcan.
He assigned no homework and gave no time outside the class for the students to come up with counter-arguments.
He writes:
The idea that an argument can sometimes be tested experimentally seems utterly foreign to him (even when it is in his favor, like the “AI can never be better at chess” one). Must be something about the philosophers in general, I suppose.
He primed his students in advance:
He did not attempt to provide a balanced context by inviting (or at least quoting) an expert in the area who does not share his views.
So his conclusion, that it is possible to convince a person who never thought about a topic before of the dangers of an AGI, was a foregone one. He could probably have convinced them that AGI is the second coming of Christ, if he bothered (it is a Catholic college, so the leap is not that large).
Sort of. Assuming he was basically convinced that “...all of us are going to be killed by an artificial intelligence,” he knew he was trying to convince his students of that, but he did not know whether he would succeed with this method. He wasn’t testing the dangers of AI; he was testing a method of persuasion.
The paper gives what it describes as the “AGI Apocalypse Argument”—which ends with the following steps:
It is hard to tell whether anyone took this seriously—but it seems that an isomorphic argument ‘proves’ that computer programs will crash—since “almost any” computer program crashes. The “AGI Apocalypse Argument” as stated thus appears to be rather silly.
If the stated aim was: “to convince my students that all of us are going to be killed by an artificial intelligence”—why start with such a flawed argument?
More obviously, an isomorphic argument ‘proves’ that books will be gibberish—since “almost any” string of characters is gibberish. An additional argument that non-gibberish books are very difficult to write and that naively attempting to write a non-gibberish book will almost certainly fail on the first try, is required. The analogous argument exists for AGI, of course, but is not given there.
Right—so we have already had 50+ years of trying and failing. A theoretical argument that we won’t succeed the first time does not tell us very much that we didn’t already know.
What is more interesting is the track record of engineers of not screwing up or killing people the first time.
We have records about engineers killing people for cars, trains, ships, aeroplanes and rockets. We have failure records from bridges, tunnels and skyscrapers.
Engineers do kill people—but often it is deliberate—e.g. nuclear bombs—or with society’s approval—e.g. car accidents. There are some accidents which are not obviously attributable to calculated risks—e.g. the Titanic, or the Tacoma Narrows bridge—but they typically represent a small fraction of the overall risks involved.
I don’t see why this makes the argument seem silly. It seems to me that the isomorphic argument is correct, and that computer programs do crash.
Some computer programs crash—just as some possible superintelligences would kill all humans.
However, the behavior of a computer program chosen at random tells you very little about how an actual real-world computer program will behave—since computer programs are typically produced by selection processes performed by intelligent agents.
The “for almost any goals” argument is bunk.
No, most computer programs do crash; it’s just that most people never see them, because said programs are repeatedly tested and modified until they no longer crash before being shown to people (this process is called “debugging”). With a self-modifying AI this is a lot harder to do.
By “no”, you apparently mean “yes”.
Well, that is a completely different argument—and one that would appear to be in need of supporting evidence—since automated testing, linting and the ability to program in high-level languages are all improving simultaneously.
I am not aware of any evidence that real computer programs are getting more crash-prone with the passage of time.
The point is that the first time you run the seed AI it will attempt to take over the world, so you don’t have the luxury of debugging it.
That is not a very impressive argument, IMHO.
We will have better test harnesses by then—allowing such machines to be debugged.
Almost certainly, the first time you run the seed AI, it’ll crash quickly. I think it’s very unlikely that you construct a successful-enough-to-be-dangerous AI without a lot of mentally crippled ones first.
If so then we are all going to die. That is, if you have that level of buggy code then it is absurdly unlikely that the first time the “intelligence” part works at all it works well enough to be friendly. (And that scenario seems likely.)
The first machine intelligences we build will be stupid ones.
By the time smarter ones are under development we will have other trustworthy smart machines on hand to help keep the newcomers in check.
I am not entirely sure I disagree with you. However, I am having difficulty modeling you.
“Achieving a goal” seems to mean, for our purposes, something along the lines of “Bringing about a world-state.” Most possible world-states do not involve human existence. Thus, it seems that for most possible goals, achieving a goal entails human extinction.
However, your mention of computer programs being produced by intelligent agents is interesting. Are you implying that most AGIs (assuming these intelligences can go FOOM) would not result in human extinction?
If this is not what you were implying, I apologize for modeling you poorly. If this is what you were implying, I would like to indicate that this post was non-hostile.
Questions about fractions of infinite sets require an enumeration strategy to be specified—or they don’t make much sense. Assuming lexicographic ordering of their source code—and only considering the set of superintelligent programs—no: I don’t mean to imply that.
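The point that "fractions of infinite sets" depend on the enumeration can be made concrete with a toy example (mine, not from the thread): the limiting fraction of even numbers among the naturals comes out differently under different orderings.

```python
# Toy illustration: the "fraction" of an infinite set depends on how
# you enumerate it. Here, the share of evens among the naturals under
# two different enumerations.
from itertools import count, islice

def natural_order():
    return count(0)                      # 0, 1, 2, 3, ...

def odd_heavy_order():
    # Interleave two odds for every even: 1, 3, 0, 5, 7, 2, ...
    odds, evens = count(1, 2), count(0, 2)
    while True:
        yield next(odds)
        yield next(odds)
        yield next(evens)

def even_fraction(enum, n=30000):
    # Fraction of evens in the first n terms of the enumeration.
    prefix = list(islice(enum, n))
    return sum(1 for x in prefix if x % 2 == 0) / n

print(round(even_fraction(natural_order()), 2))    # ~0.5
print(round(even_fraction(odd_heavy_order()), 2))  # ~0.33
```

Both enumerations cover exactly the same set, yet the limiting density differs, which is why statements like "most possible goals" need a stated measure before they carry weight.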
A statement which we can derive from the simple fact that the mere existence of general intelligence (apes) does not result automatically in catastrophe.
I wonder how long it’ll take before people catch onto the notion that artificial “dumbness” is in many ways a more interesting field than artificial “intelligence”? (As in, how much could an AGI no smarter than a dog, but hooked into expert systems similar to Watson, do?)
It was pretty well accepted at MIT’s Media Lab back when my orbit took me around there periodically, a decade or so ago, that there was a huge amount of low-hanging fruit in this area… not necessarily of academic interest, but damned useful (and commercial).
That’s interesting since my impression if anything is the exact opposite. There seem to be a lot of people trying to apply Bayesian learning systems and expert learning systems to all sorts of different practical problems. I wonder if this is a new thing or whether I simply don’t have a good view of the field.
For what it’s worth, I consider Bayesian learning systems and expert learning systems to be “narrow” AI—hence the example I gave of Watson.
I think Ben Goertzel’s Novamente project is the closest extant project to a ‘general’ AI of any form that I’ve heard of.
I can see that for expert systems, but Bayesian learning systems seem to be a distinct category. The primary limits seem to be scalability, not architecture.
Bayesian learning systems are essentially another form of trainable neural network. That makes them very good in a narrow range of categories but also makes them insufficient to the cause of achieving general intelligence.
I do not see that scaling Bayesian learning networks would ever achieve general intelligence. No matter how big the hammer, it’ll never be a wrench. That being said, I do believe that some form of pattern recognition and ‘selective forgetting’ is important to cognition and as such Bayesian learning architecture is a good tool towards that end.
Actually, I’m curious why that isn’t seen as an area of significant academic interest—designing artificial systems around being efficient parsers of extraneous data. I recall that one of the major differences between Deep Blue and Deep Fritz in the Kasparov chess matches was precisely that Fritz was designed around not probing every last possible set of playable moves; that is, Deep Fritz was “learning to forget the right things”.
It seems to me that understanding this mechanism and how it behaves in humans could have huge potential for opening up the understanding of general intelligence and cognition. And that’s a very academic concern.
Yes, 15 does not follow unless we resolve this question from 14:
Does anyone else see a problem with the data table on page 22 of that PDF file?
The paper mentions this criterion on page 22:
Yet the data points at 15 8:30am, 16 8:30am, and 17 8:30am on page 22 all appear to have a blank Before value, a present After value, and a Change identical to the After value.
This also appears to affect the Change-column statistics on page 24 (8:30 a.m.) and on page 26 (Both Classes Combined). On page 28, the people who only have one survey are dropped entirely, and I no longer see this problem.
Since he uses the statistics on page 28 for his conclusion, this may not change the conclusion, but I did want to point it out.
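The error described above (a blank Before with Change equal to After) is the kind of thing a mechanical consistency check catches before the statistics are run. A minimal sketch with invented rows; the real table's row labels and values are in the PDF.

```python
# Sketch of a consistency check for a Before/After/Change table.
# Rows are invented; a blank Before should make Change undefined,
# not a copy of After.
rows = [
    {"id": "row A", "before": 5,    "after": 7},
    {"id": "row B", "before": None, "after": 6},  # blank Before
    {"id": "row C", "before": None, "after": 8},  # blank Before
]

def change(row):
    # Treating a blank Before as 0 would silently turn Change into
    # a copy of After, the error noted in the thread.
    if row["before"] is None:
        return None
    return row["after"] - row["before"]

suspect = [r["id"] for r in rows if r["before"] is None]
print([change(r) for r in rows], suspect)
```

Dropping the `suspect` rows before computing summary statistics is effectively what the study's page-28 tables do, which is why the problem disappears there.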
Thanks for pointing this out. There was in fact an error. I’ve fixed the error and updated the study. Some of the conclusions embedded in tables change; the final conclusions reported stay the same.
I’ve credited you on p.3 of the new version. If you want me to credit you by name, please let me know.
Thanks again!
Hi everyone. Thanks for taking an interest. I’m especially interested in (a) errors committed in the study, (b) what sorts of follow-up studies would be the most useful, (c) how the written presentation of the study could be clarified.
On errors, Michaelos already found one—I forgot to delete some numbers from one of the tables. That error has been fixed and Michaelos has been credited. Can anyone see any other errors?
On follow-up studies, lessdazed has suggested some. I don’t know if we need to see what happens when nothing is presented on AGI; I think our “before” surveys are sufficient here. But trying to teach some alternative threat is an interesting idea. I’m interested in other ideas as well.
On clarity of presentation, it will be worth clarifying a few things. For instance, the point of the study was to test a method of persuasion, not to see what students would do with an unbiased presentation of evidence. I’ll try to make that more obvious in the next version of the document. It would be good to know what other things might be misunderstood.
I appreciate the credit and have sent you a message with my name, but I have to let you know that while Version 1.2 contains the fix, version 1.3 appears to have reverted back to the unfixed version as if it was made off of version 1.0 instead of 1.2.
Fixed.
This post seems to serve no purpose except to promote the dark arts.
A quote from the PDF:
So, yes, dark arts. But the way he kept asking “And how would the AI do that?” was excellent.
I found this an extremely surprising result. Geoff Anders claims immediate effects from essentially only two interventions:
and
There were more interventions between these and the surveys of average belief, but these interventions caused at least a few students to generate the idea that AGIs are much more creative and powerful than in Terminator 2. The effect on the tail seems to me more important and surprising than the effect on the mean.
There were a lot of interventions before these two, including whatever idiosyncrasies Anders’s philosophy course had, but the outcome before these two interventions seemed pretty standard. The first AI day seemed pretty standard. The chess exercise is probably not common and the two quotes above require its context, but the initial reaction to the chess exercise did not surprise me.
I don’t know of any studies backing this up, but I’d expect that if a position doesn’t go against deeply held beliefs, then having to argue for it will make you update in that direction.
Sounds like a classic case of priming to me.