Also, I might as well share the approximate text of my short talk from that evening:
Hi everyone,
As most of you know, my name is Luke Muehlhauser, I’m the Executive Director at MIRI, and our mission is to ensure that the creation of smarter-than-human intelligence has a positive impact.
I’m going to talk for about 5 minutes on what we’re doing at MIRI these days, and at the end I’m going to make an announcement that I’m very excited about, and then we can all return to our pizza and beer and conversation.
I’m also going to refer to my notes regularly because I’m terrible at memorizing things.
So here’s what we’re doing at MIRI these days. The first thing is that we’re writing up descriptions of open problems in Friendly AI theory, so that more mathematicians and computer scientists and formal philosophers can be thinking about these issues and coming up with potential solutions and so on.
As a first step, we’re publishing these descriptions to LessWrong.com, and the posts have a nice mix of dense technical prose and equations but also large colorful cartoon drawings of laptops dropping anvils on their heads, which I think is the mark of a sober research article if there ever was one. That work is being led by Eliezer Yudkowsky and Robby Bensinger, both of whom are here tonight.
We’re also planning more research workshops like the ones we ran last year, except this year we’ll experiment with several different formats so we can get a better sense of what works and what doesn’t. For example, the experiment for our May workshop is that it’s veterans-only: everyone attending it has been to at least one workshop before, so there won’t be as much need to bring people up to speed before diving into the cutting edge of the research.
Later this year we’ll be helping to promote Nick Bostrom’s book on machine superintelligence for Oxford University Press, which when it’s released this summer will be by far the most comprehensive and well-organized analysis of what the problem is and what we can do about it. I was hoping he could improve the book by adding cartoons of laptops dropping anvils on their heads, but unfortunately Oxford University Press might have a problem with that.
One thing I’ve been doing lately is immersing myself in the world of what I call “AI safety engineering.” These are the people who write the AI software that drives trains and flies planes, and they prove that they won’t crash into each other if certain conditions are met and so on. I’m basically just trying to figure out what they do and don’t know already, and I’m trying to find the people in the field who are most interested in thinking about long-term AI safety issues, so they can potentially contribute their skill and expertise to longer-term issues like Friendly AI.
So far, my experience is that AI safety engineers have much better intuitions about AI safety than normal AI folk tend to have. Like, I haven’t yet encountered anybody in this field who thinks we’ll get desirable behavior from fully autonomous systems by default. They all understand that it’s extremely difficult to translate intuitively desirable behavior into mathematically precise design requirements. They understand that when high safety standards are required, you’ve got to build the system from the ground up for safety rather than slapping on a safety module near the end. So I’ve been mildly encouraged by these conversations even though almost none of them are thinking about the longer-term issues, at least not yet.
And lastly, I’d like to announce that we’ve now hired two workshop participants from 2013 as full-time Friendly AI researchers at MIRI, Benja Fallenstein and Nate Soares. Neither of them is here today; they’re in the UK and Seattle respectively, but they’ll be joining us shortly and I’m very excited. Some of you who have been following MIRI for a long time can mark this down on your FAI development timeline: March 2014, MIRI starts building its Friendly AI team.
Okay, that’s it! Thanks everyone for coming. Enjoy the pizza and beer.
So far, my experience is that AI safety engineers have much better intuitions about AI safety than normal AI folk tend to have. Like, I haven’t yet encountered anybody in this field who thinks we’ll get desirable behavior from fully autonomous systems by default. They all understand that it’s extremely difficult to translate intuitively desirable behavior into mathematically precise design requirements. They understand that when high safety standards are required, you’ve got to build the system from the ground up for safety rather than slapping on a safety module near the end. So I’ve been mildly encouraged by these conversations even though almost none of them are thinking about the longer-term issues, at least not yet.
I did not follow the interviews in detail. But I doubt that most of these AI safety engineers believe that you could achieve AI software that can drive trains and fly planes without crashing, but which nonetheless drives and flies people to locations they do not desire. In other words, my guess is that these people believe that without being able to prove that programs meet certain conditions you won’t achieve FOOM in the first place. What they probably do not believe is MIRI’s idea of an AI that works perfectly along a huge number of dimensions (e.g. making itself superhumanly smart, solving the protein folding problem, etc.), but which nonetheless fails at doing what people designed it to do (except that it does not fail at any of the aforementioned tasks).
The problem isn’t so much that the AI doesn’t do what it was designed to do; it’s that what you implemented is subtly different from what you designed. This is something that commonly happens in programming, not just a hypothetical concern.
This deserves a top level post, at least in discussion. I assume MIRI just can’t afford to hire anyone to make LW posts. As such, I’ve just made a $2,000 donation, earmarked for just that purpose.*
*Not actually earmarked for silly things.
Thanks very much!