Thoughts on Ben Garfinkel’s “How sure are we about this AI stuff?”
I liked this talk by Ben.
I think it raises some very important points. OTTMH, I think the most important one is: We have no good critics. There is nobody I’m aware of who is seriously invested in knocking down AI-Xrisk arguments and qualified to do so. For many critics in machine learning (like Andrew Ng and Yann LeCun), the arguments seem obviously wrong or misguided, and so they do not think it’s worth their time to engage beyond stating that.
A related point which is also important is: We need to clarify and strengthen the case for AI-Xrisk. Personally, I think I have a very good internal map of the paths that arguments about AI-Xrisk can take, and the types of objections one encounters. It would be good to have this as some form of flow-chart. Let me know if you’re interested in helping make one.
Regarding machine learning, I think he made some very good points about how the way ML works doesn’t fit with the paperclip story. I think it’s worth exploring the disanalogies more and seeing how that affects various Xrisk arguments.
As I reflect on what’s missing from the conversation, I always feel the need to make sure it hasn’t been covered in Superintelligence. When I read it several years ago, I found Superintelligence to be remarkably thorough. For example, I’d like to point out that FOOM isn’t necessary for a unilateral AI-takeover, since an AI could be progressing gradually in a box, and then break out of the box already superintelligent; I don’t remember if Bostrom discussed that.
The point about justification drift is quite apt. For instance, I think the case for MIRI’s viewpoint increasingly relies on:
1) optimization daemons (aka “inner optimizers”)
2) adversarial examples (i.e. current ML systems seem to learn superficially similar but deeply flawed versions of our concepts)
TBC, I think these are quite good arguments, and I personally feel like I’ve come to appreciate them much more as well over the last several years. But I consider them far from conclusive, due to our current lack of knowledge/understanding.
One thing I didn’t quite agree with in the talk: I think he makes a fairly general case against trying to impact the far future. I think the magnitude of impact and uncertainty we have about the direction of impact mostly cancel each other out, so even if we are highly uncertain about what effects our actions will have, it’s often still worth making guesses and using them to inform our decisions. He basically acknowledges this.
Holden Karnofsky used to be a critic, but then changed his mind.
I think the case for AI being an x-risk is highly disjunctive (see below), so if someone engages with the arguments in detail, they’re pretty likely to find at least one line of argument convincing. (It might be that one of these lines of arguments, namely local FOOM of a utility maximizer, got emphasized a bit too much so some outsiders dismiss the field thinking that’s the only argument.)
Have you seen this and this (including my comment)?
FWIW I’ve found Sarah Constantin to be the best “serious critic”, with Performance Trends in AI being the single post that’s updated me most in the direction of “I should perhaps be less worried, or be thinking in terms of longer timelines, than I currently am.”
Eliezer periodically points to Robin Hanson, but the disagreements in frame seem weirdly large, to the point that I don’t know how to draw much actionable stuff from them (i.e. AFAICT Robin would be quite happy with outcomes that seem quite horrifying to me, and/or is uninterested in doing much about it).
RE Sarah: Longer timelines don’t change the picture that much, in my mind. I don’t find this article to be addressing the core concerns. Can you recommend one that’s more focused on “why AI-Xrisk isn’t the most important thing in the world”?
RE Robin Hanson: I don’t really know much of what he thinks, but IIRC his argument that the urgency of AI depends on FOOM was not compelling.
What I’ve noticed is that critics are often working from very different starting points, e.g. being unwilling to estimate probabilities of future events, using common-sense rather than consequentialist ethics, etc.
No, for the reasons I mention elsethread. I basically don’t think there are good “real*” critics.
I agree with the general sense that “Sarah Constantin’s timelines look more like 40 years than 200, and thus don’t really change the overall urgency of the problem unless you’re not altruistic and are already old enough that you don’t expect to be alive by the time superintelligence exists.”
(I do also think I’ve gotten some reasonable updates from Jeff Kaufman’s discussions and arguments (link to the most recent summary he gave, but I’m thinking more of various thoughts I’ve heard him give over the years). I think my notion that X-Risk efforts are hard due to poor feedback loops came from him, which I think is one of the most important concerns to address, although my impression is he got it from somewhere else originally.)
[Edit: much of what Jeff did in his “official research project” was talk to lots of AI professionals, who theoretically had more technically relevant expertise. I’m not sure how many of them fit the criteria I listed in my reply to Ted. My impression is they were some mix of “already involved in the AI Safety field”, and “didn’t really meet the criteria I think are important for critics”, although I don’t remember the details offhand.
I do observe that the best internal criticism we have comes from people who roughly buy the Superintelligence arguments and are involved with the AI Alignment field but who disagree a lot on the particulars a la Paul]
Re: your request for collaboration—I am skeptical of the ROI of research on AI X-risk, and I would be happy to help offer insight from that perspective, either as a source or as a giver of feedback. Feel free to email me at {last name}{first name}@gmail.com
I’m not an expert in AI, but I have a PhD in semiconductors (which gives me perspective on hardware) and currently work on machine learning at Netflix (which gives me perspective on software). I also was one of the winners of the SciCast prediction market a few years back, which is evidence that my judgment of near-term tech trends is decently calibrated.
Although I’m not the one trying to run a project, there are a couple of credentials I’d be looking for in a serious critic. (I very much agree with the OP that “serious critics” are an important thing to have more of.)
Not meant to be a comment one way or another on whether you fit this, just that you didn’t mention it yet:
Fluency in the arguments presented in Superintelligence (ideally, fluency in the broader spectrum of arguments relating to AI and X-Risk, but Superintelligence does a thorough enough job that it works okay as a litmus test). I stopped paying much attention to AI Safety critics because they didn’t seem knowledgeable about the basic arguments, let alone up to speed on the state of the field.
“Seriousness of trying to solve the problem.” I.e. if we knew that aliens were coming in 30 years (or 10, or even 300), we wouldn’t just shrug if we couldn’t come up with tractable things to do. We’d be building some kind of model of whatever the best course of action was. If the correct course of action is “wait until the field has progressed more”, there should be a clear sense of when you have waited long enough and what to do when things change. (Maybe this point could be summarized as “familiar with and responding to the concerns in No Fire Alarm for AI Safety.”)
The criticism is expecting counter-criticism. I.e., what I think we’re missing is critics who are in it for the long haul, who see their work as the first step of an iterative process, with an expectation that the AI safety field will respond and/or update in response to their critiques.
As someone who sometimes writes things that are a bit skeptical regarding AI doom, I find the difficulty of getting counter-criticism frustrating.
Nod. That does seem like another important part of the equation. I admit I have not read that much of your work. I just went to revisit the last essay of yours I remember looking at and amusingly found this comment of mine apologizing for not having commented more. :(
I made a relevant post in the Meta section.
I might slightly alter to one of
This seems much more specific than I was aiming for at this stage. That’s certainly one way to operationalize it, but I’m much more worried about the binary condition of “they expect to have to update or defend the critique at some point after people have responded to it, in some fashion, at all” than about any particular operationalization.
(edit: in fact, I think it is more important that they expect to have skin in the game 2 years later than whether they respond 2 weeks later)
What does OTTMH mean?
OTTOMH—Off the top of my head
General rationality question that should not be taken to reflect any particular opinion of mine on the topic at hand:
At what point should “we can’t find any knowledgeable critics offering meaningful criticism against <position>” be interpreted as substantial evidence in favor of <position>, and prompt one to update accordingly?
I feel it’s like “A → likely B” being evidence for “B → likely A”; generally true, but it could be either very strong or very weak evidence depending on the base rates of A and B.
Not having knowledgeable criticism against the position “2 + 2 = 4” is strong evidence, because many people are familiar with the statement and many use it in their life or work, so if it were wrong, it is likely someone would already have offered some solid criticism.
But for statements that are less known or less cared about, it becomes more likely that there are good arguments against them but that no one has noticed them yet, or that no one has bothered to write a solid paper about them.
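As a toy sketch of that point (the specific numbers here are purely illustrative assumptions on my part, not anything from the talk or the thread), the strength of “no solid criticism exists” as evidence is just the likelihood ratio in Bayes’ rule:

$$\frac{P(\text{position} \mid \text{no criticism})}{P(\neg\,\text{position} \mid \text{no criticism})} \;=\; \frac{P(\text{position})}{P(\neg\,\text{position})} \times \frac{P(\text{no criticism} \mid \text{position})}{P(\text{no criticism} \mid \neg\,\text{position})}$$

For “2 + 2 = 4”, the rightmost denominator is essentially zero (if the claim were wrong, one of the millions of people relying on it would almost certainly have said so), so the update is enormous. For an obscure or neglected claim, that denominator could plausibly be 0.5 or higher, in which case the same observation barely moves the posterior.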
Good talk. I’d like to hear what he thinks about the accelerating change/singularity angle, as applied to the point about the person living during the industrial revolution who’s trying to improve the far future.