Epistemic Status: Especially about the future.
Response To (Eliezer Yudkowsky): There’s No Fire Alarm for Artificial General Intelligence
It’s long, but read the whole thing. Eliezer makes classic Eliezer points in classic Eliezer style. Even if you mostly know this already, there are new points and it’s worth a refresher. I fully endorse his central point, and most of his supporting arguments.
What Eliezer has rarely been, is fair. That’s part of what makes The Sequences work. I want to dive in where he says he’s going to be blunt – as if he’s ever not been – so you know it’s gonna be good:
Okay, let’s be blunt here. I don’t think most of the discourse about AGI being far away (or that it’s near) is being generated by models of future progress in machine learning. I don’t think we’re looking at wrong models; I think we’re looking at no models.
I was once at a conference where there was a panel full of famous AI luminaries, and most of the luminaries were nodding and agreeing with each other that of course AGI was very far off, except for two famous AI luminaries who stayed quiet and let others take the microphone.
I got up in Q&A and said, “Okay, you’ve all told us that progress won’t be all that fast. But let’s be more concrete and specific. I’d like to know what’s the least impressive accomplishment that you are very confident cannot be done in the next two years.”
There was a silence.
Eventually, two people on the panel ventured replies, spoken in a rather more tentative tone than they’d been using to pronounce that AGI was decades out. They named “A robot puts away the dishes from a dishwasher without breaking them”, and Winograd schemas. Specifically, “I feel quite confident that the Winograd schemas—where we recently had a result that was in the 50, 60% range—in the next two years, we will not get 80, 90% on that regardless of the techniques people use.”
A few months after that panel, there was unexpectedly a big breakthrough on Winograd schemas. The breakthrough didn’t crack 80%, so three cheers for wide credibility intervals with error margin, but I expect the predictor might be feeling slightly more nervous now with one year left to go. (I don’t think it was the breakthrough I remember reading about, but Rob turned up this paper as an example of one that could have been submitted at most 44 days after the above conference and gets up to 70%.)
But that’s not the point. The point is the silence that fell after my question, and that eventually I only got two replies, spoken in tentative tones. When I asked for concrete feats that were impossible in the next two years, I think that that’s when the luminaries on that panel switched to trying to build a mental model of future progress in machine learning, asking themselves what they could or couldn’t predict, what they knew or didn’t know. And to their credit, most of them did know their profession well enough to realize that forecasting future boundaries around a rapidly moving field is actually really hard, that nobody knows what will appear on arXiv next month, and that they needed to put wide credibility intervals with very generous upper bounds on how much progress might take place twenty-four months’ worth of arXiv papers later.
(Also, Demis Hassabis was present, so they all knew that if they named something insufficiently impossible, Demis would have DeepMind go and do it.)
The question I asked was in a completely different genre from the panel discussion, requiring a mental context switch: the assembled luminaries actually had to try to consult their rough, scarce-formed intuitive models of progress in machine learning and figure out what future experiences, if any, their model of the field definitely prohibited within a two-year time horizon. Instead of, well, emitting socially desirable verbal behavior meant to kill that darned hype about AGI and get some predictable applause from the audience.
I’ll be blunt: I don’t think the confident long-termism has been thought out at all. If your model has the extraordinary power to say what will be impossible in ten years after another one hundred and twenty months of arXiv papers, then you ought to be able to say much weaker things that are impossible in two years, and you should have those predictions queued up and ready to go rather than falling into nervous silence after being asked.
In reality, the two-year problem is hard and the ten-year problem is laughably hard. The future is hard to predict in general, our predictive grasp on a rapidly changing and advancing field of science and engineering is very weak indeed, and it doesn’t permit narrow credible intervals on what can’t be done.
I agree that most discourse around AGI is not based around models of machine learning. I agree the AI luminaries seem to not have given good reasons for their belief in AGI being far away.
I also think Eliezer’s take on their response is entirely unfair. Eliezer asks an excellent question, but the response is quite reasonable.
I
It is entirely unfair to expect a queued up answer.
Suppose I have a perfectly detailed mental model for future AI developments. If you ask, “What’s the chance ML can put away the dishes within two years?” I’ll need to do math, but: 3.74%.
Eliezer asks me his question.
Have I recently worked through that question? There are tons of questions. Questions about least impressive things in any reference class are rare. Let alone this particular class, confidence level and length of time.
So, no. Not queued up. The only reason to have this answer queued up is if someone is going to ask.
I did not anticipate that. I certainly did not in the context of a listening Demis Hassabis. This is quite the isolated demand for rigor. I’ll need to think.
II
Assume a mental model of AI development.
I am asked for the least impressive thing. To answer well, I must maximize.
What must be considered?
I need to decide what Eliezer meant by very confident, and what other people will think it means, and what they think Eliezer meant. Three different values. Very confident as actually used varies wildly. Sometimes it means 90% or less. Sometimes it means 99% or more. Eliezer later claims I should know what my model definitely prohibits but asked about very confident. There is danger of misinterpretation.
I need to decide what impressiveness means in context. Impressiveness in terms of currently perceived difficulty? In terms of the public or other researchers going ‘oh, cool’? Impressive for a child? Some mix? Presumably Eliezer means perceived difficulty but there is danger of willful misinterpretation.
I need to query my model slash brainstorm for unimpressive things I am very confident cannot be done in two years. I adjust for the Hassabis effect that tasks I name will be accomplished faster.
I find the least impressive thing.
Finally I choose whether to answer.
This process isn’t fast even with a full model of future AI progress.
III
I have my answer: “A robot puts away the dishes from a dishwasher without breaking them.”
Should I say it?
My upside is limited.
It won’t be the least impressive thing not done within two years. Plenty of less impressive things might be done within two years. Some will and some won’t. My answer will seem lousy. The Hassabis effect compounds this, since some things that did not happen in two years might have if I’d named them.
Did Eliezer’s essay accelerate work done on unloading a dishwasher? On the Winograd schemas?
If I say something that doesn’t happen but comes close, such as getting 80% on the Winograd schemas if we get to 78%, I look wrong and lucky. If it doesn’t come close, I look foolish.
Also, humans are terrible at calibration.
A true 98% confident answer looks hopelessly conservative to most people, and my off-the-cuff 98% confident answer likely isn’t 98% reliable.
Whatever I name might happen. How embarrassing! People will laugh, distrust and panic. My reputation suffers.
The answer Eliezer gets might be important. If I don’t want laughter, distrust or panic, it might be bad if even one answer given happens within two years.
In exchange, Eliezer sees a greater willingness to answer, and I transfer intuition. Does that seem worth it?
IV
Eliezer asked his question. What happened?
The room fell silent. Multiple luminaries stopped to think. That seems excellent. Positive reinforcement!
Two gave tentative answers. Those answers seemed honest, reasonable and interesting. The question was hard. They were on the spot. Tentativeness was the opposite of a missing mood. It properly expresses low confidence. Positive reinforcement!
Others chose not to answer. Under the circumstances, I sympathize.
These actions do not seem like strong evidence of a lack of models, or of bad faith. This seems like what you hope to see.
V
I endorse Eliezer’s central points. There will be no fire alarm. We won’t have a clear sign AGI is coming soon until AGI arrives. We need to act now. It’s an emergency now. Public discussion is mostly not based on models of AI progress or concrete short term predictions.
Most discussions of the future are not built around concrete models of the future. It is unsurprising that AI discussions follow this pattern.
One can still challenge that one needs short-term predictions about AI progress to make long-term predictions. It is not obvious long-term prediction is harder, or that it depends upon short-term predictions. AGI might come purely from incremental machine learning progress. It might require major insights. It might not come from machine learning.
There are many ways to then conclude that AGI is far away where far away means decades out. Not that decades out is all that far away. Eliezer conflating the two should freak you out. AGI reliably forty years away would be quite the fire alarm.
You could think there isn’t much machine learning progress, or that progress is nearing its limits. You could think that progress will slow dramatically, perhaps because problems will get exponentially harder.
You might think problems will get exponentially harder and resources spent will get exponentially larger too, so estimates of future progress move mostly insofar as they move the expected growth rate of future invested resources.
You could think incentive gradients from building more profitable or higher scoring AIs won’t lead to AGIs, even if other machine learning paths might work. Dario Amodei says OpenAI is “following the gradient.”
You could believe our civilization incapable of effort that does not follow incentive gradients.
You might think that our civilization will collapse or cease to do such research before it gets to AGI.
You could think building an AGI would require doing a thing, and our civilization is no longer capable of doing things.
You could think that there is a lot of machine learning progress to be made between here and AGI, such that even upper bounds on current progress leave decades to go.
You could think that even a lot of the right machine learning progress won’t lead to AGI at all. Perhaps it is an entirely different type of thought. Perhaps it does not qualify as thought at all. We find more and more practical tasks that AIs can do with machine learning, but one can think both ‘there are a lot of tasks machine learning will learn to do’ and ‘machine learning in anything like its current form cannot, even fully developed, do all tasks needed for AGI.’
And so on.
Most of those don’t predict much about the next two years, other than a non-binding upper bound. With these models, when machine learning does a new thing, that teaches us more about that problem’s difficulty than about how fast machine learning is advancing.
Under these models, Go and Heads Up No-Limit Hold ’Em Poker are easier problems than we expected. We should update in favor of well-defined adversarial problems with compact state expressions but large branch trees being easier to solve. That doesn’t mean we shouldn’t update our progress estimates at all, but perhaps we shouldn’t update much.
This goes with everything AI learns to do ceasing to be AI.
Thus, one can reasonably have a model where impressiveness of short-term advances does not much move our AGI timelines.
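To make the shape of that argument concrete, here is a toy Bayesian sketch. Every number in it is an assumption chosen purely for illustration, not an estimate of anything real: the priors, the likelihood table, and the names `PRIOR_EASY`, `PRIOR_FAST`, `LIKELIHOOD` and `posterior_after_success` are all hypothetical. It models two unknowns, whether a given task is easy or hard and whether the field is progressing fast or slow, and asks how each belief moves after we see the task get solved.

```python
from itertools import product

# Illustrative priors (assumptions, not estimates of anything real).
PRIOR_EASY = 0.2   # prior that the task was easier than it looked
PRIOR_FAST = 0.3   # prior that overall progress is fast

# Assumed P(task solved in the window | difficulty, progress rate).
# In this toy model, difficulty matters far more than progress rate.
LIKELIHOOD = {
    ("easy", "fast"): 0.9,
    ("easy", "slow"): 0.8,
    ("hard", "fast"): 0.1,
    ("hard", "slow"): 0.05,
}


def posterior_after_success():
    """Return (P(easy | solved), P(fast | solved)) under the toy model."""
    joint = {}
    for difficulty, progress in product(("easy", "hard"), ("fast", "slow")):
        p_difficulty = PRIOR_EASY if difficulty == "easy" else 1 - PRIOR_EASY
        p_progress = PRIOR_FAST if progress == "fast" else 1 - PRIOR_FAST
        # Difficulty and progress rate are treated as independent a priori.
        joint[(difficulty, progress)] = (
            p_difficulty * p_progress * LIKELIHOOD[(difficulty, progress)]
        )
    total = sum(joint.values())  # P(solved)
    p_easy = sum(v for (d, _), v in joint.items() if d == "easy") / total
    p_fast = sum(v for (_, p), v in joint.items() if p == "fast") / total
    return p_easy, p_fast


if __name__ == "__main__":
    p_easy, p_fast = posterior_after_success()
    print(f"P(easy): {PRIOR_EASY:.2f} -> {p_easy:.2f}")
    print(f"P(fast): {PRIOR_FAST:.2f} -> {p_fast:.2f}")
```

With these made-up numbers the belief that the task was easy moves a lot while the belief that progress is fast moves only a little. A different likelihood table, one where progress rate matters more than task difficulty, would reverse that, which is exactly the disagreement between the two kinds of model.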
I saw an excellent double crux on AI timelines, good enough to update me dramatically on the value of double crux and greatly enrich my model of AI timelines. Two smart, highly invested people had given the problem a lot of thought, and were doing their best to build models and assign probabilities and seek truth. Many questions came up. Short-term concrete predictions did not come up. At all.
VI
That does not mean any of that is what is happening.
I think mostly what Eliezer thinks is happening, is happening. People’s incentive gradients on short term questions say not to answer. People’s incentive gradients on long term questions say to have AGI be decades out. That’s mostly what they answer. Models might exist, but why let them change your answer? If you answer AGI is near and it doesn’t happen you look foolish. If you answer AGI is near and it happens, who cares what you said?
When asked a question, good thinkers generate as much model as they need. Less good thinkers, or the otherwise motivated, instead model what it is in their interest to say.
Most people who say productive AI safety work cannot currently be done have not spent two hours thinking about what could currently be done. Again, that’s true of all problems. Most people never spend two hours thinking about what could be done about anything. Ever. See Eliezer’s entire essential sequence (sequence Y).
That is how someone got so frustrated with getting people to actually think about AI safety that he decided it would be easier to get them to actually think in general.
To do that, it’s important to be totally unfair to not thinking. Following incentive gradients and social cues and going around with inconsistent models and not trying things for even five minutes before declaring them impossible won’t cut it and that is totally not OK.
He emphasizes nature not grading on a curve, and fails everyone. Hard. The Way isn’t just A Thing, it’s a necessary thing.
Then we realize that no, it’s way worse than that. People are not only not following The Way. No one does the thing they are supposedly doing. The world is mad on a different level than inaccurate models without proper Bayesian updating and not stopping to think or try for five minutes once in their life let alone two hours. There are no models anywhere.
Fairness can’t always be a thing. Trying to make it a thing where it isn’t a thing tends to go quite badly.
Sometimes, though, you still need fairness. Without it groups can’t get along. Without it you can’t cooperate. Without it we treat thinking about a new and interesting question as evidence of a lack of thinking.
Holding everyone to heroic responsibility wins you few friends, influences few people and drives you insane.
VII
Where does that leave us? Besides the original takeaway that There Is No Fire Alarm For Artificial General Intelligence and we need to work on the problem now? And your periodic reminder that people are crazy and the world is mad?
Microfoundations are great, but some useful models don’t have them. It would be great if everyone had probabilistic time distributions for every possible event, but this is totally not reasonable, and totally not required to have a valid opinion. Some approaches answer some questions but not others.
We must hold onto our high standards for ourselves and those who opt into them. For others, we must think about circumstance and incentive, and stop at ‘tough, but fair.’
Predictions are valuable. They are hard to do well and socially expensive to do honestly. A culture of stating your probabilities upon request is good. Betting on your beliefs is better. Part of that is understanding not everyone has thought through everything. And understanding adverse selection and bad social odds. And realizing sometimes best guesses would get taken too seriously, or commit people to things. Sometimes people need to speak tentatively. Or say “I don’t know.” Or say nothing.
Allies won’t always ponder what you’re pondering. They aren’t perfectly rigorous thinkers. They don’t think hard for two hours about your problem. They don’t often make extraordinary efforts.
Most of what they want will involve social reality and incentive gradients and muddled thinking. They’re doing it for the wrong reasons. They will often be unreliable and untrustworthy. They’re defecting constantly.
You go to war with the army you have.
We can’t afford to hold everyone to impossible standards. Even holding ourselves to impossible standards requires psychologically safe ways to do that.
When someone genuinely thinks, and offers real answers, cheer that. Especially answers against interest. They do the best they can. From another perspective they could obviously do so much more, but one thing at a time.
Giving them the right social incentive gradient, even in a small way, matters a lot.
Someone is doing their best to break through the incentive gradients of social reality.
We can work with that.
Promoted to Featured, for being a solid critique of a point in a previous Featured post on LW (also, it was one by Eliezer, and I feel that good counterargument to Eliezer is rarer than I’d like).
It seems Eliezer still has guru status; I’m not sure the presence of a guru figure is best for promoting group epistemic rationality especially considering the Affect Heuristic, and the chill against dissenting opinions (especially when you’re disagreeing with the guru) which may lead to evaporative cooling of group beliefs.
I get the feeling that disagreeing with Eliezer is only well received when it is by other high status community members.
Promoted to the frontpage: I really like this post, though I do much prefer the first half over the second half. It participates in currently relevant discourse in important ways (and productively and publicly disagrees with Eliezer, which I expect to be undersupplied).
At the end things feel a bit more train-of-thought-like and a bit more like being in the middle of a bad trip (I’ve never been on a trip, but I watched a bunch of trailers for Fear and Loathing in Las Vegas, which hopefully compensates well-enough). And I guess I would prefer it to sustain its clarity all throughout, but I realize that some models are hard to communicate with stoic clarity.
It does feel too call-to-action-y overall, and as soon as I see more of the negative effects that I expect to come from that style of writing, I will significantly increase my threshold for promoting content with that style, and so in some future this post might no longer make the cut for me.
What is the issue with having a strong call-to-action? As I’ve been taught, having a clear call-to-action is a pretty integral part of persuasive writing.
I agree looking over it that the first half is stronger. As you suggest, talking about a concrete thing is a lot easier to be clear about than talking about general things, especially hard to reason about general things. In many ways I think, looking back, that I felt the second half was ‘necessary’ to cover various concerns, including making things more explicit in various ways, but perhaps it wasn’t necessary?
On the call to action front, acknowledged and I’m certainly guilty at the end. I do want to use them far more sparingly and will start doing that real soon now.
It is very possible to talk about long term climate trends and predict future climate changes without being able to predict the weather; it seems Eliezer misses this.
And why should we privilege those models? Why should we assign significant priors to hypotheses that permit amazing short term AI improvements to not alter long term AI timelines? What evidence has brought those hypotheses to our attention?
What makes such a model reasonable? One can “reasonably have a model where impressiveness of short-term advances does not much move our AGI timelines”, but why should we trust such a model? Why would such a model be good in the first place?
How probable do we believe the underlying reasons behind such models are? For example, consider the “different kinds of thought” that Sarah mentions; I think Sarah completely misses the point.
The strength of a model is not what it permits, but what it doesn’t. A model that predicts A or !A is always accurate, but confers zero information. A model of AGI that predicts AGI being distant, but does not update towards AGI being near when impressive ML achievements occur is a model that makes me sceptical. It sounds a little too liberal for my tastes, and is bordering on being difficult to falsify.
A model that doesn’t update its AGI timelines nearer when impressive ML achievements occur and doesn’t update its AGI timelines when ML predictions fail to realise seems like an unfalsifiable (and thus unscientific) model to me.
A model that doesn’t update its AGI timelines nearer when impressive ML achievements occur and updates its AGI timelines farther when ML predictions fail to realise seems like a model that violates conservation of expected evidence.
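For readers who want the conservation-of-expected-evidence point spelled out, here is a minimal sketch. The function name `posteriors` and all the probabilities are assumptions picked for illustration. H stands for "AGI is near" and E for "an impressive ML achievement occurs". The identity being checked is that the prior equals the probability-weighted average of the two possible posteriors, so a model cannot stay put on E and also move away from H on not-E.

```python
def posteriors(prior_h, p_e_given_h, p_e_given_not_h):
    """Return (P(E), P(H | E), P(H | not E)) via Bayes' rule."""
    p_e = prior_h * p_e_given_h + (1 - prior_h) * p_e_given_not_h
    post_given_e = prior_h * p_e_given_h / p_e
    post_given_not_e = prior_h * (1 - p_e_given_h) / (1 - p_e)
    return p_e, post_given_e, post_given_not_e


if __name__ == "__main__":
    prior = 0.3  # illustrative prior that AGI is near

    # Case 1: E is more likely if H is true, so seeing E raises P(H)
    # and not seeing E lowers it.
    p_e, up, down = posteriors(prior, 0.6, 0.2)
    expected_posterior = p_e * up + (1 - p_e) * down
    print(up, down, expected_posterior)  # expected_posterior equals the prior

    # Case 2: E is equally likely either way, so neither outcome moves P(H).
    p_e, same_if_e, same_if_not_e = posteriors(prior, 0.4, 0.4)
    print(same_if_e, same_if_not_e)
```

In the second case the model is unmoved by the achievement, and the same arithmetic forces it to be unmoved by the failure too. Getting "no update on success, update away on failure" out of a single consistent set of probabilities is not possible.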
A model that doesn’t update AGI timelines nearer proportional to the awesomeness of the ML achievement is a model I’m sceptical of. I’m sceptical of a model that looks at an amazing ML achievement, and instead updates towards the problem being easier than initially expected—that’s a fully general counterargument and can apply to all ML achievements.
Emphatic agreement.
P.S: I cannot quote on mobile; the above is an improvisation.
Fixed it for you
Thank you very much.
I don’t think I understand this point. Is the conflation “having a model of the long-term that builds on a short-term model” and “having any model of the long term”, in which case the conflation is akin to expecting climate scientists to predict the weather? If so I agree that that’s a slip up, but my alarm level isn’t raised to “freaked out” yet, what am I missing?
I think the conflation is “decades out” and “far away”.
This is quite good. Don’t have much more to say, just, this struck a great balance of noting an important thing that was missing from Eliezer’s post, criticizing where applicable but still clearly maintaining a collaborative intellectual approach.
I’d nominate this for being fully promoted to Featured, since it seems like an important followup to a recent prominent featured post.
I think I’m going to stake out a general disagreement position with this post, mainly because: 1) I mostly disagree with it (I am not simply being a devil’s advocate) and 2) I haven’t seen any rebuttals to it yet. Sorry if this response is too long, and I hope my tone does not sound confrontational.
When I first read Eliezer’s post, it made a lot of sense to me and seemed to match with points he’s emphasized many times in the past. I would make a general summarization of the points I’m referring to as: There have been many situations throughout history where policy makers, academics, or other authorities have made fairly strong statements about the future and actions we should collectively take without using any reliable models. This has had pretty disastrous consequences.
In regards to the example of Eliezer supposedly asking an unfair question, the context that I grabbed from his post was that this occurred during a very important summit on AI safety and policy between academics and other luminaries. This was supposed to be a conference where these influential people were actually trying to decide on specific courses of action to take, not merely a media-related press extravaganza, or some kind of informal social gathering between important people in a private context. I don’t remember if he actually states what conference it was but I’m guessing it was probably the Asilomar conference that occurred earlier this year.
If it was an informal social gathering, I think I would agree that it would be sort of unfair to ask random people tough questions like this and expect queued up answers, but as it stands, I’m fairly certain this was an important meeting that could influence the course of events for many years to come. AI safety is only just starting to become accepted in the mainstream, so whatever occurs at these events has to sort of nudge it in the right direction.
So we essentially have a few important reasons why it’s ok to be blunt here and ask tough questions to a panel of AI experts. Eliezer stood up and asked a question he probably expected not to receive a great answer to right away, and he did it in front of a bunch of luminaries who may have been embarrassed by this. So Eliezer broke a social norm because this could have been interpreted as disrespectful to these people, and he probably lowered his own status in the process. This is a risky move.
But in doing this, he forced them to display a weakness in their understanding of this specific subject. He mentioned that most of them accepted with high confidence that AGI was “really far away” (which I suppose means long enough from now that we don’t have to worry that much). So they must believe they have some model, but under more scrutiny it appears that they really don’t.
You say it’s unfair of Eliezer to expect them to have a good model, and to have a good answer queued up, but I also think it’s unfair to claim AGI is very far away without having any models to back that up. What they say on that stage probably matters a lot.
It’s technically true that the question he asked was unfair, because I am pretty sure he expected not to receive a good answer and that was why he asked it. So perhaps it was not asked purely in the spirit of intellectual discourse, it had rhetorical motivations as well. We can call that unfair if we must.
But I am also fairly certain that it was an important move to make from a consequentialist standpoint. It might have been disrespectful to the panelists, but it could have made them stop to think about it, or perhaps made others see that our understanding isn’t quite good enough to make claims about what definitely should or should not be done about AI safety.
I think he was totally considering social cues and incentive gradients when he did this, and it was precisely because of them that he did. Influential people under a lot of spotlight and public scrutiny will be more under pressure from these things. Therefore in order to give a “nudge”, if you’re someone who also happens to have a bit of influence, you might have to call them out in public a bit. It has a negative cost to you, but in the long run it might pay off.
I think it’s still a reasonable question whether or not this actually will pay off (will they try to more carefully consider their models of future AI development?) but I think his reasoning for doing this was pretty solid. I don’t get the impression that he’s demanding everyone has a solid model that makes predictions with hard numbers that they can query on demand, nor that he’s suggesting that we enforce negative social consequences for everyone for not having one.
Yes, you can always take into account everyone’s circumstances and incentives, but if those are generally pointing in a wrong enough direction for people who have real influence, I think it’s okay to do something about it.
I appreciated how Zvi presented different models of paths to AGI. People do believe many of these different models—I hear people discuss them in physical space conversation—but I haven’t seen many of these presented on the internet, apart from random Facebook discussion. Even if models are wrong, if people have put effort into them it’s useful to articulate them.
If I was asked any question of the form ‘what’s the least impressive X that you are very confident cannot be done in the next Y years?’, I would hesitate for a long time because it would take me a long time to parse the sentence and work out what a reply would even consist of.
I think that I am unusually dumb at parsing abstract sentences like this, so that may not apply to any people on the panel, but I’m not certain of that. (I have a physics PhD, so being dumb at parsing abstract sentences hasn’t excluded me from quantitative fields.)
I notice that I’m currently unable to intuitively hold the sense of this sentence in my head in one go, but I am able to generate answers anyway, by the method of coming up with a bunch of Xs that I think couldn’t be done in two years, and then looking for the least impressive such one. It feels kind of unsatisfying doing that when I can’t hold the sense of the whole problem in my head, and that slows me down.
If I was put on the spot in front of lots of people, though, I might just panic about being asked to parse an abstract sentence rather than doing any useful cognitive work, and not come up with much at all.