I’m curious about this specifically because I was physically in the MIRI office in the week that AlphaGo came out, floating around conversations that included Nate, Eliezer, Anna Salamon, Scott, Critch, Malo, etc., and I would not describe the mood as “blindsided” so much as “yep, here’s a bit of What Was Foretold; we didn’t predict this exact thing, but this is exactly the kind of thing we’ve been telling everyone about ad infinitum; care to make bets about how much people update or not?”
But also, there may have been e.g. written commentary that you read, and I did not, that says “we were blindsided.” I.e., I’m not saying you’re wrong; I’m saying I’m surprised, and I’m curious if you can explain away my surprise.
I was surprised AlphaGo happened in January 2016 rather than November 2016. Everyone in the MIRI sphere seemed surprised it happened in the 2010s rather than the 2020s or 2030s. My sense at the time was that this was because they had little to no interest in tracking the SOTA of the most capable brain-inspired algorithms, due to what I still see as excessive focus on highly abstract and generalized relational reasoning about infinities. Though it’s not so bad now, and I think Discovering Agents, Kenton et al.’s agent foundations work at DeepMind, and the follow-up Interpreting systems as solving POMDPs: a step towards a formal understanding of agency are all incredible work on Descriptive Models of Agency, so now the MIRI normative agent foundations approach does have a shot. But I think that MIRI could have come up with Discovering Agents if their attitude towards deep learning had been more open-minded. And I still find it very frustrating how they respond when I tell them that ASI is incredibly soon. It’s obvious to anyone who knows how to build it that there’s nothing left in the way, and that there could not possibly be; anyone who is still questioning that has deeply misunderstood deep learning, and while they shouldn’t update hard off my assertion, they should take a step back and realize that the model that deep learning and connectionism don’t work has been incrementally shredded over time, and it is absolutely reasonable to expect that exact distillations of intelligence can be discovered by connectionism. (Though maybe not by deep gradient backprop. We’ll see about that one.)
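For anyone who hasn’t read that line of work: the object being pointed at is roughly the textbook POMDP tuple, sketched below as a bare data structure (my own minimal rendering, not the paper’s formalism); the “descriptive” move is to ask whether an observed system’s behavior is well-modeled as a policy doing well on some such tuple.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Set

@dataclass
class POMDP:
    """Bare-bones textbook POMDP tuple (S, A, O, T, Z, R, gamma).

    The descriptive-agency move, as I read it: given an observed system, ask
    whether its behavior is well-modeled as a policy that does well on some
    such tuple. (My minimal rendering, not the paper's own notation.)
    """
    states: Set[Any]
    actions: Set[Any]
    observations: Set[Any]
    transition: Callable[[Any, Any], Dict[Any, float]]  # T(s, a) -> dist over next states
    observe: Callable[[Any, Any], Dict[Any, float]]     # Z(s', a) -> dist over observations
    reward: Callable[[Any, Any], float]                 # R(s, a)
    discount: float                                     # gamma in [0, 1)
```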
I agree-voted you because, yes, they did see it as evidence they were right about some of their predictions, but it didn’t seem hard to me to predict it. All I’d been doing, since my CFAR workshop in 2015, was spending time reading lots and lots and lots of abstracts and attempting to estimate from the abstracts which papers actually had real improvements to show in their bodies. Through this, I trained a mental model of the trajectory of abstracts that has held up comfortably to this day and predicted almost every major transition of deep learning, including that language-grade conversational AI was coming soon, and I continue to believe that this mental model is easy to obtain, and that the only way one fails to obtain it is stubborn insistence that it can’t be done.
Critch can tell you I was overconfident about the trajectory in early 2016, after AlphaGo and before attention failed to immediately solve complicated problems. It took a while before someone pointed out that attention was all you need and that recurrence was (for a while) a waste of effort, which is what finally allowed things to get really high scores on big problems; but again, similar to Go, attention had already been starting to work in 2015–2016.
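To be concrete about what I mean by attention “starting to work”: here’s a minimal sketch of scaled dot-product attention in plain NumPy (my own toy naming, not anyone’s actual code). The whole trick is that every position mixes information from every other position in one matrix product, instead of having to thread state step by step through a recurrence.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """One attention head: every query position mixes information from all
    key/value positions in a single matrix product, with no recurrent state.
    Q, K: (seq_len, d_k); V: (seq_len, d_v)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # (seq_len, seq_len) pairwise affinities
    weights = softmax(scores, axis=-1)       # each row: a distribution over positions
    return weights @ V                       # per-position weighted average of values

# Toy usage: 5 token positions, 8-dimensional vectors, self-attention.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
out = scaled_dot_product_attention(x, x, x)  # shape (5, 8)
```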
If you’d like I can try to reconstruct this history more precisely with citations.
I don’t know if I need citations so much as “I’m curious what you observed that led you to store ‘Everyone in the MIRI sphere seemed surprised it happened in the 2010s rather than the 2020s or 2030s.’ in your brain.”
Like, I don’t need proof, but like … did you hear Eliezer say something, did you see some LW posts from MIRI researchers, is this more just a vibe that you picked up from people around MIRI and you figured it was representative of the people at MIRI, etc.
I don’t remember, so I’ll interpret this as a request for a lit review (...partly just because I wanna). To hopefully reduce the “you don’t exist” feeling: I recognize this isn’t quite what you asked and that I’m reinterpreting. ETA one to two days; I’ve been putting off annoying practical human stuff to do unnecessary lit review again today anyway.
From AlphaGo Zero and the Foom Debate:
“I wouldn’t have predicted AlphaGo and lost money betting against the speed of its capability gains, because reality held a more extreme position than I did on the Yudkowsky-Hanson spectrum.”
IMO AlphaGo happening so soon was an important update for Eliezer and a lot of MIRI. There are things about AlphaGo that matched our expectations (and e.g. caused Eliezer to write this post), but the timing was not one of them.
The parts of Gears’ account I’m currently skeptical of are:
Gears’ claim (IIUC) that ~every non-stupid AI researcher who was paying much attention knew in advance that Go was going to fall in ~2015. (For some value of “non-stupid” that’s a pretty large number of people, rather than just “me and two of my friends and maybe David Silver” or whatever.)
Gears’ claim that ML has been super predictable and that Gears has predicted it all so far (maybe I don’t understand what Gears is saying and they mean something weaker than what I’m hearing?).
Gears’ level of confidence in predicting imminent AGI. (Seems possible, but not my current guess.)
Gears’ claim (IIUC) that ~every non-stupid AI researcher who was paying much attention knew in advance that Go was going to fall in ~2015. (For some value of “non-stupid” that’s a pretty large number of people, rather than just “me and two of my friends and maybe David Silver” or whatever.)
Specifically, the ones *working on or keeping up with* Go could *see it coming* well enough to *make solid research bets* about what would do it. If they had read up on Go, their predictive distribution over next things to try contained the thing that would work well enough to be worth scaling seriously, if you wanted to build the thing that worked. What I did, as someone not able to implement it myself at the time, was read enough of the Go research and the general pattern of neural network successes to have a solid hunch about what it looks like to approximate a planning trajectory with a neural network. It looked very much like the people actually doing the work at Facebook were on the same track. What was surprising was mostly that Google funded scaling it so early, which relied on them having found an algorithm that scaled well a bit sooner than I expected. Also, I lost a bet about how strong it would be: after updating on the matches from when it was initially announced, I thought it would win some games but lose overall; instead it won outright.
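To unpack “approximate a planning trajectory with a neural network” a bit: below is a toy sketch of the shape of the recipe as I read it out of the 2015-era Go papers, with stub functions where the learned networks would go. This is my illustration, not DeepMind’s or Facebook’s actual algorithm; the real systems used MCTS with visit counts rather than this bare lookahead, but the division of labor (policy prior proposes, value net evaluates, search backs up) is the part I’m claiming was visible in advance.

```python
import math
import random

# Stand-in "networks": in the real systems these are deep nets trained by
# self-play; here they are stubs so only the control flow is visible.
def policy_prior(state, moves):
    p = 1.0 / len(moves)              # uniform prior as a placeholder
    return {m: p for m in moves}

def value_estimate(state):
    return random.uniform(-1.0, 1.0)  # placeholder for a learned position evaluation

def search_move(state, legal_moves_fn, apply_move_fn, width=3, depth=2):
    """Tiny best-first lookahead: expand only the moves the policy prior likes,
    score leaves with the value net, back up the negamax value."""
    moves = legal_moves_fn(state)
    if depth == 0 or not moves:
        return None, value_estimate(state)
    prior = policy_prior(state, moves)
    candidates = sorted(moves, key=lambda m: -prior[m])[:width]
    best_move, best_value = None, -math.inf
    for m in candidates:
        child = apply_move_fn(state, m)
        _, child_value = search_move(child, legal_moves_fn, apply_move_fn, width, depth - 1)
        value = -child_value  # the opponent moves in the child position
        if value > best_value:
            best_move, best_value = m, value
    return best_move, best_value

# Call shape only; a real use supplies real game rules and real trained nets.
move, value = search_move(0, lambda s: [1, 2, 3] if s < 6 else [], lambda s, m: s + m)
```

The missing piece, the training loop, would push policy_prior toward the moves the search ends up preferring and value_estimate toward eventual outcomes, which is what I mean by the network ending up approximating the planning trajectory.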
Gears’ claim that ML has been super predictable and that Gears has predicted it all so far (maybe I don’t understand what Gears is saying and they mean something weaker than what I’m hearing?).
I have hardly predicted all of ML, but I have predicted the overall manifold of which clusters of techniques would work well and have high success, at what scales and at what times. Until you challenged me to do it on Manifold, I’d been intentionally keeping off the record about this, except when trying to explain my intuitive/pretheoretic understanding of the general manifold of ML hunchspace, which I continue to claim is not that hard to build if you keep up with abstracts and let yourself assume it’s possible to form a reasonable sense of how each abstract refines the possibility manifold. Sorry to make strong unfalsifiable claims; I’m used to it. But I think you’ll hear something similar, if phrased a bit less dubiously, from deep learning researchers experienced at picking which papers to work on in the pretheoretic regime. Approximately: it’s obvious to everyone who’s paying attention to a particular subset what’s next in that subset, but it’s not necessarily obvious how much compute it’ll take, whether you’ll be able to find hyperparameters that work, whether your version of the idea is subtly corrupt, or whether you’ll be interrupted in the middle of thinking about it because your boss wants a new vision model for ad ranking.
Gears’ level of confidence in predicting imminent AGI. (Seems possible, but not my current guess.)
Sure, I’ve been the most research-trajectory-optimistic person in any deep learning room for a long time, and I often wonder if that’s because I’m arrogant enough to predict other people’s research instead of getting my year-scale optimism burnt out by the pain of the slog of hyperparameter-searching one’s own ideas; so I’ve been more calibrated about what other people’s clusters can do (and even less calibrated about my own). As a capabilities researcher, you keep getting scooped by someone else with a bigger hyperparameter search cluster! As a capabilities researcher, you keep being right about the algorithms’ overall structure, but now you can’t prove you knew it ahead of time in any detail! More effective capabilities researchers have this problem less; I’m certainly not one. Also, you can easily exceed my map quality by reading enough to train your intuitions about the manifold of what works: just drastically decrease your confidence in *everything* you’ve known since 2011 about what’s hard and what’s easy on tiny computers, and treat it as a palette of inspiration for what you can build now that computers are big. Roleplay as a 2015 capabilities researcher and try to use your map of the manifold of what algorithms work to predict whether each abstract contains a paper that lives up to its claims. Just browse the arXiv; don’t look at the most popular papers, since those have been filtered by what actually worked well.
Btw, call me gta or tgta or something. I’m not gears; I’m a pre-theoretic map of, or reference to, them, or something. ;)
Also, I should mention: Jessicata, Jack Gallagher, and possibly Tsvi BT can tell you some of what I told them circa 2016–2017 about neural networks’ trajectory. I don’t know if they ever believed me until each thing was confirmed, and I don’t know which things they’d remember or exactly which things were confirmed as stated, but I definitely remember arguing in person in the MIRI office on Addison, in the backest back room with beanbags and a whiteboard and, if I remember correctly, a dripping ceiling (though that’s plausibly just memory decay confusing references), that neural networks are a form of program inference that works on arbitrarily complicated nonlinear programs given an acceptable network interference pattern prior, just a shitty one that needs a big network to have enough hypotheses to get it done (stated with the benefit of hindsight; it was a much lower-quality claim at the time). I feel like that’s been pretty thoroughly demonstrated now, though pseudo-second-order gradient descent (Adam and friends) still has weird biases that make its results less reliable than the proper version of itself. It’s so damn efficient, though, that you’d need a huge real-wattage power benefit to use something that was less informationally efficient relative to its VM.
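Since I’m leaning on “pseudo-second-order” doing real work there: the standard Adam update, written out below as a small sketch (my paraphrase, standard hyperparameter names), makes it visible that the “second-order” ingredient is just a running average of squared gradients used to rescale each coordinate’s step, not actual curvature, which is where some of those weird biases come from.

```python
import numpy as np

def adam_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. state = (m, v, t): first-moment estimate, squared-gradient
    estimate, step count. The v term is the only 'second-order-ish' ingredient:
    a diagonal, exponentially averaged squared gradient used to rescale each
    coordinate's step; not actual curvature."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad           # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2      # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias corrections for the warm-up phase
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, t)

# Toy usage: minimize f(x) = ||x||^2, whose gradient is 2x.
theta = np.random.randn(4)
state = (np.zeros(4), np.zeros(4), 0)
for _ in range(500):
    theta, state = adam_step(theta, 2 * theta, state, lr=0.05)
# theta now hovers near zero (Adam oscillates within roughly lr of the optimum here).
```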