My naive interpretation is that we only use ML when we can’t be bothered to write a traditional solution, but I don’t think you believe that. (To take a trivial example: ML can recognise birds far better than any software we can write.)
Re: how to update based on benchmark progress in general, see my response to you above.
On the rest: the best way I can think of to explain this is in terms of alignment rather than correctness.
The bird example is good. My contention is basically that when it comes to making something like “recognizing birds” economically useful, there is an enormous chasm between 90% performance on a subset of ImageNet and money in the bank. For two reasons, among others:
Alignment. What do we mean by “recognize birds”? Do pictures of birds count? Cartoon birds? Do we need to identify individual organisms e.g. for counting birds? Are some kinds of birds excluded?
Engineering. Now that you have a module which can take in an image and output whether it has a bird in it, how do you produce value?
I’ll admit that this might seem easy to do, and that ML is doing pretty much all the heavy lifting here. But my take is that that’s because object recognition/classification is a very low-level, automatic, sub-cognitive thing. Once you start getting into questions of scene understanding, or indeed language understanding, there is an explosion of contingencies beyond silly things like cartoon birds. What humans are really, really good at is understanding these (often unexpected) contingencies in the context of their job and their business’s needs, and acting appropriately.

At what point would you be willing to entrust an ML system to deal with entirely unexpected contingencies in a way that suits your business needs (and indeed, doesn’t tank them)? Even the highest level of robustness on known contingencies may not be enough, because almost certainly the problem is fundamentally underspecified by the instructions and input data. So, in order to successfully automate the task, you need to successfully characterize the full space of contingencies you want the worker to deal with, perhaps enforcing it through the architecture of your app or business model. And this is where the design, software engineering, and domain-specific understanding aspects come in.

Because no matter how powerful our ML systems are, we only want to use them if they’re aligned (or if, say, we have some kind of bound on how pathologically they can behave), and knowing that is in general very hard. More powerful ML does make the construction of such systems easier, but it is in some sense orthogonal to the alignment problem. I would make this more concrete, but I’m tired, so I hope the concrete examples I gave earlier in the discussion serve as inspiration enough.
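To make the alignment/engineering split concrete, here is a minimal sketch. Everything in it is hypothetical (the `Detection` type, the `is_billable_bird` function, the labels): the point is only that even given a perfect classifier module, the business-facing behavior stays underspecified until someone answers the “do cartoons count?” and “how confident is confident enough?” questions, and those answers live in ordinary code and product decisions, not in the model.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Output of a hypothetical bird-recognition module."""
    label: str         # e.g. "bird", "cartoon_bird", "no_bird"
    confidence: float  # model confidence in [0, 1]


def is_billable_bird(det: Detection,
                     threshold: float = 0.9,
                     count_cartoons: bool = False) -> bool:
    """The alignment questions live here, not in the model:
    do cartoons count? How confident must we be? These are
    business decisions the benchmark never specified."""
    if det.label == "cartoon_bird":
        return count_cartoons and det.confidence >= threshold
    return det.label == "bird" and det.confidence >= threshold


# The same model output yields different "correct" answers
# depending on how the task was specified:
d = Detection("cartoon_bird", 0.95)
print(is_billable_bird(d))                       # False under one spec
print(is_billable_bird(d, count_cartoons=True))  # True under another
```

Scaling up the model changes `confidence`, but it cannot answer the specification questions; that is the sense in which ML power and alignment are orthogonal here.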
And, yeah. I should also clarify that my position is in some way contingent on ML already being good enough to eat all kinds of stuff that 10 years ago would be unheard of. I don’t mean to dunk on ML’s economic value. But basically what I think is that a lot of pretty transformative AI is already here. The market has taken up a lot of it, but I’m sure there’s plenty more slack to be made up in terms of productivity gains from today’s (and yesterday’s) ML. This very well might result in a doubling of worker productivity, which we’ve seen many times before and which seems to meet some definition of “producing the majority of the economic value that a human is capable of.” Maybe if I had a better sense of the vision of “transformative AI” I would be able to see more clearly how ML progress relates to it. But again, even then I don’t necessarily see the connection to extrapolation on benchmarks, which are inherently just measuring sticks of their day and kind of separate from the economic questions.
Anyway, thanks for engaging. I’m probably going to duck out of responding further because of holiday and other duties, but I’ve enjoyed this exchange. It’s been a good opportunity to express, refine, & be challenged on my views. I hope you’ve felt that it’s productive as well.
This has definitely been productive for me. I’ve gained useful information, I see some things more clearly, and I’ve noticed some questions I still need to think a lot more about. Thanks for taking the time, and happy holidays!