But after the 10^10 point, something interesting happens: the score starts growing much faster (~N).
And for some tasks, the plot looks like a hockey stick (a sudden change from ~0 to almost-human).
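To make those two shapes concrete, here is a minimal toy sketch. It is purely illustrative, not fit to any real benchmark: the human-level score, the 1e10 knee, and both functional forms are assumptions chosen just to mimic the shapes described above.

```python
# Toy curves (purely illustrative, not fit to any real benchmark) for the two
# shapes described above: (a) a score that creeps up slowly and then grows
# roughly ~N past an assumed 1e10 knee, and (b) a "hockey stick" that jumps
# from near zero to near human level around the same point.
import numpy as np

HUMAN = 100.0   # assumed human-level score
KNEE = 1e10     # assumed scale at which the behavior changes

def slow_then_linear(n):
    # ~log growth below the knee, ~N growth above it, capped at human level.
    base = 3.0 * np.log10(n)
    boost = 20.0 * (n / KNEE - 1.0) if n > KNEE else 0.0
    return min(HUMAN, base + boost)

def hockey_stick(n, sharpness=6.0):
    # Sigmoid in log(N): near 0 below the knee, near human level above it.
    return HUMAN / (1.0 + np.exp(-sharpness * np.log10(n / KNEE)))

for n in [1e8, 1e9, 1e10, 3e10, 1e11]:
    print(f"N = {n:.0e}  slow-then-~N = {slow_then_linear(n):5.1f}  "
          f"hockey stick = {hockey_stick(n):5.1f}")
```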
...
Judging by the preliminary results, the FOOM could start like this:
“GPT-5 still sucks on most tasks. It’s mostly useless. But what if we increase parameters_num by 2? What could possibly go wrong?”
Hypothesis:
- doing things in the real world requires diverse skills (strong performance on a diverse set of tasks)
- hockey-sticking performance on a particular task makes that task no longer the constraint on what you can accomplish
- but now some other task is the bottleneck
- so, unless you can hockey-stick on all the tasks at once, your overall ability to do things in the world will get smoothed out a bunch, even if it still grows very rapidly (see the toy model below)
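Here is a minimal sketch of that hypothesis. The numbers, the skill/job structure, and the rule that a job becomes doable only once all of its required skills have hockey-sticked are all my assumptions, not anything from the original debate. Each individual skill jumps discontinuously, but the aggregate measure (the fraction of jobs you can do) climbs in many smaller steps.

```python
# Toy model of the hypothesis above (all numbers and structure are
# illustrative assumptions): each skill "hockey-sticks" at its own scale
# threshold, a job is doable only once *all* of its required skills have
# hockey-sticked, and overall ability is the fraction of jobs you can do.
# Individual skills jump; the aggregate rises in many smaller steps.
import numpy as np

rng = np.random.default_rng(0)

n_skills, n_jobs = 30, 200
# Log10 of the scale at which each skill hockey-sticks, spread over 1e9..1e12.
skill_threshold = rng.uniform(9, 12, size=n_skills)
# Each job needs a random handful of skills.
job_needs = [rng.choice(n_skills, size=rng.integers(2, 6), replace=False)
             for _ in range(n_jobs)]

for log_n in np.arange(9.0, 12.5, 0.5):
    unlocked = skill_threshold <= log_n                 # which skills have jumped
    doable = [np.all(unlocked[needs]) for needs in job_needs]
    print(f"N = 1e{log_n:.1f}: {np.mean(unlocked):.0%} of skills unlocked, "
          f"{np.mean(doable):.0%} of jobs doable")
```

The all-skills-required gating is what produces the smoothing in this toy model: skills unlock one by one as scale grows, so aggregate ability rises rapidly but without a single jump that unlocks everything at once.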
It seems like there’s a spectrum between smooth accelerating progress and discontinuous takeoff, and where we end up on that spectrum depends on a few things:
- how much simple improvements (better architecture, more compute) help with a wide variety of tasks (see the sketch after this list)
- how much improvement in AI systems is bottlenecked on those tasks
- how many resources the world is pouring into finding and making those improvements
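To illustrate the first point, here is a crude extension of the same toy framing. “Breadth” is an assumed parameter, not something from the discussion: the fraction of still-missing skills that one simple improvement advances at once. Broad improvements produce a few large jumps; narrow ones produce many small steps.

```python
# Crude sketch: "breadth" (an assumption) is the fraction of still-missing
# skills that a single simple improvement unlocks at once. Broad improvements
# make progress jumpy; narrow improvements keep it smooth.
import numpy as np

rng = np.random.default_rng(1)
n_skills = 30

for breadth in [0.1, 0.5, 1.0]:
    unlocked = np.zeros(n_skills, dtype=bool)
    jumps = []
    while not unlocked.all():
        # One improvement unlocks a random `breadth` fraction of the
        # remaining skills (at least one).
        remaining = np.flatnonzero(~unlocked)
        k = max(1, int(breadth * len(remaining)))
        unlocked[rng.choice(remaining, size=k, replace=False)] = True
        jumps.append(k / n_skills)
    print(f"breadth = {breadth:.0%}: {len(jumps)} improvement steps, "
          f"largest single jump = {max(jumps):.0%} of skills")
```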
Recent evidence (success of transformers, scaling laws) seems to suggest that Eliezer was right in the FOOM debate that simple input changes could make a large difference across a wide variety of tasks.
It’s less clear to me, though, whether that means a local system will outcompete the rest of the economy: it seems plausible that the rest of the economy will also be going full steam ahead searching the same improvement space that a local system would be searching.
And I think in general real-world complexity tends to smooth out lumpy graphs. As an example, even once we realized that GPT-2 was powerful and GPT-3 would be even better, there was still a whole bunch of engineering work that had to go into figuring out how to run such a big neural network across multiple machines.
That kind of real-world messiness seems likely to introduce new bottlenecks at every step along the way and at every order-of-magnitude change in scale, which makes me think the actual impact of AI will be a lot smoother than we might expect based just on simple architectures being generally useful and scalable.