I’ll just note that several of these bets don’t work as well if I expect discontinuous &/or inconsistently distributed progress, as was observed on many individual tasks in PaLM: https://twitter.com/LiamFedus/status/1511023424449114112 (obscured both by reporting % performance & by the top-level benchmark averaging 24 subtasks that spike at different levels of scaling)
I might expect performance just prior to AGI to be something like 99%, 40%, 98%, 80% on 4 subtasks, where parts of the network developed (by gradient descent) for certain subtasks enable more general capabilities
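To make the averaging point concrete, here’s a minimal sketch (toy numbers, not PaLM’s actual data; the sigmoid shape, sharpness, and threshold spread are assumptions for illustration) of how per-task spikes at different scales wash out into a smooth-looking aggregate:

```python
# Toy sketch (not PaLM's actual data): each subtask is modeled as a sharp
# sigmoid that "spikes" at its own scale threshold. Averaging 24 such curves
# yields a smooth aggregate that hides the per-task discontinuities.
import numpy as np

rng = np.random.default_rng(0)
log_scale = np.linspace(0, 10, 200)       # hypothetical log(model scale)
thresholds = rng.uniform(2, 9, size=24)   # each subtask spikes at a different scale
sharpness = 8.0                           # large -> near-step transitions

# per-task accuracy: near 0 before its threshold, near 1 after
per_task = 1 / (1 + np.exp(-sharpness * (log_scale[:, None] - thresholds[None, :])))
aggregate = per_task.mean(axis=1)         # the "top-level benchmark" number

# At a scale where individual tasks read e.g. 99%, 40%, 98%, 80%,
# the average reports a bland 79.25% -- the spikes are invisible.
print(np.mean([0.99, 0.40, 0.98, 0.80]))  # 0.7925
print(aggregate[::40].round(2))           # smooth, gradual-looking progress
```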
Naive analogy: two tasks for humans: (1) tell time, (2) understand mechanical gears. A human trained on (1) will outperform one trained on (2) for a good while, but once the second gets a really good model of gears they can trivially do (1) & performance would spike dramatically
[Citation Needed]
Designing an accurate mechanical clock is non-trivial[1], even assuming knowledge of gears
Understatement.
Trivially do better than the naive thing a human would do*, sorry (e.g. vs. looking at the sun & seasons, which is what I think a human trying to tell time would do to locally improve). Definitely agree they can’t trivially do a great job by traditional standards. Wasn’t a carefully chosen example
The broader point was that some subskills can enable better performance at many tasks, which causes spiky performance in humans at least. I see no reason why this wouldn’t apply to neural nets (e.g. part of the network develops a model of something for one task; once it’s good enough, it discovers that it can use that model for very good performance on an entirely different task, likely observed as a relatively sudden, significant improvement)
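A hedged toy illustration of that mechanism (hypothetical dynamics invented for the example, including the curve shapes and the 0.8 usefulness threshold; not a real training run): a shared internal model improving smoothly on one task can produce a sudden measured jump on another:

```python
# Hypothetical dynamics: a shared internal "model" improves smoothly while
# training on task A; task B can only exploit that model once it crosses a
# usefulness threshold, so task B's measured performance jumps suddenly.
import numpy as np

steps = np.arange(0, 1001)
model_quality = 1 - np.exp(-steps / 300)   # smooth improvement from task A's gradient

task_a = 0.5 + 0.5 * model_quality         # task A benefits smoothly
# task B: stuck at a weak baseline until the shared model is good enough
task_b = np.where(model_quality < 0.8, 0.35, 0.35 + 0.6 * model_quality)

jump = steps[np.argmax(model_quality >= 0.8)]
print(f"task B jumps at step {jump}: "
      f"{task_b[jump - 1]:.2f} -> {task_b[jump]:.2f}")
```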