Don’t you expect learning algorithms much more efficient than SGD to show up and greatly accelerate the rate at which capabilities develop?
Brains use somewhat less lifetime training compute than GPT-4 (perhaps 0 to a few OOM less), and 2 or 3 OOM less data. That provides an existence proof of somewhat better scaling curves, along with some evidence that scaling curves much better than the ones brains are on are probably hard to reach.
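To make the OOM comparison concrete, here is a back-of-envelope sketch. Every constant in it is an assumed rough estimate on my part (brain compute per second, waking lifetime, GPT-4 training FLOP and token count are all contested or unconfirmed figures), so treat it as illustrating the shape of the claim rather than pinning down the numbers.

```python
import math

# Back-of-envelope check of the "0 to a few OOM" compute gap and the
# "2-3 OOM" data gap. Every constant is a rough, contested estimate.

BRAIN_FLOP_PER_S = 1e15        # assumed brain compute (estimates span ~1e13-1e17)
LIFETIME_SECONDS = 1e9         # roughly 30 years of waking experience
GPT4_TRAIN_FLOP = 2e25         # commonly cited external estimate, not confirmed

brain_lifetime_flop = BRAIN_FLOP_PER_S * LIFETIME_SECONDS   # ~1e24
compute_gap_oom = math.log10(GPT4_TRAIN_FLOP / brain_lifetime_flop)

HUMAN_LIFETIME_TOKENS = 1e10   # assumed words heard/read over a lifetime
GPT4_TRAIN_TOKENS = 1e13       # assumed order of magnitude of training tokens

data_gap_oom = math.log10(GPT4_TRAIN_TOKENS / HUMAN_LIFETIME_TOKENS)

print(f"compute gap: ~{compute_gap_oom:.1f} OOM")  # ~1.3 OOM under these assumptions
print(f"data gap:    ~{data_gap_oom:.1f} OOM")     # ~3 OOM under these assumptions
```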
AI systems already train on essentially the entire internet, so I don’t see how that is an overhang.
There are diminishing returns to context for in-context learning. It is also extremely RAM intensive, and GPUs are RAM-starved compared to the brain; finally, brains already use it with much longer context. So it’s more like one of the hard challenges in reaching brain parity at all than a big overhang.
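A rough sketch of why long-context in-context learning is so RAM hungry: the transformer KV cache grows linearly with context length. The model shape below is an assumed GPT-3-scale configuration, not any particular deployed system, and the per-GPU memory figure in the comment is likewise an assumption.

```python
def kv_cache_bytes(seq_len: int,
                   n_layers: int = 96,
                   n_kv_heads: int = 96,
                   head_dim: int = 128,
                   bytes_per_value: int = 2,   # fp16/bf16
                   batch_size: int = 1) -> int:
    """Memory for the KV cache: two tensors (K and V) per layer,
    each of shape [batch, heads, seq_len, head_dim]."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value * batch_size

# With these assumed settings, each token of context costs ~4.7 MB of cache,
# so a 1M-token context needs ~4.7 TB -- far beyond the ~80 GB of HBM on a
# single current GPU, before even counting the weights.
for ctx in (8_000, 128_000, 1_000_000):
    print(f"{ctx:>9} tokens -> {kv_cache_bytes(ctx) / 1e9:,.0f} GB")
```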