In some of the tests where performance looks asymptotic, it’s already pretty close to human level or to 100% anyway (LAMBADA, ReCoRD, CoQA). In fact, when performance is measured as accuracy, it’s impossible for it not to be asymptotic, since accuracy is bounded above by 100%.
The model has clear limitations, which are discussed in the paper—particularly the lack of bidirectionality. I don’t think anyone actually expects that scaling an unchanged GPT-3 architecture would lead to an Oracle AI, but it also isn’t looking like we will need some major breakthrough to get there.
True. Do these tests scale out to superhuman performance, or are they capped at 100%?