Even if you did that, you might need a superhuman intelligence to generate tokens of sufficient quality to further scale the output.
(Jay’s interpretation was indeed my intent.)
Empirically, I don’t think it’s true that you’d need to rely on superhuman intelligence. The latest paper from the totally anonymous and definitely-not-Google team suggests PaL-, er, an anonymous 540B parameter model, was good enough to critique itself into better performance. Bootstrapping to some degree is apparently possible.
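For concreteness, here's a minimal sketch of what a critique-and-revise loop looks like. Everything here is hypothetical scaffolding: `model.generate` stands in for whatever sampling API you have, and the actual paper's recipe (fine-tuning on the model's own filtered outputs) differs in the details.

```python
# Hypothetical sketch of self-critique bootstrapping. `model.generate`
# is an assumed stand-in API (prompt string in, completion string out),
# not any specific library's interface.

def self_critique_round(model, prompts):
    """Generate, self-critique, and collect revised answers."""
    improved = []
    for prompt in prompts:
        draft = model.generate(prompt)
        critique = model.generate(
            f"Question: {prompt}\nAnswer: {draft}\n"
            "List any errors in the answer above."
        )
        revision = model.generate(
            f"Question: {prompt}\nAnswer: {draft}\n"
            f"Critique: {critique}\n"
            "Rewrite the answer, fixing the errors listed."
        )
        improved.append((prompt, revision))
    # Fine-tune on (prompt, revision) pairs and repeat. No human or
    # stronger model in the loop; the gain comes from the model being
    # better at recognizing errors than at avoiding them in one pass.
    return improved
```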
I don’t think this specific instance of the technique is enough by itself to get to spookyland, but it’s evidence that token bottlenecks aren’t going to be much of a concern in the near future. There are a lot of paths forward.
I’d also argue that it’s very possible for even a current architecture to achieve superhuman performance on certain tasks that were not obviously present in its training set. As a trivial example, these token predictors are obviously superhuman at token predicting without needing a bunch of text about the task of token prediction in the training set. If some technique serves the task of token prediction and can be represented within the model, it may arise simply because it helps predict tokens better.
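To make that concrete: the training objective is just next-token cross-entropy, so any internal circuit that lowers it gets reinforced, whether or not the text ever names the skill involved. A sketch in PyTorch, with random tensors standing in for a real model's output:

```python
import torch
import torch.nn.functional as F

# Next-token prediction: the model sees tokens [t_0 .. t_{n-1}] and is
# scored on predicting [t_1 .. t_n]. Anything that lowers this loss -
# syntax tracking, arithmetic, world modeling - is selected for, even
# though "token prediction" is never described in the training text.

vocab_size, seq_len, batch = 50_000, 128, 2
logits = torch.randn(batch, seq_len, vocab_size)     # stand-in model output
tokens = torch.randint(vocab_size, (batch, seq_len + 1))

loss = F.cross_entropy(
    logits.reshape(-1, vocab_size),   # predictions at positions 0..n-1
    tokens[:, 1:].reshape(-1),        # targets shifted one position ahead
)
```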
It’s hard to say exactly what techniques fall within this set of “representable techniques which serve token predicting.” What an AI can learn from the training set isn’t necessarily the same as what a human would say the text is about. Even current kinda-dumb architectures can happen across non-obvious relationships that grow into forms of alien reasoning (which, for now, remain somewhat limited).