And I haven’t heard of anyone saying GPT-3 can be made into AGI with a bit of tweaking, scaling, and prompt engineering.
I am one who says that (not certain, but with high probability), so I thought I would chime in.
The main ideas behind my belief are:
1. The Kaplan and Chinchilla scaling-law papers show the functional relationship between training resources (parameters and data) and cross-entropy loss. I believe with high probability that this scaling won't break down significantly, i.e. that we can keep getting closer to the theoretical irreducible entropy with transformer architectures (see the first sketch after this list).
2. Cross-entropy loss measures the distance between two probability distributions, in this case the distribution of human-generated text (encoded as tokens) and the distribution the model predicts. I believe with high probability that this measure is the relevant one, i.e. the loss can only get low enough when the model is capable of doing human-comparable intellectual work (irrespective of it actually doing it); the second sketch after the list shows why the loss bottoms out exactly when the two distributions match.
3. Once the model achieves the necessary cross-entropy loss and consequently becomes capable, somewhere inside it, of producing AGI-level work (as per 2.), we can get it to actually output that level of work with minor tweaks. I don't have specifics, but think on the level of letting the model recursively call itself on some of its generated text via a special output command or some such (a sketch of what I mean follows the list).
4. I don't think prompt engineering is relevant to AGI.
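To make 1. a bit more concrete, here is a toy sketch of the Chinchilla parametric loss fit, L(N, D) = E + A/N^alpha + B/D^beta. The constants are roughly the fitted values reported in the Chinchilla paper; treat them as illustrative, since the exact numbers matter less than the shape of the curve: the reducible terms shrink as resources grow and the loss approaches the irreducible entropy E.

```python
# Chinchilla-style parametric loss, L(N, D) = E + A/N^alpha + B/D^beta.
# The constants below are roughly the fitted values reported in the Chinchilla
# paper; they are illustrative, not authoritative.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted cross-entropy loss for n_params parameters trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# Scale up parameters with ~20 tokens per parameter (the Chinchilla-optimal ratio):
# the loss keeps dropping toward, but never below, the irreducible term E.
for n in (1e9, 1e10, 1e11, 1e12):
    print(f"N={n:.0e}  L={chinchilla_loss(n, 20 * n):.3f}")
```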
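On 2., the reason cross-entropy works as a distance measure: H(p, q) = H(p) + KL(p || q), so it is minimized, at the irreducible entropy H(p), exactly when the model distribution q matches the human distribution p. A toy example (the three-token vocabulary and all the probabilities here are made up purely for illustration):

```python
import math

p = {"the": 0.6, "cat": 0.3, "sat": 0.1}      # "human" next-token distribution
q_bad = {"the": 0.4, "cat": 0.4, "sat": 0.2}  # a poorly fit model
q_good = dict(p)                              # a perfectly fit model

def cross_entropy(p, q):
    """H(p, q) in nats; equals H(p) + KL(p || q)."""
    return -sum(p[t] * math.log(q[t]) for t in p)

print(f"H(p)         = {cross_entropy(p, p):.3f}")      # irreducible entropy
print(f"H(p, q_bad)  = {cross_entropy(p, q_bad):.3f}")  # strictly larger
print(f"H(p, q_good) = {cross_entropy(p, q_good):.3f}")# equals H(p)
```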
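And on 3., the flavour of minor tweak I have in mind is something like the wrapper below. This is purely a sketch: the [RECURSE] control token, the depth limit, and the generate() placeholder are all hypothetical, not anything an existing model or API actually provides.

```python
CALL_TOKEN = "[RECURSE]"   # hypothetical special output command
MAX_DEPTH = 4              # crude guard against unbounded recursion

def generate(prompt: str) -> str:
    """Placeholder for a real model completion call."""
    raise NotImplementedError

def run(prompt: str, depth: int = 0) -> str:
    output = generate(prompt)
    # If the model asks to keep working on its own intermediate output,
    # feed that text back into itself and splice the result in.
    if CALL_TOKEN in output and depth < MAX_DEPTH:
        before, _, subtask = output.partition(CALL_TOKEN)
        return before + run(subtask.strip(), depth + 1)
    return output
```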
I would be glad for any information that can help me update.