It would be interesting to train GPT-4 on the raw weights of the GPT-3 neural net (its weight tables), so that it could output the code for larger networks.
You wouldn’t be able to do that, because the raw weights would require context windows of millions or billions of tokens. Meta-learning fast weights requires more tailored approaches; a good recent example is the meta-learning diffusion model “G.pt”. (Yes, that is really its name: possibly the worst-named DL result of 2022.)
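A back-of-the-envelope calculation makes the scale mismatch concrete. GPT-3’s ~175 billion parameter count is public; the one-token-per-weight encoding and the 32k context size below are illustrative assumptions, not properties of any real tokenizer or model:

```python
# Sketch of why raw weights can't fit in a context window.
# GPT-3's parameter count is public; tokens_per_weight and
# context_window are assumed values for illustration only.
gpt3_params = 175_000_000_000   # ~175 billion parameters
tokens_per_weight = 1           # optimistic: one token per serialized weight
context_window = 32_768         # assumed context size

tokens_needed = gpt3_params * tokens_per_weight
ratio = tokens_needed / context_window
print(f"tokens needed:  {tokens_needed:,}")
print(f"overshoot:      {ratio:,.0f}x the assumed context window")
```

Even under the most generous encoding, the weights alone overshoot such a context window by a factor of several million, which is why purpose-built meta-learning methods are needed instead.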