Right, I don’t know how much data a model stores, or how much of it can be reached through retraining if all the parameters can’t be specified outright. If the translation overhead is bad enough, a model of comparable size couldn’t quote an LLM, i.e. memorize its parameters as explicitly accessible raw data. Still, an LLM trained on actual language could probably be made quite a lot smaller by some lossy compression (which I have no idea how to specify), and it would also take eons to decode the stored data from the model (by running experiments on it to elicit its behavior). So size bounds are not the most practical concern here. But maybe the memorized data could be written out much faster at the cost of a reasonable increase in model size?
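To make the size question a bit more concrete, here's a rough back-of-envelope sketch in Python. The bits-per-parameter figure is a pure placeholder assumption (I'm not claiming any measured value), and `storable_bytes` is a hypothetical helper name; the only point is that any estimate of recoverable raw data scales linearly with parameter count under an assumption like this.

```python
# Back-of-envelope: how much raw data might a model's weights hold?
# ASSUMPTION: the bits_per_param default below is a placeholder, not a
# measured memorization capacity. The takeaway is just the linear scaling.

def storable_bytes(n_params: float, bits_per_param: float = 2.0) -> float:
    """Crude upper-bound guess at raw data recoverable from the weights."""
    return n_params * bits_per_param / 8

for n in (7e9, 70e9):
    print(f"{n:.0e} params -> ~{storable_bytes(n) / 1e9:.1f} GB "
          f"at an assumed 2 bits/param")
```

Under that (entirely assumed) capacity, the gap between "what the weights could hold" and "what retraining or behavioral experiments can actually extract" is where the real uncertainty lives, which is why I don't think the size bound itself is the binding constraint.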