No, because we have tons of information about what specific kinds of information on the internet is/isn’t usually fabricated. It’s not like we have no idea at all which internet content is more/less likely to be fabricated.
Information about, say, how to prove that there are infinitely many primes is probably not usually fabricated. It’s standard basic material, there’s lots of presentations of it, it’s not the sort of thing which people usually troll about. Yes, the distribution of internet text about the infinitude of primes contains more-than-zero trolling and mistakes and the like, but that’s not the typical case, so low-temperature sampling from the LLM should usually work fine for that use-case.
On the other end of the spectrum, “fusion power plant blueprints” on the internet today will obviously be fictional and/or wrong, because nobody currently knows how to build a fusion power plant which works. This generalizes to most use-cases in which we try to get an LLM to do something (using only prompting on a base model) which nobody is currently able to do. Insofar as the LLM is able to do such things, that actually reflects suboptimal next-text prediction on its part.
I would add “and the kind of content you want to get from aligned AGI definitely is fabricated on the Internet today”. So the powerful LM trying to predict it will predict how the fabrication would look like.
“Some content on the Internet is fabricated, and therefore we can never trust LMs trained on it”
Is this a fair summary?
No, because we have tons of information about what specific kinds of information on the internet is/isn’t usually fabricated. It’s not like we have no idea at all which internet content is more/less likely to be fabricated.
Information about, say, how to prove that there are infinitely many primes is probably not usually fabricated. It’s standard basic material, there’s lots of presentations of it, it’s not the sort of thing which people usually troll about. Yes, the distribution of internet text about the infinitude of primes contains more-than-zero trolling and mistakes and the like, but that’s not the typical case, so low-temperature sampling from the LLM should usually work fine for that use-case.
On the other end of the spectrum, “fusion power plant blueprints” on the internet today will obviously be fictional and/or wrong, because nobody currently knows how to build a fusion power plant which works. This generalizes to most use-cases in which we try to get an LLM to do something (using only prompting on a base model) which nobody is currently able to do. Insofar as the LLM is able to do such things, that actually reflects suboptimal next-text prediction on its part.
I would add “and the kind of content you want to get from aligned AGI definitely is fabricated on the Internet today”. So the powerful LM trying to predict it will predict how the fabrication would look like.