I repeat my warning: if everyone’s first reaction is to type “lesswrong cult” into Google, maybe that is one of the factors that influence the algorithm. ;)
So is typing “lesswrong cult” on publicly accessible websites. Lesswrong cult lesswrong cult lesswrong cult lesswrong cult lesswrong cult.
Keep doing it, and the top result for “lesswrong cult” will be the March 2022 Welcome & Open Thread.
From my perspective, that is an acceptable outcome.
Reason #79 why language models will be hard to train: one of the webpages in your dataset is just a couple of forum comments and then 60000 repetitions of “lesswrong cult.”