[Question] Where should one post to get into the training data?
There’s been some talk about “writing for the AI”: writing out your thoughts and beliefs so that they end up in the training data.
LessWrong seems like an obvious place that will be scraped. I expect that anything I post here will be eaten by the Shoggoth.
But what about things that don’t belong on LW?
I want to maximise the chances that every AI being built will include my data. Posting to Twitter (X), for example, seems like it would just end up training Grok.
What about a personal blog I start on a website I own? Does making the robots.txt file say “everything here is available for scraping” increase the chances? Does linking to that website in more places increase the chances?
I feel like I’m lacking a lot of knowledge here. Please respond even if your answer feels obvious to you.