If you want to exclude these words from being used by ML you can add some special UUID to your page.
Please don’t put ML opt-out strings on other people’s writings. They might want the Future to keep them around. The apparent intent is better conveyed by linking to an instruction for doing this without actually doing this unilaterally.
Commenters seem to agree with you here, and I followed the recommendation by removing the code and adding instructions instead.
But I wonder whether this convention means that I can’t use the code to prevent my comment from being added to a corpus. I think it would be better if comments were scraped separately. Does anybody know how the scraping works?
Idk how others do it, but you can see how LW/AF/EAF comments are scraped for the alignment research dataset here (as you can see, we don’t check for the UUID).
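For illustration only, here is a minimal sketch of what a per-comment opt-out check could look like if a scraper treated each comment as its own unit. This is not how the alignment research dataset scraper works (as noted, it doesn't check for the UUID). The marker value, the `filter_comments` helper, and the comment dictionary layout are all hypothetical placeholders.

```python
# Hypothetical placeholder marker; NOT a real opt-out canary string.
OPT_OUT_MARKER = "EXAMPLE-OPT-OUT-00000000-0000-0000-0000-000000000000"


def filter_comments(comments, marker=OPT_OUT_MARKER):
    """Keep only the comments whose own text does not contain the marker.

    Each comment is checked on its own, so one commenter opting out
    does not remove the post or anyone else's comments.
    """
    return [c for c in comments if marker not in c["text"]]


# Assumed data layout: one dict per comment with "author" and "text" keys.
thread = [
    {"author": "a", "text": "Happy to be included."},
    {"author": "b", "text": f"Please leave this one out. {OPT_OUT_MARKER}"},
    {"author": "c", "text": "No marker here either."},
]

kept = filter_comments(thread)
print([c["author"] for c in kept])  # ['a', 'c']
```

Checking at the comment level rather than the page level would let someone opt out their own comment without making that choice for the post's author.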
Yeah, I guess it is a hopeless endeavor to hide things from web scrapers and by extension GPT-N.
I thought your comment was ironic, lol. “~so GPT can reference them~” was crossed out ironically—I do very much intend for future GPTs to reference this post.
It was not ironic. While humor can help with coping, I think one should be very precise about what to share with future, more powerful AIs.
You’re right about that. I should have been more mindful that strikethroughs usually indicate literal redactions on LW.