[Question] Should we exclude alignment research from LLM training datasets?
This is a companion post to Keeping content out of LLM training datasets, which discusses the various techniques we could use and their tradeoffs. My intention is primarily to start a discussion; I am not myself very opinionated on this.
As AIs become more capable, we may at least want the option of discussing them out of their earshot.
Places to consider (at the time of writing, none of the robots.txt files below rule out LLM scrapers, but I include the links so you can check whether this changes; a sketch of what an explicit block would look like follows the list):
Alignment Forum (robots.txt)
LessWrong (robots.txt)
EA Forum (robots.txt)
Alignment org websites, e.g.
ARC (robots.txt)
METR (robots.txt)
arXiv (robots.txt which links to their policy)
was explicitly mentioned by Meta as a training source for LLaMA-1
obviously we’re less in a position to decide here, but we could ask.
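None of the sites above currently do this, but for reference, a robots.txt that explicitly excluded the major LLM crawlers would look roughly like the sketch below. The user-agent tokens are the ones the relevant crawlers document at the time of writing (OpenAI’s GPTBot, Common Crawl’s CCBot, Google’s Google-Extended training-control token, Anthropic’s ClaudeBot); the list changes over time, so treat it as illustrative rather than complete.

```
# Illustrative only: turn away crawlers whose output feeds LLM training sets.
User-agent: GPTBot           # OpenAI
Disallow: /

User-agent: CCBot            # Common Crawl, widely used as a training source
Disallow: /

User-agent: Google-Extended  # Google's token for controlling Gemini training use
Disallow: /

User-agent: ClaudeBot        # Anthropic
Disallow: /
```

Note that this only affects crawlers that honour robots.txt; it is a request, not an enforcement mechanism.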
Options to consider:
Blocking everything in robots.txt, or by User-Agent
it would perhaps seem a shame for AIs to know none of the content here
(for EAF / LW / AF) Writing code for ForumMagnum to:
enable blocking (probably by User-Agent) to be configured per-post (I haven’t investigated this for feasibility, but naively it doesn’t seem so hard; a rough sketch of the shape this could take follows this list).
enable posts to be configured to be visible only to logged-in users (perhaps a feature we’d want for other reasons anyway?)
Get clarity on how to ensure that canaries (distinctive strings embedded in a document, so that a model later reproducing them is evidence the document was in its training data) are effective, and then start using them more widely.
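To make the per-post blocking idea above a little more concrete, here is a minimal sketch of the shape it could take, written as a generic Express-style route handler in TypeScript. It is not based on ForumMagnum’s actual code: the noAiTraining post field, the in-memory getPostBySlug lookup, and the user-agent list are all hypothetical placeholders for whatever a real implementation would use.

```typescript
import express from "express";

// Hypothetical: User-Agent substrings of crawlers we want to turn away.
// A real list would need to be maintained as new crawlers appear.
const BLOCKED_UA_SUBSTRINGS = ["GPTBot", "CCBot", "Google-Extended", "ClaudeBot"];

// Hypothetical post shape: assume each post carries a per-post opt-out flag.
interface Post {
  slug: string;
  noAiTraining: boolean;
  html: string;
}

// In-memory stand-in for the real database lookup.
const posts: Record<string, Post> = {
  "example-post": { slug: "example-post", noAiTraining: true, html: "<p>Post body</p>" },
};
async function getPostBySlug(slug: string): Promise<Post | null> {
  return posts[slug] ?? null;
}

const app = express();

app.get("/posts/:slug", async (req, res) => {
  const post = await getPostBySlug(req.params.slug);
  if (!post) return res.status(404).send("Not found");

  const ua = req.get("User-Agent") ?? "";
  const isBlockedCrawler = BLOCKED_UA_SUBSTRINGS.some((s) => ua.includes(s));

  // If the author opted this post out and the requester identifies as an
  // LLM crawler, refuse to serve the content (and ask caches not to store it).
  if (post.noAiTraining && isBlockedCrawler) {
    res.set("Cache-Control", "no-store");
    return res.status(403).send("This post is not available to automated crawlers.");
  }

  return res.send(post.html);
});

app.listen(3000);
```

Like the robots.txt approach, this only deters crawlers that identify themselves honestly in their User-Agent header, which is part of why the logged-in-only option might be worth having as well.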
Feel free to suggest additions to either category.
To the extent that doing something here means spending software dev time, the question is not only whether we should do this, but how important it is relative to the other things we could spend software developers on.
Link preview image by Jonny Gios on Unsplash