Filtering out entire sites seems too broad and too crude to have much benefit.
I see plenty of room to turn this into a reasonably good proposal by having GPT-4 look through the dataset for a narrow set of topics, something close to “how we will test AIs for deception”.
Yes, I think what I proposed here is the broadest and crudest thing that will work. It can of course be much more targeted to specific proposals or posts that we think are potentially most dangerous. Using existing language models to rank these is an interesting idea.