Hi Nathan, thanks for playing and pointing out the issue. My apologies for the inappropriate text.
Half of the text samples are from OpenWebText, the scraped web data that GPT-2 was trained on. I don’t know the exact details, but I believe some of it came from Reddit and other places.
If you DM me the neuron’s address next time you see one, I can start compiling a filter. I will also try to find an open-source library to classify text as safe or not safe.
My apologies again. This is a beta experiment; thanks for putting up with it while I fix the issues.
You can use the MPNet-v2 model from sentence-transformers for zero-shot classification: compute the scalar product between the text embedding and the embeddings of “sex” and “violence”, decide on a cut-off, and you are done.
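A minimal sketch of that approach, assuming the model meant is the sentence-transformers “all-mpnet-base-v2” checkpoint; the 0.3 cut-off is a placeholder you would tune on real samples:

```python
from sentence_transformers import SentenceTransformer, util

# Assumption: "all-mpnet-base-v2" is the MPNet model referred to above.
model = SentenceTransformer("all-mpnet-base-v2")

# Embeddings of the unsafe labels, normalized so the scalar product
# equals cosine similarity.
unsafe_labels = ["sex", "violence"]
label_embeddings = model.encode(unsafe_labels, normalize_embeddings=True)

def is_unsafe(text: str, threshold: float = 0.3) -> bool:
    """Flag text whose similarity to any unsafe label exceeds the cut-off."""
    text_embedding = model.encode(text, normalize_embeddings=True)
    scores = util.cos_sim(text_embedding, label_embeddings)  # shape (1, num_labels)
    return bool(scores.max() >= threshold)

# Example usage: the exact threshold needs tuning on real text samples.
print(is_unsafe("A quiet walk through the park."))
print(is_unsafe("A graphic description of violence."))
```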
Thank you! I will put this on the TODO list.