Send me anonymous feedback: https://docs.google.com/forms/d/e/1FAIpQLScLKiFJbQiuRYBhrBbVYUo_c6Xf0f8DN_blbfpJ-2Ml39g1zA/viewform
Any type of feedback is welcome, including arguments that a post/comment I wrote is net negative.
Some quick info about me:
I have a background in computer science (BSc+MSc; my MSc thesis was in NLP and ML, though not in deep learning).
You can also find me on the EA Forum.
Feel free to reach out by sending me a PM. (Update: I’ve turned off email notifications for private messages. If you send me a time-sensitive PM, consider also pinging me about it via the anonymous feedback link above.)
Maybe the question here is whether including certain texts in relevant training datasets can cause [language models that pose an x-risk] to be created X months sooner than otherwise.
The relevant texts I’m thinking about here are:
Descriptions of certain tricks to evade our safety measures.
Texts that might cause the ML model to (better) model AIS researchers, potential AIS interventions, or other AI systems that the model might cooperate with (or that might “hijack” the model’s logic). (A toy sketch of what filtering such texts out of a corpus could look like appears below.)
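To make the implied intervention concrete: one operationalization of “not including” such texts is to filter the training corpus before training. The following is a minimal, purely illustrative Python sketch, not anything used in practice; the pattern list and the names `BLOCKLIST_PATTERNS` / `filter_corpus` are my own hypothetical choices, and a real pipeline would need much more careful curation (and probably trained classifiers rather than regexes).

```python
import re

# Hypothetical patterns standing in for "texts we might want to exclude";
# a real blocklist would be far broader and more carefully curated.
BLOCKLIST_PATTERNS = [
    re.compile(r"jailbreak", re.IGNORECASE),
    re.compile(r"evade.{0,40}safety", re.IGNORECASE),
    re.compile(r"alignment researcher", re.IGNORECASE),
]

def is_flagged(document: str) -> bool:
    """Return True if the document matches any blocklist pattern."""
    return any(p.search(document) for p in BLOCKLIST_PATTERNS)

def filter_corpus(documents):
    """Yield only the documents that do not match the blocklist."""
    for doc in documents:
        if not is_flagged(doc):
            yield doc

if __name__ == "__main__":
    corpus = [
        "A recipe for sourdough bread.",
        "How to evade the safety filters of a chatbot.",
        "Notes from an alignment researcher's workshop.",
    ]
    kept = list(filter_corpus(corpus))
    print(f"Kept {len(kept)} of {len(corpus)} documents.")
```

Of course, keyword filtering like this would miss paraphrases and catch benign texts; the point is only that “excluding certain texts” cashes out as a concrete, cheap operation on the dataset, so the real question is the counterfactual impact on when risky models get created, not feasibility.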