[Question] Papers to start getting into NLP-focused alignment research
I have recently started working towards a research-focused MSc in AI under the supervision of Dr. Pilehvar and Dr. Soleymani at Sharif University of Technology. (CV, LinkedIn)
Since we get essentially no funding here in Iran, I have ample freedom in choosing which research topics to pursue. What are some interesting alignment-adjacent papers in the NLP/transformer space that I could read? I have taken a cursory look at the previously aggregated resources, but the papers there pattern-matched in my brain to non-concrete abstractions that speak in hypotheses (i.e., “philosophy”). I am inclined towards more mainstream/technical work that I can apply to the models we already have. I like capabilities research, if that matters.
The fields that remind me of alignment:
interpretability
AI ethics
privacy
fairness and biases
out-of-distribution generalization
robustness to adversarial attacks
P.S.: Recent events in Iran might have created novel opportunities for effective altruism. Please take a moment to review the situation if you don’t already know the elephant in the room.
P.P.S.: I assume I can’t apply for any grants because I am Iranian and thus subject to sanctions. If you know otherwise, please let me know.
Related:
TAI Safety Bibliography
AI safety starter pack—EA Forum
Resources I send to AI researchers about AI safety—LessWrong
2021 AI Alignment Literature Review and Charity Comparison—LessWrong 2.0 viewer
AI Alignment Research Overview (Oct 2019) by Jacob Steinhardt
AI Alignment Curriculum—AGI Safety Fundamentals
colah’s blog
Information-Theoretic Probing with Minimum Description Length