[Question] Papers to start getting into NLP-focused alignment research
I have recently started working towards a research-focused MSc in AI under the supervision of Dr. Pilehvar and Dr. Soleymani at Sharif University of Technology. (CV, LinkedIn)
Since we get essentially no funding here in Iran, I have ample freedom in choosing which research topics to pursue. What are some interesting alignment-adjacent papers in the NLP/transformer space that I could read? I have taken a cursory look at the previously aggregated resources, but the papers there pattern-matched in my brain to non-concrete abstractions that speak in hypotheses (i.e., “philosophy”). I am inclined towards more mainstream/technical work that I can apply to the models we already have. I like capabilities research, if that matters.
The fields that remind me of alignment:
interpretability
AI ethics
privacy
fairness and biases
out-of-distribution generalization
robustness to adversarial attacks
P.S.: Recent events in Iran might have created novel opportunities for effective altruism. Please take a moment to review the situation if you don’t already know the elephant in the room.
P.P.S.: I assume I can’t apply for any grants because I am Iranian and thus subject to sanctions. If you know otherwise, please let me know.
Related:
TAI Safety Bibliography
AI safety starter pack—EA Forum
Resources I send to AI researchers about AI safety—LessWrong
2021 AI Alignment Literature Review and Charity Comparison—LessWrong 2.0 viewer
AI Alignment Research Overview (Oct 2019) by Jacob Steinhardt
AI Alignment Curriculum—AGI Safety Fundamentals
colah’s blog
Information-Theoretic Probing with Minimum Description Length