[Question] Papers to start getting into NLP-focused alignment research

I have recently started working towards a research-focused MSc in AI under the supervision of Dr. Pilehvar and Dr. Soleymani at Sharif University of Technology. (CV, LinkedIn)

Since we essentially get no funding here in Iran, I have ample freedom in choosing which research topics to pursue. What are some interesting alignment-adjacent papers in the NLP/transformer space that I can read? I have taken a cursory look at the previously aggregated resources, but the papers there pattern-matched in my brain to non-concrete abstractions that speak in hypotheses (i.e., “philosophy”). I am inclined towards more mainstream/technical works that I can apply to the models we already have. I like capability research, if that matters.

The fields which remind me of alignment:

  • interpretability

  • AI ethics

    • privacy

    • fairness and biases

  • out-of-distribution generalization

  • robustness to adversarial attacks

P.S.: Recent events in Iran might have created novel opportunities for effective altruism. Please take a moment to review the situation if you don’t already know about the elephant in the room.

P.P.S.: I assume I can’t apply for any grants because I am Iranian and thus under sanctions. If you know otherwise, please let me know.
