Can you give examples of alignment research which isn’t interpretability research?
Fair enough if you're interested in talking about 'approaches to acquiring information with respect to AIs' and you'd like to call that interpretability.
There isn't much alignment research that I don't think is fungible with interpretability work :)
But I would describe most outer alignment work as sufficiently different...