Can you give examples of alignment research which isn’t interpretability research?
Fair enough if you're interested in talking about 'approaches to acquiring information with respect to AIs' and you'd like to call that interpretability.
There isn't much alignment research that I don't think is fungible with interpretability work :)
But I would describe most outer alignment work as sufficiently different...