The way I’m using “sensitivity”: sensitivity to X means that the meaningfulness of X spurs responsive, caring action.
I’m fine with that, although it seems important to have a term for the more limited sense of sensitivity so we can keep track of that distinction: maybe “adaptability”?
One of the main concerns in the discourse around aligning AI can also be phrased as a problem of internalization: specifically, the internalization of human values. That is, an AI’s use of a word like “yesterday” or “love” might only weakly refer to the concept you mean.
Internalising values and internalising concepts are distinct. I can have a strong understanding of your definition of “good” and do the complete opposite.
This means being open to some degree of ontological shift in our basic conceptualizations of the problem, which limits how much you can do by building on current ontologies.
I think it’s reasonable to say something along the lines of: “AI safety was developed in a context where most folks weren’t expecting language models before ASI, so insufficient attention has been given to the potential of LLMs to help fill in or adapt informal definitions. Even though folks who feel we need a strongly principled approach may be skeptical that this will work, there’s a decent argument that it should increase our chances of success on the margin”.
This seems to underrate the value of distribution. I suspect another factor to take into account is the degree of audience overlap. Like there’s a lot of value in booking a guest who has been on a bunch of podcasts, so long as your particular audience isn’t likely to have been exposed to them.
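To make the overlap point concrete, here’s a toy sketch (the numbers and the single “share already heard the guest” fraction are invented for illustration, not from the discussion): the value of a guest to your audience scales roughly with the share of your listeners who haven’t already encountered them elsewhere, regardless of how many other shows the guest has done.

```python
# Toy model of the audience-overlap point. All numbers are invented for illustration.

def novel_exposure(your_audience: int, share_already_heard_guest: float) -> float:
    """Rough count of your listeners for whom this guest's material would be new."""
    return your_audience * (1 - share_already_heard_guest)

YOUR_AUDIENCE = 30_000  # hypothetical size of your podcast's audience

# A guest who has done lots of podcasts, but mostly outside your niche:
widely_booked = novel_exposure(YOUR_AUDIENCE, share_already_heard_guest=0.15)  # 25,500

# A guest whose previous appearances were on shows your audience already follows:
heavily_overlapping = novel_exposure(YOUR_AUDIENCE, share_already_heard_guest=0.90)  # 3,000

print(widely_booked, heavily_overlapping)
```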