It’s not exactly clear what you do with such a story or what the upside is, it’s kind of a vague theory of change and most people have some specific theory of change they are more excited about (even if this kind of story is a bit of a public good that’s useful on a broader variety of perspectives / to people who are skeptical).
Ah, interesting! I’m surprised to hear that. I was under the impression that while many researchers had a specific theory of change, it was often motivated by an underlying threat model, and that different threat models lead to different research interests.
Eg, someone worries about a future where AI control the world but are not human comprehensible, feels very different from someone worried about a world where we produce an expected utility maximiser that has a subtly incorrect objective, resulting in bad convergent instrumental goals.
Do you think this is a bad model of how researchers think? Or are you, eg, arguing that having a detailed, concrete story isn’t important here, just the vague intuition for how AI goes wrong?
I think most people have expectations regarding e.g. how explicitly will systems represent their preferences, how much will they have preferences, how will that relate to optimization objectives used in ML training, how well will they be understood by humans, etc.
Then there’s a bunch of different things you might want: articulations of particular views on some of those questions, stories that (in virtue of being concrete) show a whole set of guesses and how they can lead to a bad or good outcome, etc. My bullet points were mostly regarding the exercise of fleshing out a particular story (which is therefore most likely to be wrong), rather than e.g. thinking about particular questions about the future.
Ah, interesting! I’m surprised to hear that. I was under the impression that while many researchers had a specific theory of change, it was often motivated by an underlying threat model, and that different threat models lead to different research interests.
Eg, someone worries about a future where AI control the world but are not human comprehensible, feels very different from someone worried about a world where we produce an expected utility maximiser that has a subtly incorrect objective, resulting in bad convergent instrumental goals.
Do you think this is a bad model of how researchers think? Or are you, eg, arguing that having a detailed, concrete story isn’t important here, just the vague intuition for how AI goes wrong?
I think most people have expectations regarding e.g. how explicitly will systems represent their preferences, how much will they have preferences, how will that relate to optimization objectives used in ML training, how well will they be understood by humans, etc.
Then there’s a bunch of different things you might want: articulations of particular views on some of those questions, stories that (in virtue of being concrete) show a whole set of guesses and how they can lead to a bad or good outcome, etc. My bullet points were mostly regarding the exercise of fleshing out a particular story (which is therefore most likely to be wrong), rather than e.g. thinking about particular questions about the future.