Glad to see you’re working on this. It seems even more clearly correct (the goal, at least :)) for not-so-short timelines. Less clear how best to go about it, but I suppose that’s rather the point!
A few thoughts:
I expect it’s unusual that [replace methodology-1 with methodology-2] will be a Pareto improvement: other aspects of a researcher’s work will tend to have adapted to fit methodology-1. So I don’t think the creation of some initial friction is a bad sign. (This also mirrors therapy: there’s usually a [take things apart and better understand them] phase before any [put things back together in a more adaptive pattern] phase.)
It might be useful to predict this kind of thing ahead of time, to develop a sense of when to expect specific side-effects (and/or predictably unpredictable side effects).
I do think it’s worth interviewing at least a few carefully selected non-alignment researchers. I basically agree with your alignment-is-harder case. However, it also seems most important to be aware of things the field is just completely missing.
In particular, this may be useful where some combination of cached methodologies is a local maximum for some context. Knowing something about other hills seems useful here.
I don’t expect it’d work to import full sets of methodologies from other fields, but I do expect there are useful bits-of-information to be had.
Similarly, if thinking about some methodology x that most alignment researchers currently use, it might be useful to find and interview other researchers that don’t use x. Are they achieving [things-x-produces] in other ways? What other aspects of their methodology are missing/different?
This might hint both at how a methodology change may impact alignment researchers, and how any negative impact might be mitigated.
Worth considering that there’s less of a risk in experimenting (kindly, that is) on relative newcomers than on experienced researchers. It’s a good idea to get a clear understanding of the existing process of experienced researchers. However, once we’re in [try this and see what happens] mode there’s much less downside with new people—even abject failure is likely to be informative, and the downside in counterfactual object-level research lost is much smaller in expectation.
Thanks for the kind words and useful devil’s advocacy! (I’m expecting nothing less from you ;p)
I expect it’s unusual that [replace methodology-1 with methodology-2] will be a Pareto improvement: other aspects of a researcher’s work will tend to have adapted to fit methodology-1. So I don’t think the creation of some initial friction is a bad sign. (This also mirrors therapy: there’s usually a [take things apart and better understand them] phase before any [put things back together in a more adaptive pattern] phase.)
It might be useful to predict this kind of thing ahead of time, to develop a sense of when to expect specific side-effects (and/or predictably unpredictable side effects)
I agree that pure replacement of methodology is a massive step that is probably premature before we have a really deep understanding both of the researcher’s approach and of the underlying algorithm for knowledge production. Which is why, in my model, this comes quite late; instead, the first steps are more about revealing the cached methodology to the researcher, and showing alternatives from History of Science (and Technology) to make more options and approaches credible for them.
Also looking at the “sins of the fathers” for philosophy of science (how methodologies have fucked up people across history) is part of our last set of framing questions. ;)
I do think it’s worth interviewing at least a few carefully selected non-alignment researchers. I basically agree with your alignment-is-harder case. However, it also seems most important to be aware of things the field is just completely missing.
In particular, this may be useful where some combination of cached methodologies is a local maximum for some context. Knowing something about other hills seems useful here.
I don’t expect it’d work to import full sets of methodologies from other fields, but I do expect there are useful bits-of-information to be had.
Similarly, if thinking about some methodology x that most alignment researchers currently use, it might be useful to find and interview other researchers that don’t use x. Are they achieving [things-x-produces] in other ways? What other aspects of their methodology are missing/different?
This might hint both at how a methodology change may impact alignment researchers, and how any negative impact might be mitigated.
Two reactions here:
I agree with the need to find missing things and alternatives, which is where the history and philosophy of science (HPS) work comes in. One advantage of it is that, unlike with interviews, you can generally judge in hindsight whether a methodology was successful or problematic.
I hadn’t thought about interviewing other researchers. I expect it to be less efficient in a lot of ways than the HPS work, but I’m also now on the lookout for the option, so thanks!
Worth considering that there’s less of a risk in experimenting (kindly, that is) on relative newcomers than on experienced researchers. It’s a good idea to get a clear understanding of the existing process of experienced researchers. However, once we’re in [try this and see what happens] mode there’s much less downside with new people—even abject failure is likely to be informative, and the downside in counterfactual object-level research lost is much smaller in expectation.
I see what you’re pointing out. A couple related thoughts:
The benefit of working with established researchers is that you have a historical record of what they did, which makes it easier to judge whether you’re actually helping.
I also expect helping established researchers to be easier on some dimensions, because they have more experience learning new models and leveraging them.
Related to your first point, I don’t worry too much about messing people up, because the initial input will be far less invasive than wholesale replacement of methodologies. But we’re still investigating the risks to be sure we’re not doing something net negative.