Yeah, some caveats I should’ve added in the interview:
Don’t listen to my project selection advice if you don’t like my research
The forward-chaining-style approach I’m advocating for is controversial in the Alignment Forum community (and less controversial in the ML/LLM research community and, to some extent, among LLM alignment groups)
Part of why I like this approach is that I (personally) think there are at least some somewhat promising agendas out there that aren’t getting executed on enough (or much at all), and it’s doable to, e.g., double the amount of good work happening on some agenda by executing quickly/well
If you don’t think existing agendas are that promising (or think they have more work done on them than they deserve), then this is the wrong approach
The back-chaining approach is pretty standard in the alignment community; I think most Alignment Forum researchers would probably endorse it. I’m also excited about this approach to research, and have done some work in this way (e.g., sleeper agents and model organisms of misalignment)
I’m guessing part of the disagreement here comes from disagreement about how much alignment progress is idea/agenda-bottlenecked vs. execution-bottlenecked. I really like Tim Dettmers’ blog post on credit assignment in research, which has a good framework for thinking about when you’ll have more counterfactual impact working on ideas vs. working on execution.