I’m not too concerned about this. ML skills are not sufficient to do good alignment work, but they seem to be very important for like 80% of alignment work and make a big difference in the impact of research (although I’d guess still smaller than whether the application to alignment is good)
The explosion of research in the last ~year is partially due to an increase in the number of people in the community who work with ML. Maybe you would argue that lots of current research is useless, but it seems a lot better than only having MIRI around
The field of machine learning at large is in many cases solving easier versions of problems we have in alignment, and therefore it makes a ton of sense to have ML research experience in those areas. E.g. safe RL is how to get safe policies when you can optimize over policies and know which states/actions are safe; alignment can be stated as a harder version of this where we also need to deal with value specification, self-modification, instrumental convergence etc.
I should have said ‘prestige within capabilities research’ rather than ML skills which seems straightforwardly useful.
The former is seems highly corruptive.
I’m not too concerned about this. ML skills are not sufficient to do good alignment work, but they seem to be very important for like 80% of alignment work and make a big difference in the impact of research (although I’d guess still smaller than whether the application to alignment is good)
Primary criticisms of Redwood involve their lack of experience in ML
The explosion of research in the last ~year is partially due to an increase in the number of people in the community who work with ML. Maybe you would argue that lots of current research is useless, but it seems a lot better than only having MIRI around
The field of machine learning at large is in many cases solving easier versions of problems we have in alignment, and therefore it makes a ton of sense to have ML research experience in those areas. E.g. safe RL is how to get safe policies when you can optimize over policies and know which states/actions are safe; alignment can be stated as a harder version of this where we also need to deal with value specification, self-modification, instrumental convergence etc.
I mostly agree with this.
I should have said ‘prestige within capabilities research’ rather than ML skills which seems straightforwardly useful. The former is seems highly corruptive.