Yeah, I don’t think I have any disagreements there. I agree that current models lack important capabilities across all sorts of different dimensions.
So you agree with the claim that current LLMs are a lot more useful for accelerating capabilities work than they are for accelerating alignment work?
From my perspective, most alignment work I’m interested in is just ML research. Most capabilities work is also just ML research. There are some differences between the flavors of ML research for these two, but they seem small.
So LLMs are about equally good at accelerating the two.
There is also alignment research that doesn’t look like ML research (mostly mathematical theory or conceptual work).
For the type of conceptual work I’m most interested in (e.g. catching AIs red-handed), about 60-90% of the work is communication (writing things up in a way that makes sense to others, finding the right way to frame the ideas when talking to people, etc.), and LLMs could theoretically be pretty useful for this. For the actual thinking work, LLMs are pretty worthless (and this work is pretty close to philosophy).
For mathematical theory, I expect LLMs are somewhat worse at this than at ML research, but it’s not clear there will be a big gap going forward.