Hmm. Have you tried to have conversations with Claude or other LLMs for the purpose of alignment work? If so, what happened?
For me, what happens is that Claude tries to work constitutional AI in as the solution to most problems. This is part of what I mean by “bad at philosophy”.
But more generally, I have a sense that I just get BS from Claude, even when it isn’t specifically trying to shoehorn its own safety measures in as the solution.
Yeah, I don’t think I have any disagreements there. I agree that current models lack important capabilities across all sorts of different dimensions.
So you agree with the claim that current LLMs are a lot more useful for accelerating capabilities work than they are for accelerating alignment work?
From my perspective, most alignment work I’m interested in is just ML research. Most capabilities work is also just ML research. There are some differences between the flavors of ML research for these two, but they seem small.
So LLMs are about equally good at accelerating the two.
There is also alignment research which doesn’t look like ML research (mostly mathematical theory or conceptual work).
For the type of conceptual work I’m most interested in (e.g. catching AIs red-handed), about 60-90% of the work is communication (writing things up in a way that makes sense to others, finding the right way to frame the ideas when talking to people, etc.), and LLMs could theoretically be pretty useful for this. For the actual thinking work, LLMs are pretty worthless (and this work is pretty close to philosophy).
For mathematical theory, I expect LLMs are somewhat worse at this than at ML research, but it’s not clear there will be a big gap going forward.