Yeah, I don’t think I have any disagreements there. I agree that current models lack important capabilities across all sorts of different dimensions.
So you agree with the claim that current LLMs are a lot more useful for accelerating capabilities work than they are for accelerating alignment work?
From my perspective, most alignment work I’m interested in is just ML research. Most capabilities work is also just ML research. There are some differences between the flavors of ML research for these two, but they seem small.
So LLMs are about equally good at accelerating the two.
There is also alignment research that doesn’t look like ML research (mostly mathematical theory or conceptual work).
For the type of conceptual work I’m most interested in (e.g. catching AIs red-handed), about 60-90% of the work is communication (writing things up in a way that makes sense to others, finding the right way to frame the ideas when talking to people, etc.), and LLMs could theoretically be pretty useful for this. For the actual thinking work, LLMs are pretty worthless (and this work is pretty close to philosophy).
For mathematical theory, I expect LLMs are somewhat worse at this than at ML research, but it’s not clear there will be a big gap going forward.