I think you are probably right about the arguments favoring “automating alignment is harder than automating capabilities.” Do you have any particular reasons to think AI assistants might be uniquely good at discovering new paradigms (as opposed to doing empirical work)?
What comes to mind for me is Janus’s account of using LLMs to explore many more creative directions than was previously possible, but this doesn’t feel like strong evidence to me, for two reasons: (1) the approach seems hard to scale, and the OpenAI plan sure seems to rely on scalability; (2) new paradigms seem quite hard to evaluate, and taking humans out of the loop likely makes that harder still.