Can you say more about what types of AI safety research you are referring to? Interpretability, evals, and steering for deep nets, I assume, but not work that’s attempting to look forward and apply to AGI and ASI?