Ambitious mechanistic interpretability is quite unlikely[1] to be able to confidently assess[2] whether AIs[3] are deceptively aligned (or otherwise have dangerous propensities) in the next 10 years.
[1] Greater than 90% chance of failure.
[2] A likelihood ratio of 10.
[3] I'm referring to whichever AIs are pivotal or cruxy for things to go well prior to human obsolescence.
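
To make the likelihood-ratio threshold concrete, here is a minimal sketch of how evidence with a likelihood ratio of 10 would move a credence, using Bayes' rule in odds form. The function name and the example priors are hypothetical and chosen only for illustration; they are not estimates from this post.

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Update a probability using Bayes' rule in odds form.

    posterior odds = prior odds * likelihood ratio
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    # Convert odds back into a probability.
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical priors, for illustration only.
    for prior in (0.1, 0.5):
        updated = posterior_probability(prior, likelihood_ratio=10.0)
        print(f"prior {prior:.2f} -> posterior {updated:.3f}")
```

Under these assumptions, the same strength of evidence moves an even prior much closer to certainty than it moves a low prior, which is why the threshold is stated as a ratio rather than as a final probability.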