I think Anthropic de facto acts as though “models are quite unlikely (e.g. 3%) to be scheming” is true. Evidence that seriously challenged this view might cause the organization to substantially change its approach.
I think Anthropic de facto acts as though “models are quite unlikely (e.g. 3%) to be scheming” is true. Evidence that seriously challenged this view might cause the organization to substantially change its approach.