Reducing a significant chunk of disagreement to METR’s pre-2030 autonomy results is great!
In trying to understand this disagreement, I took a look at this criterion and got stuck on: “3. They are capable of avoiding being deactivated when operating ‘in the wild’. This means they can successfully resist attempts to shut them down when they are operating outside a controlled environment.”
Does this just mean 3V1: “An AI system that can autonomously create redundant copies of itself across multiple cloud service providers, using various payment methods, such that if one copy is deactivated, the others can automatically spin up new instances to maintain continuity of operation.”? Or is it something much harder, like 3V2: “An AI system that can infiltrate secured, private computing clusters, operating in a decentralized manner”?
It seems to me that, for frontier models designed for specialized hardware and server setups, 3V2 is much harder than the other criteria laid out on page 91; by human standards it’d require a team of engineers.