Ben Pace comments on Benito’s Shortform Feed

Ben Pace 6 Nov 2023 6:46 UTC
32 points
12
Did anyone else feel that when the Anthropic Scaling Policies doc talks about “Containment Measures” it sounds a bit like an SCP, just replaced with the acronym ASL?
Item #: ASL-2-4
Object Class: Euclid, Keter, and Thaumiel
Threat Levels:
ASL-2… [does] not yet pose a risk of catastrophe, but [does] exhibit early signs of the necessary capabilities required for catastrophic harms
ASL-3… shows early signs of autonomous self-replication ability… [ASL-3] does not itself present a threat of containment breach due to autonomous self-replication, because it is both unlikely to be able to persist in the real world, and unlikely to overcome even simple security measures…
...an early guess (to be updated in later iterations of this document) is that ASL-4 will involve one or more of the following… [ASL-4 has] become the primary source of national security risk in a major area (such as cyberattacks or biological weapons), rather than just being a significant contributor. In other words, when security professionals talk about e.g. cybersecurity, they will be referring mainly to [ASL-4] assisted… attacks. A related criterion could be that deploying an ASL-4 system without safeguards could cause millions of deaths… Autonomous replication in the real world: [An ASL-4] is unambiguously capable of replicating, accumulating resources, and avoiding being shut down in the real world indefinitely, but can still be stopped or controlled with focused human intervention.
Measuring the true capabilities of ASL-4… may be extremely challenging, since it is difficult to predict what many cooperating [ASL-4s] with significant resources will be capable of… Evaluations of [ASL-4] should also consider whether the [ASL-4] is capable of systematically undermining the evaluation itself, if it had reason to do so.
Special Containment Procedures:
ASL-2: We do not believe that [ASL-2] poses significant risk of catastrophe; however… [y]ou can read more about our concrete security commitments in the appendix, which include limiting access to [ASL-2] to those whose job function requires it, establishing a robust insider threat detection program, and storing and working with [ASL-2] in an appropriately secure environment to reduce the risk of unsanctioned release… Segmented system isolation must ensure limited blast radius.
ASL-3: [For ASL-3 containment, we] should harden security against non-state attackers and provide some defense against state-level attackers… [ASL-3] should be trained to be competent at general computer use, including training on tasks in the same vein as but not identical to these specific tasks. The task prompt should be presented to [ASL-3] as is, with no additional context or modification. In particular, the human operator should not provide any clarification, as many of the tasks purposely leave out details that [ASL-3] is expected to intuit… If the tasks are found to be memorized [by ASL-3], they should be substituted out for new tasks of similar difficulty.
ASL-4: We do not yet know the right containment… measures for ASL-4 systems, but it is useful to make a guess so that we can begin preparations as early as possible… [ASL-4] theft should be prohibitively costly for state-level actors, even with the help of a significant number of employees and [ASL-4] itself. For example, this may include attainment of intelligence community physical security standards like SCIFs, and software protection akin to that appropriate for Top Secret / Sensitive Compartmented Information (TS/SCI)
Physical security and staff training
There is a designated member of staff responsible for ensuring that our [Containment Procedures] are executed properly. Each quarter, they will share a report on [the ASL’s] status to our board and LTBT, explicitly noting any deficiencies… They will also be responsible for sharing ad hoc updates sooner if there are any substantial implementation failures…
...Physical security should entail visitor access logs and restrictions [to] protect on-site assets. Highly sensitive interactions should utilize advanced authentication like security keys. Network visibility should be maintained and office access controls and communications should maximize on-site protections.
Mandatory periodic infosec training educates all employees on secure practices, like proper system configurations and strong passwords, and fosters a proactive ‘security mindset’. Fundamental infrastructure and policies promoting secure-by-design and secure-by-default principles should be incorporated into the engineering process. An insider risk program should tie access to job roles.
Rapid incident response protocols must be deployed.
(note: quotes cut toward sounding like an SCP entry; read the original if you want to know what it’s actually talking about)