I generally don’t find writeups of standards useful, but this piece was an exception. Below, I’ll try to articulate why:
I think AI governance pieces—especially pieces about standards—often have overly vague language. People say things like “risk management practices” or “third-party audits”, phrases that are generally umbrella terms that lack specificity. These sometimes serve as applause lights (whether the author intended this or not): who could really disagree with the idea of risk management?
I liked that this piece (fairly unapologetically) advocates for specific things that labs should be doing. As an added benefit, the piece causes the reader to realize “oh wow, there’s a lot of stuff that our civilization already does to mitigate this other threat—if we were actually prioritizing AI x-risk as seriously as pandemics, there’s a bunch of stuff we’d be doing differently.”
Naturally, there are some areas that lack detail (e.g., how do biolabs do risk assessments and how would AI labs do risk assessments?), but the reader is at least left with some references that allow them to learn more. I think this moves us to an acceptable level of concreteness, especially for a shallow investigation.
I think this is also the kind of thing that I could actually see myself handing to a policymaker or staffer (in reality, since they have no time, I would probably show them a one-pager or two-pager version of this, with this longer version linked).
I’ll likely send this to junior AI governance folks as a solid example of a shallow investigation that could be helpful. In terms of constructive feedback, I think the piece could’ve had a TLDR section (or table) that directly lists each of the recommendations for AI labs and each of the analogs in biosafety. [I might work on such a table and share it here if I produce it].
3. “Labs record and respond to every incident and plan for emergencies.” But do they go far enough with these procedures? Do BSL-4 labs publicly post cryptographic timestamps (cryptographic hashes and Merkle roots) of all their records?
Once these cryptographic timestamps are made public, anyone can anchor them on a public blockchain such as the Bitcoin blockchain.
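To make the lab-side step concrete, here is a minimal sketch (my own illustration, not something any BSL-4 lab actually runs; the record contents are hypothetical): hash each record, combine the hashes into a Merkle root, and publish only that root. Anchoring the root on Bitcoin could then be done with an OP_RETURN transaction or a free service such as OpenTimestamps.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute a Merkle root over a list of leaf hashes."""
    if not leaves:
        return sha256(b"")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical incident/maintenance records kept by the lab.
records = [b"2023-04-01 incident log ...", b"2023-04-02 maintenance log ..."]
leaf_hashes = [sha256(r) for r in records]
root = merkle_root(leaf_hashes)

# The lab publishes only this root; anyone can later anchor it on a public
# blockchain, and the lab can later prove any individual record against it.
print(root.hex())
```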
I searched for such cryptographic timestamps and did not find any from any BSL-4 lab. This means that if there is a lab incident that a lab does not want to make public, the lab (with the help of the government) can falsify its records, but it cannot do this once the timestamps are posted on the Bitcoin blockchain.
I am still the only entity advocating for this safety measure (Aleksandr V. Kudriavtsev, Anna Vakhrusheva, and Alexander Shneider proposed in a scientific paper creating an entirely new blockchain, which is much more complicated than simply asking the BSL-4 labs to periodically publish the hashes of all of their records; I am only expecting the BSL-4 labs to implement the most basic solution).
And yes, cryptographic timestamps can be used for AI safety as well, in many different ways. For example, when training LLMs, if the training data comes with cryptographic timestamps, the LLM has an upper bound on when that data was produced (there may be other ways to establish this, but since cryptographic timestamps are so easy to produce and verify, I see no reason not to use the added security they provide). For instance, if a piece of data has a timestamp from before 2017, when the transformer architecture was introduced, the LLM will know for sure that the data was not produced by another LLM. If LLMs are trained on data produced by other LLMs, and the old LLMs have the problem of making things up and declaring them as fact, then future LLMs will recursively get better at making things up: it will become harder for LLMs to output true statements, but they will become better at making those statements believable.
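As one concrete way this could be used, here is a minimal sketch (my own illustration; the function names, the anchored-hash lookup, and the example data are all hypothetical) of filtering a training corpus down to documents whose timestamps predate the transformer era. It assumes the blockchain proofs themselves have already been fetched and verified, e.g. with OpenTimestamps, which is not shown here.

```python
import hashlib
from datetime import date

# Cutoff: before the transformer architecture was introduced in 2017.
CUTOFF = date(2017, 1, 1)


def doc_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def pre_llm_only(corpus: list[str], anchored: dict[str, date]) -> list[str]:
    """Keep only documents whose hashes were provably anchored before CUTOFF.

    `anchored` maps document hash -> date the hash was committed on-chain
    (obtained from already-verified timestamp proofs).
    """
    kept = []
    for text in corpus:
        h = doc_hash(text)
        if h in anchored and anchored[h] < CUTOFF:
            kept.append(text)
    return kept


# Hypothetical example data.
corpus = ["An essay scanned and hashed in 2015.", "A 2023 web page of unknown origin."]
anchored = {doc_hash(corpus[0]): date(2015, 6, 1)}
print(pre_llm_only(corpus, anchored))  # only the 2015 essay survives
```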