Labs should give deeper model access to independent safety researchers (to boost their research)
Sharing deeper access helps safety researchers who work with frontier models, obviously.
Some kinds of deep model access:
Helpful-only version
Fine-tuning permission
Activations and logits access
[speculative] Interpretability researchers send code to the lab; the lab runs the code on the model; the lab sends back the results
See Shevlane 2022 and Bucknall and Trager 2023.
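For concreteness, here is a minimal sketch of what "activations and logits access" means in practice, using a small open model (gpt2 via Hugging Face transformers) as a stand-in for a frontier model; a lab sharing this kind of access would presumably expose equivalent outputs behind a gated API rather than the weights themselves.

```python
# Sketch of activations-and-logits access, with gpt2 standing in for a frontier model.
# A lab could return these tensors through a gated API without shipping the weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

logits = outputs.logits              # [batch, seq_len, vocab]: full next-token distributions
activations = outputs.hidden_states  # tuple of per-layer hidden states (residual stream)

# Example downstream use: per-token entropy of the next-token distribution,
# which text-only API access doesn't let you compute exactly.
log_probs = torch.log_softmax(logits, dim=-1)
entropy = -(log_probs.exp() * log_probs).sum(dim=-1)
print(entropy)
```

The [speculative] option above is the same idea pushed further: instead of exposing these tensors at all, the lab runs researcher-submitted code like this server-side and returns only the computed results.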
A lab is disincentivized from sharing deep model access because it doesn’t want headlines about how researchers got its model to do scary things.
It has been suggested that labs are also disincentivized from sharing because they want safety researchers to want to work at the labs, and sharing model access with independent researchers means those researchers don't need to join a lab to do their work. I'm skeptical that this effect is real/nontrivial.
Labs should limit some kinds of access to avoid effectively leaking model weights. But sharing limited access with a moderate number of safety researchers seems very consistent with keeping control of the model.
This post is not about sharing with independent auditors to assess risks from a particular model.
@Buck suggested I write about this but I don’t have much to say about it. If you have takes—on the object level or on what to say in a blogpost on this topic—please let me know.