Nice. I mostly agree on current margins — the more I mistrust a lab, the more I like transparency.
I observe that unilateral transparency is unnecessary on the view-from-the-inside if you know you’re a reliably responsible lab. And some forms of transparency are costly. So the more responsible a lab seems, the more sympathetic we should be to it saying “we thought really hard and decided more unilateral transparency wouldn’t be optimific.” (For some forms of transparency.)
I think I agree with this idea in principle, but I also feel like it misses some things in practice (or something). Some considerations:
I think my bar for “how much I trust a lab such that I’m OK with them not making transparency commitments” is fairly high. I don’t think any existing lab meets that bar.
I feel like a lot of forms of helpful transparency are not that costly. EDIT: I think one of the most noteworthy costs is something like “maybe the government will end up regulating the sector if/when it understands how dangerous industry people expect AI systems to be and how many safety/security concerns they have”. But I think things like “report dangerous stuff to the govt”, “have a whistleblower mechanism”, and even “make it clear that you’re willing to have govt people come and ask about safety/security concerns” don’t seem very costly from an immediate time/effort perspective.
If a Responsible Company implemented transparency stuff unilaterally, it would make it easier for the government to have proof-of-concept and implement the same requirements for other companies. In a lot of cases, showing that a concept works for company X (and that company X actually thinks it’s a good thing) can reduce a lot of friction in getting things applied to companies Y and Z.
I do agree that some of this depends on the type of transparency commitment and there might be specific types of transparency commitments that don’t make sense to pursue unilaterally. Off the top of my head, I can’t think of any transparency requirements that I wouldn’t want to see implemented unilaterally, and I can think of several that I would want to see (e.g., dangerous capability reports, capability forecasts, whistleblower mechanisms, sharing if-then plans with govt, sharing shutdown plans with govt, setting up interview program with govt, engaging publicly with threat models, having clear OpenAI-style tables that spell out which dangerous capabilities you’re tracking/expecting).
@Zach Stein-Perlman @ryan_greenblatt feel free to ignore but I’d be curious for one of you to explain your disagree react. Feel free to articulate some of the ways in which you think I might be underestimating the costliness of transparency requirements.
(My current estimate is that whistleblower mechanisms seem very easy to maintain, reporting requirements for natsec capabilities seem relatively easy insofar as most of the information is stuff you already planned to collect, and even many of the more involved transparency ideas (e.g., interview programs) seem like they could be implemented with pretty minimal time-cost.)
It depends on the type of transparency.
Requirements which just involve informing one part of the government (say US AISI) in ways which don’t cost much personnel time mostly have the effect of potentially making the government much more aware at some point. I think this is probably good, but presumably labs would prefer to retain flexibility, and making the government more aware can go wrong (from the lab’s perspective) in ways other than safety-focused regulation. (E.g., causing labs to be merged as part of a national program to advance capabilities more quickly.)
Whistleblower requirements with teeth can have information leakage concerns. (Internal-only whistleblowing policies like Anthropic’s don’t have this issue, but they also have basically no teeth for the company overall.)
Any sort of public discussion (about, e.g., threat models, government involvement, risks) can have various PR and reputational costs. (Idk if you were counting this under transparency.)
To the extent you expect to disagree with third party inspectors about safety (and they aren’t totally beholden to you), this might end up causing issues for you later.
I’m not claiming that “reasonable” labs shouldn’t do various types of transparency unilaterally, but I don’t think the main cost is in making safety-focused regulation more likely.
TY. Some quick reactions below:
Agree that the public stuff has immediate effects that could be costly. (Hiding stuff from the public, refraining from discussing important concerns publicly, or developing a reputation for being kinda secretive/sus can also be costly; seems like an overall complex thing to model IMO.)
Sharing info with government could increase the chance of a leak, especially if security isn’t great. I expect the most relevant info is info that wouldn’t be all-that-costly if leaked (e.g., the government doesn’t need OpenAI to share its secret sauce/algorithmic secrets). Dangerous capability eval results leaking or capability forecasts leaking seem less costly, except from a “maybe people will respond by demanding more govt oversight” POV.
I think all-in-all I still see the main cost as making safety regulation more likely, but I’m more uncertain now, and this doesn’t seem like a particularly important/decision-relevant point. Will edit the OG comment to language that I endorse with more confidence.