So, here are some considerations (not an actual policy).
It’s instructive to look at the case of nuclear weapons, and the key analogies or disanalogies to math work. For nuclear weapons, the basic theory is pretty simple and building the hardware is the hard part, while for AI the situation seems reversed: the hard part is knowing what to do in the first place, not scrounging up the hardware to do it.
First, a chunk from Wikipedia:
Most of the current ideas of the Teller–Ulam design came into public awareness after the DOE attempted to censor a magazine article by U.S. anti-weapons activist Howard Morland in 1979 on the “secret of the hydrogen bomb”. In 1978, Morland had decided that discovering and exposing this “last remaining secret” would focus attention onto the arms race and allow citizens to feel empowered to question official statements on the importance of nuclear weapons and nuclear secrecy. Most of Morland’s ideas about how the weapon worked were compiled from highly accessible sources—the drawings which most inspired his approach came from the Encyclopedia Americana. Morland also interviewed (often informally) many former Los Alamos scientists (including Teller and Ulam, though neither gave him any useful information), and used a variety of interpersonal strategies to encourage informational responses from them (i.e., asking questions such as “Do they still use sparkplugs?” even if he wasn’t aware what the latter term specifically referred to)....
When an early draft of the article, to be published in The Progressive magazine, was sent to the DOE after falling into the hands of a professor who was opposed to Morland’s goal, the DOE requested that the article not be published, and pressed for a temporary injunction. After a short court hearing in which the DOE argued that Morland’s information was (1) likely derived from classified sources, (2) if not derived from classified sources, itself counted as “secret” information under the “born secret” clause of the 1954 Atomic Energy Act, and (3) dangerous and would encourage nuclear proliferation...
Through a variety of more complicated circumstances, the DOE case began to wane, as it became clear that some of the data they were attempting to claim as “secret” had been published in a students’ encyclopedia a few years earlier....
Because the DOE sought to censor Morland’s work—one of the few times they violated their usual approach of not acknowledging “secret” material which had been released—it is interpreted as being at least partially correct, though to what degree it lacks information or has incorrect information is not known with any great confidence.
So, broad takeaways from this: The Streisand effect is real. A huge part of keeping something secret is just having nobody suspect that there is a secret there to find. This is much trickier for nuclear weapons, which are of high interest to the state, while it’s more doable for AI stuff (and I don’t know how biosecurity has managed to stay so low-profile). This doesn’t mean you can just wander around giving the rough sketch of the insight; in math, it’s not too hard to reinvent things once you know what you’re looking for. But AI math does have a huge advantage here: it’s a really broad field and hard to search through (I think my roommate said that so many papers get submitted to NeurIPS that you couldn’t read through them all before the next NeurIPS conference). And, in order to reinvent something from scratch without having the fundamental insight, you need to be pointed in the exact right direction, and even then you’ve got a good shot at missing it (see: the time lag between the earliest neural net papers and the development of backpropagation, or, in the process of making the Infra-Bayes post, stumbling across concepts that could have been found months earlier if some time-traveler had said the right three sentences at the time).
Also, secrets can get out through really dumb channels. Putting important parts of the H-bomb structure in a students’ encyclopedia? Why would you do that? Well, probably because there are a lot of people in the government, and people in different parts of it have different memories of which stuff is secret and which stuff isn’t.
So, due to AI work being insight/math-based, security would rest a lot more on just… not telling people things, and not even alluding to them. Although, there is an interesting possibility raised by the presence of so much other work in the field. For nuclear weapons work, things seem to be either secret or well-known among those interested in nuclear weapons. But AI has a big intermediate range between “secret” and “well-known”: see all those arXiv papers with, like, 5 citations. So, for something that’s kinda iffy (not serious enough to warrant full secrecy, given the costs of the slowdown in research that full secrecy brings, but not benign enough to be comfortable giving a big presentation at NeurIPS about it), it might be possible to intentionally target that range. I don’t think it’s a binary between “full secret” and “full publish”; there are probably intermediate options available.
Of course, if it’s known that an organization is trying to fly under the radar with a result, you get the Streisand effect in full force. But, just as well-known authors may have pseudonyms, it’s probably possible to just publish a paper on arXiv (or something similar) under a pseudonym and not have it referenced anywhere by the organization as an official piece of research they funded. And it would be available for viewing, discussion, and collaborative work in that form, while also (with high probability) remaining pretty low-profile.
Anyways, I’m gonna set a 10-minute timer to have thoughts about the guidelines:
Ok, the first thought I’m having is that this is probably a case where Inside View is just strictly better than Outside View. Making a policy ahead of time that can just be followed requires whoever came up with the policy to have a good classification, in advance, of all the relevant categories of result and what to do with them, and that seems pretty dang hard to do, especially because novel insights, almost by definition, are not something you expected to see ahead of time.
The next thought is that working something out for a while and then going “oh, this is roughly adjacent to something I wouldn’t want to publish, when developed further” isn’t quite as strong an argument for secrecy as it looks like, because, as previously mentioned, even additional insights that are fairly basic in retrospect are pretty dang tricky to find ahead of time if you don’t know what you’re looking for. Roughly, the odds of someone finding the thing you want to hide scale with the number of people actively working on it, so that case seems to weigh in favor of publishing the result, but not actively publicizing it to the point where you can no longer befriend everyone else working on it. If one of the papers published by an organization could be built on to develop a serious result… well, you’d still have the problem of not knowing which paper it is, or what unremarked-on direction to go in to develop the result, if it was published as normal and not flagged as anything special. But if the paper got a whole bunch of publicity, the odds go up that someone puts the pieces together spontaneously. And, if you know everyone working on the paper, you’ve got a saving throw if someone runs across the thing.
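(To make the “scales with the number of people” intuition concrete, here’s a toy model of my own, not something claimed anywhere in this thread: if each of n active researchers independently has some small chance p of stumbling onto the insight, the chance that at least one of them does is 1 - (1 - p)^n, which grows roughly like np while np is small.)

```python
# Toy model (an illustrative assumption, not a calibrated estimate): each of n
# researchers independently rediscovers the insight with probability p over
# some fixed time window.
def p_rediscovery(n: int, p: float) -> float:
    """Probability that at least one of n independent researchers finds it."""
    return 1 - (1 - p) ** n

# With a 1% per-researcher chance, going from 5 to 500 active researchers
# moves the overall rediscovery probability from about 5% to about 99%.
for n in (5, 50, 500):
    print(n, round(p_rediscovery(n, 0.01), 3))
```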
There is a very strong argument for talking to several other people if you’re unsure whether it’d be good to publish/publicize, because it reduces the problem of “the person with the laxest safety standards publicizes” to “the organization with the laxest safety standards publicizes”. This isn’t a full solution, because there’s still a coordination problem at the organization level, and it gives organizations incentives to be really defensive about sharing their stuff, including safety-relevant stuff. Further work on the inter-organization level of “secrecy standards” is very much needed. But within an organization, “have a personal conversation with senior personnel” sounds like the obvious thing to do.
So, current thoughts: There are some intermediate options available instead of just “full secret” or “full publish” (publish under a pseudonym and don’t list it as the organization’s research; publish as normal but don’t make efforts to advertise it broadly), and I haven’t seen anyone mention that. They seem preferable for results that would benefit from more eyes on them but that could also be developed in bad directions. I’d be skeptical of attempts to make a comprehensive policy ahead of time; this seems like a case where an inside view on the details of the result would outperform an ahead-of-time policy. But one thing that would be critical on a policy level is “talk it out with a few senior people first to make the decision, instead of going straight for personal judgement”, as that tamps down on the coordination problem considerably.
Publishing under a pseudonym may end up being counterproductive due to the Streisand effect. The identities behind many pseudonyms may suddenly be publicly revealed following the publication of some novel method for detecting similarities in writing style between texts.
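As a toy illustration of why that’s plausible, here’s the simplest textbook stylometry baseline: build character n-gram frequency profiles of two texts and compare them with cosine similarity. This is not the hypothetical “novel method” being alluded to, and the sample texts below are made up; it’s just a sketch of the kind of signal writing style leaks.

```python
from collections import Counter
from math import sqrt

def char_ngrams(text: str, n: int = 3) -> Counter:
    """Frequency profile of character n-grams, a crude stylometric fingerprint."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram profiles (1.0 means identical profiles)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical comparison: a pseudonymous abstract vs. a known author's abstract.
pseudonymous = "We prove a regret bound for the proposed update rule under mild assumptions."
known_author = "Under mild assumptions, we prove a regret bound for our proposed update rule."
print(cosine_similarity(char_ngrams(pseudonymous), char_ngrams(known_author)))
```

Real attacks use richer features and a large corpus of candidate authors, but even this crude profile tends to score two texts by the same author higher than texts by different authors.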
It seems to me that under ideal circumstances, once we think we’ve invented FAI, before we turn it on, we share the design with a lot of trustworthy people we think might be able to identify problems. I think it’s good to have the design be as secret as possible at that point, because that allows the trustworthy people to scrutinize it at their leisure. I do think the people involved in the design are liable to attract attention—keeping this “FAI review project” secret will be harder than keeping the design itself secret. (It’s easier to keep the design for the bomb secret than hide the fact that top physicists keep mysteriously disappearing.) And any purported FAI will likely come after a series of lesser systems with lucrative commercial applications used to fund the project, and those lucrative commercial applications are also liable to attract attention. So I think it’s strategically valuable to have the distance between published material and a possible FAI design be as large as possible. To me, the story of nuclear weapons is a story of how this is actually pretty hard even when well-resourced state actors try to do it.
Of course, that has to be weighed against the benefit of openness. How is openness helpful? Openness lets other researchers tell you if they think you’re pursuing a dangerous research direction, or if there are serious issues with the direction you’re pursuing which you are neglecting. Openness helps attract collaborators. Openness helps gain prestige. (I would argue that prestige is actually harmful because it’s better to keep a low profile, but I guess prestige is useful for obtaining required funding.) How else is openness helpful?
My suspicion is that those papers on arXiv with 5 citations are mostly getting cited by people who already know the author, and the arXiv publication isn’t actually doing much to attract collaboration. It feels to me like, if our goal is to help researchers get feedback on their research direction or find collaborators, there are better ways to do this than encouraging them to publish their work. So if we could put mechanisms in place to achieve those goals, that could remove much of the motivation for openness, which would be a good thing in my view.
Regarding making a policy ahead of time: I think we can have an evolving model of what ingredients are missing to get transformative AI, plus some rule of thumb that says how dangerous your result is, given how much progress it makes towards each ingredient (relevant but clearly insufficient < might or might not be sufficient < plausibly a full solution), how concrete/actionable it is (abstract idea < impractical method < practical method), and how original/surprising it is (synthesis of ideas in the field < improvement on an idea in the field < application of an idea outside the field < completely out of the blue).
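To make that less abstract, here is a minimal sketch of what such a rule of thumb could look like. The three ordinal scales are the ones from the paragraph above; the numeric scores, the way they’re combined, and the thresholds are made-up placeholders, just to show the shape of the thing, not a calibrated policy.

```python
from enum import IntEnum

# Ordinal scales from the comment above. The numeric values, the combination
# rule, and the thresholds below are arbitrary placeholders, not a real policy.

class Progress(IntEnum):
    RELEVANT_BUT_INSUFFICIENT = 1
    MIGHT_BE_SUFFICIENT = 2
    PLAUSIBLY_FULL_SOLUTION = 3

class Concreteness(IntEnum):
    ABSTRACT_IDEA = 1
    IMPRACTICAL_METHOD = 2
    PRACTICAL_METHOD = 3

class Originality(IntEnum):
    SYNTHESIS_WITHIN_FIELD = 1
    IMPROVEMENT_ON_FIELD_IDEA = 2
    APPLICATION_OF_OUTSIDE_IDEA = 3
    OUT_OF_THE_BLUE = 4

def danger_score(progress: Progress, concreteness: Concreteness,
                 originality: Originality) -> int:
    """Crude combined score: higher means lean further toward the secret end."""
    return progress * concreteness + originality

def suggested_tier(score: int) -> str:
    """Map the score onto the intermediate publication options discussed earlier."""
    if score <= 4:
        return "publish as normal"
    if score <= 8:
        return "publish quietly (pseudonym, no publicity)"
    return "keep internal; talk it over with senior people first"

# Example: a practical method that might be sufficient for a missing ingredient,
# reached by applying an idea from outside the field.
score = danger_score(Progress.MIGHT_BE_SUFFICIENT,
                     Concreteness.PRACTICAL_METHOD,
                     Originality.APPLICATION_OF_OUTSIDE_IDEA)
print(score, suggested_tier(score))  # 9 -> keep internal; talk it over first
```

The particular arithmetic doesn’t matter; the point is that the output lands on the graded options discussed above (publish, publish quietly, keep internal) rather than on a binary publish/don’t-publish call.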
One problem is that the model itself might be an infohazard. This consideration pushes towards keeping the guidelines themselves secret, but that would make it much harder to debate and disseminate them. Also, the new result might have major implications for the model. So, yes, certainly there is no replacement for the inside view, but I still feel that we can have guidelines that help us focus on the right considerations.
There are some intermediate options available instead of just “full secret” or “full publish”… and I haven’t seen anyone mention that...
OpenAI’s phased release of GPT-2 seems like a clear example of exactly this. And there is a forthcoming paper from Toby Shevlane looking at the internal deliberations around this, in addition to his extant work on the question of how disclosure potentially affects misuse.
Another advantage AI secrecy has over nuclear secrecy is that there’s a lot of noise and hype these days around ML both within and outside the community, making hiding in plain sight much easier.