MIRI’s written about going non-disclosed by default. I expect you to think this is fine and probably good and not too relevant, because it’s not (as far as the writeup suggests) an attempt to keep secrets from the US government, and you expect they’d fail at that. Is that right?
No, I think it’s probably very counterproductive, depending on what it really means in practice. I wasn’t quite sure what the balance was between “We are going to actively try to keep this secret” and “It’s taking too much of our time to write all of this up”.
On the secrecy side of that, the problem isn’t whether or not MIRI’s secrecy works (although it probably won’t)[1]. The problem is with the cost and impact on their own community from their trying to do it. I’m going to go into that further down this tome.
And OpenAI is attempting to push more careful release practises into the overton window of discussion in the ML communities (my summary is here). [...]
For example, there are lots of great researchers in the world that aren’t paid by governments, and those people cannot get the ideas [...]
That whole GPT thing was just strange.
OpenAI didn’t conceal any of the ideas at all. They held back the full version of the actual trained network, but as I recall they published all of the methods they used to create it. Although a big data blob like the network is relatively easy to keep secret, if your goal is to slow down other research, controlling the network isn’t going to be effective at all.
… and I don’t think that slowing down follow-on research was their goal. If I remember right, they seemed to be worried that people would abuse the actual network they’d trained. That was indeed unrealistic. I’ve seen the text from the full network, and played with giving it prompts and seeing what comes out. Frankly, the thing is useless for fooling anybody and wouldn’t be worth anybody’s time. You could do better by driving a manually created grammar with random numbers, and people already do that.
Treating it like a Big Deal just made OpenAI look grossly out of touch. I wonder how long it took them to get the cherry-picked examples they published when they made their announcement...
So, yes, I thought OpenAI was being unrealistic, although it’s not the kind of “romanticization” I had in mind. I just can’t figure out what they could have stood to gain by that particular move.
All that said, I don’t think I object to “more careful release practices”, in the sense of giving a little thought to what you hand out. My objections are more to things like--
Secrecy-by-default, or treating it as cost-free to make something secret. It’s impractical to have too many secrets, and tends to dilute your protection for any secrets you actually do truly need. In the specific case of AI risk, I think it also changes the balance of speed between you and your adversaries… for the worse. I’ll explain more about that below when I talk about MIRI.
The idea that you can just “not release things”, without very strict formal controls and institutional boundaries, and have that actually work in any meaningful way. There seems to be a lot of “illusion of control” thinking going on. Real secrecy is hard, and it gets harder fast if it has to last a long time.
To set the frame for the rest, I’m going to bloviate a bit about how I’ve seen secrecy work in general.
One of the “secrets of secrecy” is that, at any scale beyond two or three people, it’s more about controlling diffusion rates than about creating absolute barriers. Information interesting enough to care about will leak eventually.
You have some amount of control over the diffusion rate within some specific domains, and at their boundaries. Once information breaks out into a domain you do not control, it will spread according to the conditions in that new domain regardless of what you do. When information hits a new community, there’s a step change in how fast it propagates.
Which brings up the next not-very-secret secret: I’m wrong to talk about a “diffusion rate”. The numbers aren’t big enough to smooth out random fluctuations the way they are for molecules. Information tends to move in jumps for lots of reasons. Something may stay “secret” for a really long time just because nobody notices it… and then become big news when it gets to somebody who actively propagates it, or to somebody who sees an implication others didn’t. A big part of propagation is the framing and setting; if you pair some information with an explanation of why it matters, and release it into a community with a lot of members who care, it will move much, much faster than if you don’t.[2]
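If it helps to see the "jumps, not smooth diffusion" picture in concrete form, here is a toy sketch. Everything in it is made up for illustration: the function name `simulate_leak`, the chain-of-communities setup, and all of the rates are assumptions, not measurements of anything. Each community has its own internal gossip rate, and the secret can only cross a boundary by a small random chance that grows with the number of people who already know it.

```python
import random


def simulate_leak(sizes, within_rates, leak_rate, steps, seed=0):
    """Toy model of a secret spreading through a chain of communities.

    sizes        -- number of people in each community (community 0 starts with the secret)
    within_rates -- per-step chance that one informed member tells a given
                    uninformed member of the same community
    leak_rate    -- per-step chance that one informed member of a community
                    carries the secret across the boundary into the next one
    Returns a list of tuples: how many people in each community know, per step.
    """
    rng = random.Random(seed)
    informed = [0] * len(sizes)
    informed[0] = 1  # one person starts out knowing the secret
    history = [tuple(informed)]
    for _ in range(steps):
        new = list(informed)
        for c, size in enumerate(sizes):
            if informed[c] == 0:
                # Boundary crossing: only possible once the previous community knows.
                if c > 0 and informed[c - 1] > 0:
                    p_leak = 1 - (1 - leak_rate) ** informed[c - 1]
                    if rng.random() < p_leak:
                        new[c] = 1
                continue
            # Chance that an uninformed member hears it from at least one insider.
            p_hear = 1 - (1 - within_rates[c]) ** informed[c]
            uninformed = size - informed[c]
            new[c] = informed[c] + sum(rng.random() < p_hear for _ in range(uninformed))
        informed = new
        history.append(tuple(informed))
    return history


if __name__ == "__main__":
    # A small, tight-lipped community sitting next to a large, chatty one.
    for step, counts in enumerate(simulate_leak((20, 500), (0.02, 0.3), 0.005, 40)):
        print(step, counts)
```

Running it with different `seed` values moves the moment of the boundary crossing around a lot, which is the point: with numbers this small you get a waiting time followed by a jump, not a smooth rate. Once the second community has even one informed member, its own `within_rates` entry takes over and the original holders have no further say.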
So, now, MIRI’s approach...
The problem with what MIRI seems to be doing is that it disproportionately slows the movement of information within their own community and among their allies. In most cases, they will probably hurt themselves more than they hurt their “adversaries”.
Ideas will still spread among the “good guys”, but unreliably, slowly, through an unpredictable rumor mill, with much negotiation and everybody worrying at every turn about what to tell everybody else.[3] That keeps problems from getting solved. It can’t be fixed by telling the people who “need to know”, because MIRI (or whoever) won’t know who those people are, especially-but-not-only if they’re also being secretive.
Meanwhile, MIRI can’t rely on keeping absolute secrets from anybody for any meaningful amount of time. And they’ll probably have a relatively small effect on institutions that could actually do dangerous development. Assuming it’s actually interesting, once one of MIRI’s secrets gets to somebody who happens to be part of some “adversary” institution, it will be propagated throughout that institution, possibly very quickly. It may even get formally announced in the internal newsletter. It even has a chance of moving on from there into that first institution’s own institutional adversaries, because they spy on each other.
But the “adversaries” are still relatively good at secrecy, especially from non-peers, so any follow-on ideas they produce will be slower to propagate back out into the public where MIRI et al can benefit from them.
The advantage the AI risk and X-risk communities have is, if you will, flexibility: they can get their heads around new ideas relatively quickly, adapt, act on implications, build one idea on another, and change their course relatively rapidly. The corresponding, closely related disadvantage is weakness in coordinating work on a large scale toward specific, agreed-upon goals (like, say, big scary AI development projects).
Worrying too much about secrecy throws away the advantage, but doesn’t cure the disadvantage. Curing the disadvantage requires a culture and a set of material resources that I don’t believe MIRI and friends can ever develop… and that would probably torpedo their effectiveness if they did develop them.
By their nature, they are going to be the people who are arguing against some development program that everybody else is for. Maybe against programs that have already got a lot of investment behind them before some problem becomes clear. That makes them intrinsically less acceptable as “team players”. And they can’t easily focus on doing a single project; they have to worry about any possible way of doing it wrong. The structures that are good at building dangerous projects aren’t necessarily the same as the structures that are good at stopping them.
If the AI safety community loses its agility advantage, it’s not gonna have much left.
MIRI will probably also lose some donors and collaborators, and have more trouble recruiting new ones as time goes on. People will forget they exist because they’re not talking, and there’s a certain reluctance to give people money or attention in exchange for “pigs in pokes”… or even to spend the effort to engage and find out what’s in the poke.
A couple of other notes:
Sometimes people talk about spreading defensive ideas without spreading the corresponding offensive ideas. In AI, that comes out as wanting to talk about safety measures without saying anything about how to increase capability.
In computer security, it comes out as cryptic announcements to “protect this port from this type of traffic until you apply this patch”… and it almost never works for long. The mere fact that you’re talking about some specific subject is enough to get people interested and make them figure out the offensive side. It can work for a couple of weeks for a security bug announcement, but beyond that it will almost always just backfire by drawing attention. And it’s very rare to be able to improve a defense without understanding the actual threat.
Edited the next day in an attempt to fix the footnotes… paragraphs after the first in each footnote were being left in the main flow.
As for keeping secrets from any major government…
.
First, I still prefer to talk about the Chinese government. The US government seems less likely to be a player here. Probably the most important reason is that most parts of the US government apparatus see things like AI development as a job for “industry”, which they tend to believe should be a very clearly separate sphere from “government”. That’s kind of different from the Chinese attitude, and it matters. Another reason is that the US government tends to have certain legal constraints and certain scruples that limit their effectiveness in penetrating secrecy.
.
I threw the US in as a reminder that China is far from the only issue, and I chose them because they used to be more interesting back during the cold war, and perhaps could be again if they got worried enough about “national security”.
.
But if any government, including the US, decides that MIRI has a lot of important “national security” information, and decides to look hard at them, then, yes, MIRI will largely fail to keep secrets. They may not fail completely. They may be able to keep some things off the radar, for a while. But that’s less likely for the most important things, and it will get harder the more people they convince that they may have information that’s worth looking at. Which they need to do.
.
They’ll probably even have information leaking into institutions that aren’t actively spying on them, and aren’t governments, either.
.
But all that just leaves them where they started anyway. If there were no cost to it, it wouldn’t be a problem.
You can also get independent discoveries creating new, unpredictable starting points for diffusion. Often independent discoveries get easier as time goes on and the general “background” information improves. If you thought of something, even something really new, that can be a signal that conditions are making it easier for the next person to think of the same thing. I’ve seen security bugs with many independent discoveries.
.
Not to mention pathologies like one community thinking something is a big secret, and then seeing it break out from some other, sometimes much larger community that has treated it as common knowledge for ages.
If you ever get to the point where mostly-unaffiliated individuals are having to make complicated decisions about what should be shared, or having to think hard about what they have and have not committed themselves not to share, you are 95 percent of the way to fully hosed.
.
That sort of thing kind of works for industrial NDAs, but the reason it works is that, regardless of what people have convinced themselves to believe, most industrial “secret sauce” is pretty boring, and the rest tends to be so specific and detailed that it is obviously covered by any NDA. AND you usually only care about relatively few competitors, most of whose employees don’t get paid enough to get sued. That’s very different from some really inobvious world-shaking insight that makes the difference between low-power “safe” AI and high-power “unsafe” AI.