Strategies for differential divulgation of key ideas in AI capability
Openness makes the AI race worse
I will start with a short discussion of this article on the implications of openness in AI research. If you are very familiar with this argument, feel free to skip this section.
While the paper makes a real effort to present arguments on both sides, its strongest point is that openness makes the AI development race more competitive.
The oversimplified model of how openness affects the AI race dynamic is one in which there are k ideas necessary to build an AGI. Once all these insights are available, you can actually start working on the implementation details, which may take several months to a few years. If insights are kept within the teams that found them, the front-runner team (likely the one most productive at finding key ideas) will probably have some head start. If instead most or all of the insights are published openly, then many teams will start working on the implementation at nearly the same time, and the race will be very competitive.
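To make this dynamic concrete, here is a minimal simulation sketch of the toy model. All the numbers (number of ideas, discovery rates, implementation time) are illustrative assumptions of mine, not figures from the article:

```python
# Toy model: teams must discover K ideas before a fixed-length implementation
# phase; we compare the front-runner's lead under closed vs. open research.
# All parameters are illustrative assumptions, not estimates.
import random

K_IDEAS = 10              # key ideas needed before implementation can start
N_TEAMS = 5               # competing teams
IMPL_MONTHS = 24          # nominal length of the implementation phase

def average_lead(openness: bool, trials: int = 2000) -> float:
    """Average gap (in months) between the first and second team to finish."""
    total = 0.0
    for _ in range(trials):
        # Each team discovers each idea after an exponential delay, mean 12 months.
        discovery = [[random.expovariate(1 / 12) for _ in range(K_IDEAS)]
                     for _ in range(N_TEAMS)]
        if openness:
            # Every discovery is published, so all teams can start implementing
            # once the earliest discovery of each idea has happened anywhere.
            start = max(min(team[i] for team in discovery) for i in range(K_IDEAS))
            starts = [start] * N_TEAMS
        else:
            # Ideas stay private: each team must find all K ideas on its own.
            starts = [max(team) for team in discovery]
        # Implementation speed varies a little from team to team.
        finishes = sorted(s + IMPL_MONTHS * random.uniform(0.8, 1.2) for s in starts)
        total += finishes[1] - finishes[0]
    return total / trials

print(f"average lead, closed research: {average_lead(False):.1f} months")
print(f"average lead, open research:   {average_lead(True):.1f} months")
```

In this sketch, openness collapses the head start almost entirely: all teams start implementing at the same moment, and the remaining gap comes only from the small variation in implementation speed.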
This is particularly bad as it makes it harder for a leading AI developer to pause or slow down capability research to develop safety methods, or to implement performance-handicapping safety controls, without abandoning the lead to some other less careful developer.
First model: front-runner team committed to AGI safety
To me the above is more than enough to outweigh any positives associated with openness. As a result, my first model of how things could go well includes a team strongly committed to AGI safety succeeding not only in becoming the front-runner of AI development, but also in doing so with a significant head start.
This sounds difficult. Those committed to AGI safety seem to be a relatively small minority among AI researchers today, so unless that changes dramatically, it seems a priori unlikely that one of the teams committed to safety will become the front-runner, and even more unlikely if we require a significant initial lead.
Coordination advantages of “ideological” teams
Not all is lost, though. We, those committed to AGI safety, have a significant advantage: we are united, or at least we have the potential to be. If we can coordinate ourselves, we may be able to increase our chances dramatically.
Let us assume for the sake of argument that we have solved these coordination problems and that there is a centralized authority, capable of understanding key ideas and trusted by us to share them responsibly. If such an institution has enough legitimacy, most AI researchers committed to AGI safety might share their insights and key ideas with it. The authority could then select some of the best and most trusted researchers, form a team from them, and entrust it with all the key ideas, so that it can become the front-runner in the AI race.
Our team has a significant advantage in that its members are expected to be ideologically aligned around a shared goal of preventing existential risk. This will allow them to keep their key ideas secret.
In contrast, other competing teams may struggle to keep their key ideas securely hidden. If not bound by some stronger ideology, some of their own members may share key ideas with other teams. Others who favor openness may share key ideas with the whole world, something that will be difficult to prevent and, once it has happened, difficult to investigate.
So in some sense only ideologically bound teams can coordinate to acquire and keep a stock of key ideas to themselves. According to our oversimplified model, these may end up being the only serious contestants in the AI race, despite their respective ideologies being shared by only a small minority of researchers.
If that analysis is correct and such advantages are substantial, then those committed to AGI safety may end up having a decent chance after all!
Key ideas do not become obsolete quickly
One important limitation of this model is that ideas are sometimes reinvented, or stop being useful. Sometimes a new idea, or a combination of several, could even strictly dominate the old one.
So if a team has developed a key idea that is subsequently reinvented and shared openly, or surpassed by new insights that become public, the team has effectively lost any advantage it had.
As a result, if we want to use this model of key ideas to acquire and maintain a corpus of useful private insights, we should focus much more on fundamental understanding than on techniques. There are frequent changes in which techniques are state of the art, but fundamental understanding of learning algorithms may remain relevant for decades.
Such a focus on long-term fundamental understanding may also give us advantages compared to other researchers.
For instance, in neuroscience it is much easier to publish a paper on a small, low-level detail of how the brain implements something. Even if it all turns out not to matter in the grand scheme of things, no one will blame you or the editors as long as your paper is correct and its marginal contributions are novel.
In contrast, a new and speculative theory of how the brain works (such as what we can see in the writings of Steven Byrnes here on this forum) is much more likely to be important, but it may require writing about things the author themselves is unsure about, may rely on generalizations for which exceptions will be found, or may involve disputes over the originality of the claims (someone else may have written something similar 20 years ago!).
Similarly, in machine learning research it is a lot easier to gain recognition by publishing a novel technique that slightly improves the state of the art on some benchmark, even if the result is likely to be surpassed soon by a better implementation. Trying to get new learning algorithms to do useful work, while possibly more important, may be a lot less rewarding, and the initial attempts, even if promising, may not immediately produce clear performance improvements over heavily optimized existing paradigms.
Challenges of centralized coordination
In our initial model of how we might cooperate to succeed, we depend on a centralized authority both to gather key ideas from researchers all over the world and to select a single team that is expected to use them to gain an advantage in the AI race.
This is clearly a very difficult task. Key ideas are sometimes speculative, and it may be difficult to identify them as promising when they are in their initial stages of development.
Furthermore, fundamental insights are rarely developed in isolation. AI researchers may need access to some previous insights in order to work on important new ones. Without effective coordination, duplicate work may be performed independently, and dead-end ideas may be pursued over and over again.
In theory it should be possible for the authority to assess the state of each individual contributor's research and their trustworthiness, and then grant them access to the few insights it believes will be most important for their research to be fruitful. In this model, the authority would be a central gatekeeper of fundamental insights, coordinating everybody's access to information, which would make its job even harder.
The fact that such an authority would have a tremendously difficult job will make rational people less confident in the ability of this coordination mechanism to work, potentially preventing them from sharing their insights in the first place.
As a result, while we may still want a coordination authority to be responsible for dealing with some particularly sensitive results, I no longer consider it practical for such an entity to be responsible for dealing with information security for all or most of AI capability research.
Differential divulgation of key ideas
This brings us to a refinement of my initial model of how we might cooperate to succeed. Maybe we can find a way to leverage some of the advantages of openness, creating an environment in which those committed to AGI safety can exchange ideas somewhat freely, including ideas on capability research, but in a way that does not make the AI race worse.
One way we could do that is by making it easier to use this very forum to publish insights not intended for the general public. In some sense this is already possible with the draft feature (I can share a draft with only a few select users), but it would be better if I could share a post with a whole set of users: the set of people with karma above a certain level, the set of people trusted by specific users, or a curated set of researchers known to contribute to a specific area.
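As a rough illustration of what such visibility rules might look like (the names and structure below are hypothetical and do not correspond to the forum's actual codebase):

```python
# Hypothetical sketch of the visibility rules proposed above. This only
# illustrates the kinds of "sets of users" a restricted post could be shared
# with; it is not based on any real forum implementation.
from dataclasses import dataclass, field
from typing import Optional, Set

@dataclass
class User:
    name: str
    karma: int = 0
    trusted_by: Set[str] = field(default_factory=set)       # users who vouch for this one
    research_areas: Set[str] = field(default_factory=set)

@dataclass
class VisibilityPolicy:
    min_karma: Optional[int] = None                          # e.g. everyone above 500 karma
    trusted_by_any_of: Set[str] = field(default_factory=set) # trusted by specific users
    curated_area: Optional[str] = None                       # e.g. "agent foundations"

    def allows(self, user: User) -> bool:
        """A post is visible if the user matches any of the configured sets."""
        if self.min_karma is not None and user.karma >= self.min_karma:
            return True
        if self.trusted_by_any_of & user.trusted_by:
            return True
        if self.curated_area is not None and self.curated_area in user.research_areas:
            return True
        return False

policy = VisibilityPolicy(min_karma=500, curated_area="agent foundations")
reader = User("alice", karma=120, research_areas={"agent foundations"})
print(policy.allows(reader))  # True: a curated researcher in the relevant area
```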
But here I would like to propose a further idea. Maybe we can start sharing capability insights freely, but in such a way that they are hard to understand for those without a solid background in AGI safety research.
In some sense we are doing that already, spontaneously, by using a lot of jargon. I have always considered that a bad thing, a kind of necessary evil arising from the need for brevity and precision. But when explaining insights that might be relevant to AI capability, we may actually want to include as much jargon as possible.
A more sophisticated approach would, instead of merely using safety jargon, argue in a way that only makes sense to those who have read the relevant literature, perhaps by using informal examples and analogies that are only clear to readers with that particular background.
If many important insights are discussed here and on similar forums, the general AI research community may come to know that there are important research ideas to be learned here. This has a downside: researchers who are not committed to AGI safety may access some key insights, making the AI race worse. In the worst-case scenario, someone may purify a capability idea, stripping out the safety jargon and requirements, and publish it somewhere else.
But as long as people stay around to learn, even if they come for the capability ideas, they may be convinced by the safety arguments. Furthermore, the rising status of the rationalist community among AI researchers may encourage them to take our arguments more seriously.
I am not arguing that all or even most AI capability insights should be shared publicly. But I do think there is a strong case for sharing more than we have been doing so far, for doing so on the same forums where we discuss safety research, and for presenting capability and safety arguments in an interconnected way.
By doing so, we will be performing a differential divulgation of key AI capability ideas: sharing them predominantly with those who, as members of our community, are already more likely to take AGI risks seriously, while simultaneously increasing the appeal of our community to AI researchers in general.