Balancing Security Mindset with Collaborative Research: A Proposal

Challenges with Security Mindset

Some necessary context: I was in the SERI-MATS Winter 2023 cohort and, after being accepted into the final phase of the program, eventually dropped out due to burnout of various kinds. Most of that burnout was for purely personal reasons and not really anyone’s fault, but some of it was, I think, a result of feeling like I had to think within a security mindset, which is a poor fit for my personal psychology and biochemistry. I don’t think I’m particularly unique in this regard, so it seemed like a problem worth trying to fix, rather than just a reason for me personally to sit this one out.

For the purposes of this post, I will take it as a given that security mindset is good and necessary; this is of course something that can be debated, but it makes sense to me.

So, how do we achieve good security mindset without burning out the (possibly significant) fraction of the population that can’t easily handle it? My intuition is that we need better tooling and better, more explicit conventions around the issue, so that people can do things the right way without causing themselves undue stress. In the rest of this post, I lay out some tools and social conventions that seem to me like they would help.

The Dual-Use Nature of Ideas

One intuition I developed during SERI-MATS, and one that I think holds up under scrutiny, is that almost all ideas are either unimportant or, if they are useful enough to be important, inherently dual-use. (That is, they can be used both for capabilities advancements and for alignment advancements.) So, for me at least, it doesn’t make sense to treat some knowledge as dangerous and some knowledge as safe. All of it is either useless trivia or potentially dangerous intel, and nobody is really smart enough to reliably sort things into even those two categories. Very accomplished senior researchers may be able to sort things into trivial vs. nontrivial, but I think even they will make occasional errors in that regard.

The Need for Compartmentalization and Collaboration

So if everything is either dangerous or useless, and you can’t even tell which is which, how do you accomplish literally anything? The only answer I’ve been able to come up with is compartmentalization. (Maybe there are other answers, but I haven’t thought of any yet.) Like any security-conscious organization, you need to keep things on a roughly need-to-know basis. However, this does massive damage to people’s ability to collaborate productively on what everyone agrees are extremely difficult scientific problems that basically require significant collaboration. This poses a quandary; in the rest of this post I outline a proposed solution, point out the flaws I can see in it, and try to address them. It presumably has more flaws that I have not seen yet, so I invite readers to point them out in the comments.

Proposed DAG Model for Collaboration

So, how do you balance vitally necessary compartmentalization with vitally necessary collaboration? The only way I’ve been able to make it sort of work in my head is to create a directed acyclic graph (DAG) of alignment researchers, and to share information according to this graph. Every researcher picks another, more senior researcher whom they trust more than themselves, and shares with that person anything that needs to be shared. This way, the most senior researchers will be able to direct large formations of research activity: the necessary knowledge will bubble up to them through the DAG, they will have the personal judgment to know which pieces need to be worked on next, and those who report to them will trust their advice on what to work on.
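To make the structure concrete, here is a minimal sketch of the mentor DAG, assuming each researcher names at most one more-senior researcher they trust. The names and the chain_to_root helper are purely hypothetical illustrations, not part of any existing tool.

```python
# mentor[x] = the single more-senior researcher x chooses to report to (None for the most senior)
mentor = {
    "alice": None,      # most senior; trusts no one above herself
    "bob": "alice",
    "carol": "alice",
    "dan": "bob",
}

def chain_to_root(researcher: str) -> list[str]:
    """The path along which a finding bubbles up toward the most senior researchers."""
    path = [researcher]
    while mentor[path[-1]] is not None:
        path.append(mentor[path[-1]])
    return path

print(chain_to_root("dan"))  # ['dan', 'bob', 'alice']
```

Because each researcher has exactly one upward edge to someone more senior, the resulting graph is automatically acyclic.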

As a small refinement, there should probably be two-way links between people who report to the same mentor and work in the same location, since these peer-to-peer collaborations are, in my experience, just as valuable as mentor-to-peer collaborations, albeit in different ways. Between the one-way mentor-to-peer links and the two-way peer-to-peer links, I think people will be quite productive. The only thing missing from the typical academic researcher’s collaboration toolkit will be conferences and symposia; maybe these can still exist somehow, possibly with some sort of decentralized censorship mechanism to screen out things that are especially dual-use (insofar as you think that alignment and capabilities can be separated at all, which I don’t really think you can).
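As a sketch of this refinement, assuming we also record each researcher's work location, the peer edges could be derived mechanically; all names and fields here are hypothetical.

```python
from itertools import combinations

# Each researcher's chosen mentor and work location (hypothetical data).
researchers = {
    "bob":   {"mentor": "alice", "location": "Berkeley"},
    "carol": {"mentor": "alice", "location": "Berkeley"},
    "dan":   {"mentor": "alice", "location": "London"},
}

# Two-way peer edges: same mentor and same location.
peer_edges = [
    (a, b)
    for a, b in combinations(researchers, 2)
    if researchers[a]["mentor"] == researchers[b]["mentor"]
    and researchers[a]["location"] == researchers[b]["location"]
]

print(peer_edges)  # [('bob', 'carol')]
```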

Challenges and Potential Solutions

What are some problems with this approach? I’ve only really thought of two: firstly, that senior researchers might not want to do the amount of mentoring that this approach seems to require, especially if we want literally the whole alignment community to use the same graph; and secondly, that there are various incentive problems with this mechanism around credit allocation and monetary resources. I’ll briefly sketch out some possible solutions to these problems.

So, how do we avoid wasting the time of vitally important senior researchers by forcing them to screen massive amounts of material from their more junior colleagues? I think the first key is simply to strictly limit the fan-out that senior researchers are expected to support. If this makes the tree too deep (which it will, just as a matter of combinatorics), then some important things will not bubble up in a timely and correct fashion, but that seems like an acceptable tradeoff. So, senior researchers only have to regularly listen to 2-5 people whom they have hand-picked as worth listening to. In rare cases, some important piece of information may need to skip a level or two, and that can be handled by asking the intermediate people for introductions, with the understanding that there is a very strict time limit on any conversation that skips levels, especially if it skips multiple levels. Between limiting the fan-out and having strong norms that allow but limit skip-level conversations, I think information can flow quite quickly through the DAG without burning out the senior researchers.
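Here is a rough sketch of how these two norms could be checked mechanically, assuming the same kind of single-mentor map as above. MAX_FANOUT, the helper names, and the 30-minute default are hypothetical placeholders, not part of the proposal itself.

```python
from collections import Counter

MAX_FANOUT = 5  # upper end of the 2-5 reports per senior researcher suggested above

# mentor[x] = the single more-senior researcher x reports to (hypothetical names)
mentor = {"bob": "alice", "carol": "alice", "dan": "bob", "erin": "bob"}

def fanout_ok(mentor_map: dict[str, str]) -> bool:
    """Check that nobody is expected to regularly listen to more than MAX_FANOUT reports."""
    counts = Counter(mentor_map.values())
    return all(n <= MAX_FANOUT for n in counts.values())

def request_skip_level(mentor_map, junior, target, time_limit_minutes=30):
    """Walk up the chain from `junior` to `target`, collecting the intermediate
    people who would need to be asked for introductions. The 30-minute default
    is a hypothetical stand-in for the 'very strict time limit' norm."""
    path, current = [], junior
    while current != target:
        current = mentor_map.get(current)
        if current is None:
            raise ValueError(f"{target} is not above {junior} in the DAG")
        path.append(current)
    return {
        "introductions_needed": path[:-1],   # everyone strictly between junior and target
        "levels_skipped": len(path) - 1,
        "time_limit_minutes": time_limit_minutes,
    }

print(fanout_ok(mentor))                          # True
print(request_skip_level(mentor, "dan", "alice"))
# {'introductions_needed': ['bob'], 'levels_skipped': 1, 'time_limit_minutes': 30}
```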

How do we deal with incentive problems? As the proposal is sketched above, career advancement is extremely difficult. People who start out as junior (as basically everyone must) need to be able to become senior eventually if they do well; this is important both for the community’s ability to produce additional senior researchers and for the individual wellbeing of the formerly junior researchers. The most straightforward way I can think of to accomplish this is simply to promote people up to be peers with their former mentor; this adds only one additional edge, between the newly promoted researcher and their former mentor’s mentor.
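A minimal sketch of that promotion step, assuming mentor edges are stored as (junior, senior) pairs; the names are hypothetical, and the only thing the proposal specifies is that promotion adds one new edge to the former mentor's mentor.

```python
# Mentor edges as (junior, senior) pairs (hypothetical names).
edges = {("bob", "alice"), ("dan", "bob")}

def promote(edges: set, researcher: str) -> None:
    """Add the single new edge from the promoted researcher to their former
    mentor's mentor, making them a peer of that former mentor."""
    former_mentor = next(senior for junior, senior in edges if junior == researcher)
    grand_mentor = next((senior for junior, senior in edges if junior == former_mentor), None)
    if grand_mentor is not None:
        edges.add((researcher, grand_mentor))

promote(edges, "dan")
print(sorted(edges))
# [('bob', 'alice'), ('dan', 'alice'), ('dan', 'bob')]
```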

The other incentive problem is that, if any of these edges cross between for-profit organizations, or from a for-profit organization into any other organization, the necessary communication will be extremely frowned upon by the people who control the purse strings. I haven’t really figured out a good solution to this problem. Siloing research into various competing for-profit organizations (the only ones that can afford massive GPU clusters) seems suboptimal for solving the alignment problem. Maybe this is just a necessary part of trying to solve the alignment problem within a capitalist society; I don’t know. Maybe norms can emerge whereby for-profit companies make explicit, above-board deals with each other about what is and is not allowed in terms of inter-organization communication, possibly with money changing hands if that becomes necessary in the eyes of the organizations’ respective business people.

Conclusion

To summarize this post: I think that security mindset is good and necessary, but I also think there need to be much more explicit norms about how it is practiced, since security mindset is potentially a large source of burnout for some fraction of researchers. Exactly how this would work is still largely unknown, and I welcome feedback on the various mechanisms I have proposed above.