The Case for a Journal of AI Alignment
When you have some nice research in AI Alignment, where do you publish it? Maybe your research fits an ML or AI conference. But some papers are hard sells to traditional venues: things like Risks from Learned Optimization, Logical Induction, and a lot of the research done on this Forum.
This creates two problems:
(Difficulty of getting good peer-review) If your paper is accepted at a big conference like NeurIPS, you’ll probably get useful reviews, but it seems improbable that those will focus on alignment in the way most AI Alignment researchers would. (I’m interested in feedback here).
And if your research is confined to arXiv or the Alignment Forum, it can be really hard to get any sort of deep feedback on it.
(Dispersal of peer-reviewed research) The lack of a centralized peer-reviewed source of alignment research means that finding new papers is hard. Most people thus rely on heuristics like following specific researchers and the Alignment Forum, which is not the best for openness to new ideas.
I think that creating a research journal dedicated to AI alignment would help with both problems. Of course, such a structure brings with it the specter of academia and publish-or-perish, so we must tread carefully. Yet I believe that with enough thinking about these problems, there is a way to build a journal that is a net positive for the field.
What it could look like
Let’s call our journal JAA (Journal of AI Alignment). I chose a journal over a conference because journals accept submissions on a rolling basis and don’t have to organize IRL meetings.
How does JAA work?
For the board of editors, I’m thinking of something similar to what the AF does: one editor from each big lab, plus maybe some independent researchers. The role of the editors is to manage a given submission by sending it to reviewers. We probably want at least one of these reviewers to come from another part of AI Alignment, to judge the submission from a more outside view. There is then some back and forth between reviewers and authors, moderated by the editors, which results in the paper being either accepted or rejected.
One distinctive feature of this process would be to require a statement of why the research is useful for aligning AIs, along with a separate document assessing the submission’s information hazards.
I also think the process should be transparent (no anonymity, with the reviews published alongside the paper) and that the papers should be open access.
Positive parts
If JAA existed, it would be a great place to send someone who wanted a general overview of the field. Papers there would come with the guarantee of peer review, and the reviews would be published alongside them. The presentation would also probably be tailored to AI Alignment, instead of trying to fit into a bigger ML landscape. Lastly, I imagine the output would be varied enough not to privilege any one approach, which is a common problem today (it’s really easy to stay stuck with the first approach one finds).
Defusing issues
The picture from the previous sections probably raised red flags for many readers. Let’s discuss them.
You’re creating a publish-or-perish dynamic
This would clearly be a big problem, as publish-or-perish is quite a disease in academia, one that we don’t want in this field. But I actually don’t see how such a journal would create this dynamic.
The publish-or-perish mentality is pushed by the institutions employing researchers and those funding the projects. As far as I can see, none of the big institutions in AI alignment are explicitly pushing for publication (MIRI even has a non-disclosure-by-default policy).
As long as getting published in JAA isn’t tied to getting a job or getting funding, I don’t think it will create publish-or-perish dynamics.
You’re taking too much time from researchers
This objection comes from the idea that “researchers should only do research”. In some sense, I agree: we should try to minimize the time researchers spend on administrative duties and the like.
Yet reviewing papers is hardly an administrative duty; it’s an essential part of participating in a research community. It takes far less time than mentoring new entrants, but can provide immensely useful feedback. A culture of peer review also ensures that your own work will be reviewed, which means you’ll get feedback from people other than your close collaborators. Lastly, being sent papers to review is an excellent way to discover new ideas and collaborators.
As long as no one is submerged by reviews, and the process is handled efficiently, I don’t think this is asking too much of researchers.
You’re losing the benefit of anonymity in reviews
My counter-argument has nothing to do with the benefits of anonymity; it’s just that in any small enough field, anonymity in submissions and/or reviews is basically impossible. In the lab where I did my PhD, I heard many conversations about how this or that paper was clearly written by this person, or how the reviewer blocking a friend’s paper was obviously that guy, who wanted to torpedo any alternative to his own research. And by the end of my PhD, I could reliably unmask the reviewers who knew enough to dig into my paper, since there were very few of them.
So we can’t have anonymity in this context. Is that a problem? As long as the incentives are right, I don’t think so. The main issue I can think of is reviewers being too polite to tear apart a bad submission by a friend/colleague/known researcher. And there’s definitely a bias here. But the AI Alignment community and the relevant parts of EA seem epistemically healthy enough to point politely to the issues they see in papers. After all, it’s done every day on the Alignment Forum, without any anonymity.
As long as we push for politeness and honest feedback, I don’t think transparency will create bad incentives.
You’re creating information hazards through the open access policy
I want this journal to be open access. Yes, part of this comes from a belief that information should be freely accessible. But I know about information hazards, and I understand how they can be problematic.
My point here is that if you’re worried about an information hazard, don’t publish the research anywhere on the internet. No paywall- or registration-wall-protected paper is truly protected (just ask Sci-Hub). And having the paper listed in an official-sounding journal will attract people who want to use it for non-aligned reasons.
So information hazards are a pre-publication issue, not an issue with open access.
Call to action
What would be required? A cursory Google search (with results like this page and this page) suggests three main tasks:
(Deciding how the journal should function) This should be a community discussion, and can be started here in the comments.
(Technical Process) Code and host a website, design a template, code a submission and review tool. This can probably be done by one person or a small team in something between 6 months and a year (a bit of a wild guess here).
(Administration) Dealing with all the operational details, choosing new reviewers, interacting with everyone involved. This looks like either a full-time or part-time job.
I’m very interested in participating in the first task, but the other two are not things I want to do with my time. I would also definitely review papers for such a journal.
Yet there seem to be many people interested in AI risks, and in reducing them, who have great technical and/or organizational skills. So I hope someone might get interested enough in the project to try it. Funding (from things like the LTFF) might help to start such a project (and pay someone to do the administrative work).
What do you think? Do you believe such a journal would be a great idea? A terrible idea? Do you feel convinced by my discussions of the issues, or do you think that I missed something very important? And if you also want to have such a journal, what’s your take on the shape it should take?
I continue to think that there aren’t many benefits to a journal (yet).
But we can also look at past examples—there’s the AGI journal, in which some past safety work has been published. I have heard safety people saying things like “oh, that was published at AGI, it’s probably not worth reading”, and others saying “yeah, if you can’t get your work into a real conference, then send it to AGI so that it’s published somewhere”. (Not going to name names for the obvious reasons.) Doesn’t seem like a great precedent.
Thanks for the feedback. I didn’t know about the AGI journal, that’s a good point.
If we only try to provide publicly available feedback to safety researchers, including new ones, do you think that this proposal makes sense as a first step?
Yeah, I think that system makes sense as a way to address that goal (and does seem particularly valuable for new entrants). I’m at least interested enough to try it as an experiment.
I think this is a good idea. If you go ahead with it, here’s a suggestion.
Reviewers often procrastinate for weeks or months. This is partly because doing a review takes an unbounded amount of time, especially for articles that are long or confusing. So instead of sending the reviewers a manuscript with a due date, book a calendar event for 2 hours with the reviewers. The reviewers join a call or group chat and read the paper and discuss it. They can also help clear each other’s confusions. They aim to complete the review by the end of the time window.
It’s a pretty nice idea. I thought about just giving people two weeks, which might be a bit hardcore.
This idea has been discussed before. It’s an important one, though, so I don’t think it’s a bad thing for us to bring it up again. My perspective, now as previously, is that this would be fairly bad at the moment, but might be good in a couple of years’ time.
My background understanding is that the purpose of a conference or journal in this case (and in general) is primarily to certify the quality of some work (and to a lesser extent, the field of inquiry). This in-turn helps with growing the AIS field, and the careers of AIS researchers.
This is only effective if the conference or journal is sufficiently prestigious. Presently, publishing AI safety papers in NeurIPS, AAAI, JMLR, or JAIR serves to certify the validity of the work and boosts the field of AI safety, whereas publishing in (for example) Futures or AGI doesn’t. If you create a new publication venue, by default its prestige would be comparable to, or less than, that of Futures or AGI, and so it wouldn’t really help to serve the role of a journal.
Currently, the flow of AIS papers into the likes of NeurIPS and AAAI (and probably soon JMLR and JAIR) is rapidly improving. New keywords have been created at several conferences, along the lines of “AI safety and trustworthiness” (I forget the exact wording), so that you can nowadays expect, on average, to receive reviewers who average out to neutral, or even vaguely sympathetic to AIS research. Ten or so papers were published in such venues in the last year, and all these authors will become reviewers under that keyword when the conference comes around next year. Yes, things like “Logical Inductors” or “AI safety via debate” are very hard to publish. There’s some pressure to write research that’s more “normie”. All of that sucks, but it’s an acceptable cost for being in a high-prestige field. And overall, things are getting easier, fairly quickly.
If you create a journal with too little prestige, you can generate blowback. For example, there was some criticism on Twitter of Pearl’s “Journal of Causal Inference”, even though his field is somewhat more advanced than ours.
In 1.5-3 years’ time, I think the risk-benefit calculus will probably change. The growth of AIS work (which has been fast) may outpace the virtuous cycle that’s currently happening with AI conferences and journals, such that a lot of great papers are getting rejected. There could be enough tenure-track professors at top schools to make the journal decently high-status (more so than Futures and AGI). We might even be nearing the point where some unilateral actor will go and make a worse journal if we don’t make one. I’d say when a couple of those things are true, that’s when we should pull the trigger and make this kind of conference/journal.
Thanks for the detailed feedback! David already linked the Facebook conversation, but it’s pretty useful that you summarize it in a comment like this.
I think that your position makes sense, and you do take into account most of my issues and criticisms of the current model. Do you think you could make really specific statements about what needs to change for a journal to be worth it, maybe expanding on your last paragraph?
Also, to provide a first step without the issues you pointed out, I proposed a review mechanism here on the AF in this comment.
I don’t (and perhaps shouldn’t) have a guaranteed trigger—probably I will learn a lot more about what the trigger should be over the next couple years. But my current picture would be that the following are mostly true:
The AIS field is publishing 3-10x as many papers per year as the causal inference field does now.
We have ~3 highly aligned tenured professors at top-10 schools, and ~3 mostly-aligned tenured professors with ~10k citations, who want to be editors of the journal.
The number of great papers that can’t get into other top AI journals is >20 per year. I figure it’s currently around 2.
The chance that some other group creates a similar (worse) journal for safety in the subsequent 3 years is >20%.
I agree with Ryan’s comments above on this being somewhat bad timing to start a journal for publishing work like the two examples mentioned at the start of the post above. I have an additional reason, not mentioned by Ryan, for feeling this way.
There is an inherent paradox when you want to confer academic credibility or prestige on much of the work that has appeared on LW/AF, work that was produced from an EA or x-risk driven perspective. Often, the authors chose the specific subject area of the work exactly because at the time, they felt that the subject area was a) important for x-risk while also b) lacking the credibility or prestige in main-stream academia that would have been necessary for academia to produce sufficient work in the subject area.
If condition b) is not satisfied, or becomes satisfied, then the EA or x-risk driven researchers (and EA givers of research funds) will typically move elsewhere.
I can’t see any easy way to overcome this paradox of academic prestige-granting on prestige-avoiding work in an academic-style journal. So I think that energy is better spent elsewhere.
In the spirit of open peer review, here are a few thoughts:
First, overall, I was convinced during earlier discussions that this is a bad idea—not because of costs, but because the idea lacks real benefits, and itself will not serve the necessary functions. Also see this earlier proposal (with no comments). There are already outlets that allow robust peer review, and the field is not well served by moving away from the current CS / ML dynamic of arXiv papers and presentations at conferences, which allow for more rapid iteration and collaboration / building on work than traditional journals—which are often a year or more out of date as of when they appear. However, if this were done, I would strongly suggest doing it as an arXiv overlay journal, rather than a traditional structure.
One key drawback you didn’t note is that giving AI safety further insulation from mainstream AI work could isolate it even more. It also likely makes it harder for AI-safety researchers to have mainstream academic careers, since narrow journals don’t help on most of the academic prestige metrics.
Two more minor disagreements are about, first, the claim that “If JAA existed, it would be a great place to send someone who wanted a general overview of the field.” I would disagree: in-field journals are rarely as good a source as textbooks or non-technical overviews. Second, the idea that a journal would provide deeper, more specific, and better review than Alignment Forum discussions and current informal discussions seems far-fetched, given my experience publishing in journals specific to a narrow area, like Health Security, compared to my experience getting feedback on AI safety ideas.
+1 to each of these. May I suggest, instead of creating a JAA, we create a textbook? Or maybe a “special compilation” book that simply aggregates stuff? Or maybe even an encyclopedia? It’s like a journal, except that it doesn’t prevent these things from being published in normal academic journals as well.
Thanks for your pushback! I’ll respond to both of you in this comment.
Thanks for the link. I’m reading through the Facebook thread, and I’ll come back here to discuss it after I finish.
The only actual peer review I see for the type of research I’m talking about, by researchers knowledgeable in the subject, comes from private gdocs, as mentioned for example by Rohin here. Although it’s better than nothing, it has the issue of being completely invisible to any reader without access to those gdocs. Maybe you could infer the “peer-reviewedness” of a post/paper from who is thanked in it, but that seems ridiculously roundabout.
When something is published on the AF, it rarely gets feedback as deep as a peer review or the comments in private gdocs. When something is published at an ML conference, I assume that most if not all reviews don’t really consider the broader safety and alignment questions, and focus instead on short-term ML relevance. And some research is not even possible to publish in big ML venues.
As for conference vs journal… I see what you mean, but I don’t think it’s really a big problem. In small subfields that actively use arXiv, papers are old news by the time the conference happens, so it’s not a problem if they also are when the journal publishes them. I also wonder how much faster we could get a journal to run if we actively tried to streamline the process. I’m thinking for example of not giving reviewers two months when they all do their review in the last week anyway. Lastly, you’re not proposing to make a conference, but if you were, I still think a conference would require much more work to organize.
I hadn’t thought of overlay journals, that’s a great idea! It might actually make this feasible without a full-time administrator.
I agree that this is a risk, which is yet another reason to favor a journal. At least in computer science, the publication process is generally preprint → conference → journal. That way, we can allow the submission of papers previously accepted at NeurIPS, for example (maybe extended versions), which should mitigate the cost to academic careers. And if the journal curates enough great papers, it might end up decent enough on academic prestige metrics.
Agreed. Yet as I reply to Daniel below, I don’t think AI Alignment is mature enough, and clear enough on what matters, to write a satisfying textbook. Also, the state of the art is basically never in textbooks, and that’s the sort of overview I was talking about.
Hmm, if you compare to private discussions and gdocs, I mostly agree that the review would be as good or a little worse (although you might get reviews from researchers to whom you wouldn’t have sent your research). As for the Alignment Forum, I definitely disagree that the comments you get here are as useful as an actual peer review. The most useful feedback I saw here recently was this review of Alex Turner’s paper by John, and that actually came from a peer-review process on LW.
So my point is that a journal with open peer review might be a way to make private gdocs discussions accessible while ensuring most people (not only those in contact with other researchers) can get any feedback at all.
Onto Daniel’s answer:
As I wrote above, I don’t think we’re at the point where a textbook is a viable (or even useful) endeavor. For the second point, journals are not really important for careers in computer science (with maybe some exceptions, but all the recruiting processes I know basically only care about conferences, and maybe about the existence of at least one journal paper). And as long as we accept extended versions of papers published at conferences, there should be no problem with doing both.
Thanks. FWIW I find my worries mostly addressed by your reply about computer science conferences being the source of academic prestige and thus not in conflict with JAA. I still think a textbook or encyclopedia would be great; I think the field is plenty advanced enough, and in general there isn’t enough distillation and compilation work being done.
My issue with a textbook comes more from the lack of consensus. The fundamentals (what you would put in the first few chapters) for embedded agency are different from those for preference learning, different from those for inner alignment, and different from those for agent incentives (to name only a handful of research directions). IMO, a textbook would either overlook big chunks of the field or look more like an enumeration of approaches than a unified resource.
Textbooks that cover a number of different approaches without taking a position on which one is the best are pretty much the standard in many fields. (I recall struggling with it in some undergraduate psychology courses, as previous schooling didn’t prepare me for a textbook that would cover three mutually exclusive theories and present compelling evidence in favor of each. Before moving on and presenting three mutually exclusive theories about some other phenomenon on the very next page.)
Fair enough. I think my real issue with an AI Alignment textbook is that, for me, a textbook presents relatively foundational and well-established ideas and theories (maybe multiple ones), whereas I feel that AI Alignment is basically only state-of-the-art exploration, and that we have very few things that should actually be put into a textbook right now.
But I could change my mind if you have an example of what should be included in such an AI Alignment textbook.
That doesn’t seem like a big problem to me. Just make a different textbook for each major approach, or a single textbook that talks about each of them in turn. I would love such a book, and would happily recommend it to people looking to learn more about the field.
Or, just go ahead and overlook big chunks of the field. As long as you are clear that this is what you are doing, the textbook will still be useful for those interested in the chunk it covers.
As I said in my answer to Kaj, the real problem I see is that I don’t think we have the necessary perspective to write a useful textbook. Textbooks basically never touch research from the last ten years, unless that research is really easy to interpret and present, which is not the case here.
I’m open to being proven wrong, though.
I think we do. I also think attempting to write a textbook would speed up the process of acquiring more perspective. Our goals, motivations, and constraints are very different from the goals and motivations of most textbook-writers, I think, so I don’t feel much pressure to defer to the collective judgment of other textbook-writers.
I think this is a great idea and would be happy to help in any way with this.
Thanks. I’m curious about what you think of Ryan’s position or Rohin’s position?
An idea for having more AI Alignment peer review without compromising academic careers or reputation: create a review system on the Alignment Forum. What I had in mind is that people who are okay with doing reviews can sign up somewhere. Then someone who posted something and wants a review can spend a token (if they have any; how tokens are earned is explained below) to ask for one. Then some people (maybe AF admins, maybe some specific administrator of the system) assign one of the reviewers to the post.
The review has to follow some guidelines, like summarizing the paper, explaining the good parts and the issues, and proposing new ideas. Once the review is posted and validated by the people in charge of the system, the reviewer gets a token she can use to request a review of her own posts.
How do you bootstrap? For long-time users of the AF, it makes sense to give them some initial tokens. And for newcomers (who really have a lot to gain from reviews), I was thinking of asking them to write a nice distillation post in exchange for a token.
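To make the mechanics concrete, here is a minimal sketch of the token bookkeeping I have in mind. All the names here (ReviewLedger, grant_tokens, and so on) are hypothetical; nothing like this exists on the AF today, and the code is only an illustration of the proposed flow, not a spec.

```python
class ReviewLedger:
    """Toy model of the proposed review-token system (illustrative only)."""

    def __init__(self):
        self.tokens = {}    # username -> number of tokens held
        self.pending = []   # review requests waiting for assignment or validation

    def grant_tokens(self, user, amount=1):
        # Bootstrap: long-time AF users receive initial tokens; newcomers earn
        # one by writing a distillation post.
        self.tokens[user] = self.tokens.get(user, 0) + amount

    def request_review(self, author, post):
        # Spending a token asks the admins to assign a reviewer to the post.
        if self.tokens.get(author, 0) < 1:
            raise ValueError("No token: review something or write a distillation post first.")
        self.tokens[author] -= 1
        self.pending.append({"author": author, "post": post, "reviewer": None})

    def assign_reviewer(self, post, reviewer):
        # An admin matches a signed-up reviewer to a pending request.
        for request in self.pending:
            if request["post"] == post and request["reviewer"] is None:
                request["reviewer"] = reviewer
                return
        raise ValueError("No pending request for this post.")

    def validate_review(self, post):
        # Once the review follows the guidelines, the reviewer earns a token
        # they can later spend on their own posts.
        for request in self.pending:
            if request["post"] == post and request["reviewer"] is not None:
                self.pending.remove(request)
                self.grant_tokens(request["reviewer"], 1)
                return
        raise ValueError("No assigned review for this post.")


# Example flow: one review requested, assigned, and validated.
ledger = ReviewLedger()
ledger.grant_tokens("long_time_af_user")
ledger.request_review("long_time_af_user", "My post on deceptive alignment")
ledger.assign_reviewer("My post on deceptive alignment", "volunteer_reviewer")
ledger.validate_review("My post on deceptive alignment")
assert ledger.tokens["volunteer_reviewer"] == 1
```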
While not as ambitious as a journal, I think a system like that might solve two problems at once:
The lack of public feedback and in-depth peer review in most posts here
The lack of feedback at all for newcomers who don’t have private gdocs with a lot of researchers on them.
There’s probably a way to do even better on the second point, for example by having personal mentorship cost something like three tokens.
I also believe that the incentives would be such that people would participate, and not necessarily try to game the system (being limited to the AF which is a small community also helps).
What do you think?
I think you need to distinguish clearly between wanting more peer interaction/feedback and wanting more peer review.
Academic peer review is a form of feedback, but it is mainly a form of quality control, so the scope of the feedback tends to be very limited in my experience.
The most valuable feedback, in terms of advancing the field, is comments like ‘maybe if you combine your X with this Y, then something very new/even better will come out’. This type of feedback can happen in private gdocs or LW/AF comment sections, less so in formal peer review.
That being said, I don’t think that private gdocs or LW/AF comment sections are optimal peer interaction/feedback mechanisms, something better might be designed. (The usual offline solution is to put a bunch of people together in the same building, either permanently or at a conference, and have many coffee breaks. Creating the same dynamics online is difficult.)
To make this more specific, here is what usually stops me from contributing feedback in AF comment sections. The way I do research, I tend to go for months without reading any AF posts, as this would distract me too much. When I catch up, I have little motivation to add a quick or detailed comment to a 2-month-old post.
One alternative would be to try to raise funds (e.g. perhaps from the EA LTF fund) to pay reviewers to perform reviews.
I’m surprised by two implicit claims you seem to be making.
Is your experience that peer-review is a good source of deep feedback?
I have a few peer-reviewed physics publications. The only useful peer-review feedback I got was a reviewer who pointed out a typo in one of my equations. Everything else has been gatekeeping-related comments, which is no surprise given that peer review mainly serves a status-gatekeeping function. I got the impression that other physicists have the same experience. Is it different in other fields?
If I were to try to solve the lack-of-feedback problem, I would create something like this or this or this or this.
But maybe peer-review is great if done right, and physics is just doing it wrong? I’m open to this possibility.
Do you think a journal would have a lower bar for publication than the AF? This seems probably wrong to me, but of course it depends on who is making the decisions.
Or maybe you are just saying that having a second AI Safety publication platform would increase diversity? This does seem true to me.
In what way is the AF not open to new ideas? I think it is a bit scary to publish a post here, but that has more to do with it being very public, and less to do with anything specific about the AF. But if the AF has a culture of being unwelcoming to new ideas, maybe we should fix that?
Update: I’m shifting towards thinking that peer-review could be good if done right, because:
I’m told it does work well in practice sometimes, which creates a proof of concept.
I can see how being asked specifically to review some specific work could make me motivated to put in more work than I would do in a more public format (talk or blogpost).
It’s not that easy to justify a post from a year ago, but I think what I meant was that the Alignment Forum has a certain style of alignment research, so reading only it means you don’t see things like CHAI research or other work that aims at alignment but isn’t shared much on the AF.
Strongly agree. I would be happy to help. Here are three academic AI alignment articles I have co-authored. https://arxiv.org/abs/2010.02911 https://arxiv.org/abs/1906.10536 https://arxiv.org/abs/2003.00812
Thanks! I don’t plan on making it myself (as mentioned in the post), but I’ll try to keep you posted if anything happens in this style.
How big is the first-mover advantage in journals? That is, how important is the possibility that if you don’t make an open-access journal first, Elsevier may come and create a closed-access one instead?
My take on this is: we don’t really care. Many if not most people in AI Alignment come from computer science fields, which have traditionally pushed a lot for open access and against traditional publishers. So I don’t believe Elsevier would find editors and papers for starting its journal, and it thus wouldn’t be able to reach the point where publishing there is required for careers.
I think there are a lot of really good responses here that I won’t repeat.
I think the traditional model of journals has a lot of issues, not the least of which are bad incentives.
The new model used by eLife is pretty exciting to me, but very different from what you proposed. I think it’s worth considering:
only reviewing works that have already been published as preprints (I think LW/AF should count for this, as well as arXiv)
publishing reviews: this lets the rest of the community benefit more from the labor of reviewing, though it does raise the standard for reviewers
curating the best / highest-reviewed articles to be “published”
The full details of their new system are laid out in an essay they published describing the changes and why they made them.