I feel that LW is quite bad as a system for performing AI safety research, most likely worse than the traditional system of academic publishing, in aggregate. A random list of things I find unhelpful:
Very short “attention span”. People read posts within a couple of days of publishing, perhaps within a few weeks if the post is hugely upvoted. Curation doesn’t solve this, because it, too, either happens or doesn’t within a short time window, and it is the subjective judgement of a few moderators. And a big “wave of upvotes” is likewise a fairly immediate reaction rather than an appreciation of the research.
Contrast: academic papers get read when they have many citations. Citations accumulate over time, and they are a much more reliable indicator of a paper’s usefulness (exception: controversial takes by high-profile authors that provoke a lot of counter-takes). I suspect a paper’s “peak readership” comes somewhere within a year of the publication date, but definitely not on day 1 after publishing.
Given LW/AF’s naturally faster turnover of posts and ideas (also necessitated by the ultra-fast AI progress), the front page could instead surface the posts that have accumulated the most citations (backlinks) on these forums over the last few months; that would be more useful for guiding research.
No standards and practices of review. No expectation that established researchers review the work of novices, thereby disseminating ideas and helping novices fix their conceptual confusions. Unstructured comments are a poor man’s review at best: they often cherry-pick a small point, and the ensuing argument quickly loses the big picture, which makes it largely unhelpful.
Yann LeCun proposed a review system here that seems to be implementable on LW/AF. Importantly, a full review of a post (a “paper”) could give it a “rating”, but a mere comment can’t. People can subscribe to streams of posts, approved (or highly rated) by specific researchers or groups (“Reviewing Entities”).
The cumulative rating of a post could also be used to guide the community’s attention. But note that reading a long post, writing a full review of it, and giving it a rating or approval is much harder than clicking the upvote button, and is in itself an intellectual bar. The “crowd” that just clicks upvotes (usually without even reading beyond the title) shouldn’t guide the attention of the research community.
The current review system on LW emphasises specific lines and claims rather than the big picture, and doesn’t guide reviewers to ask the right questions about the work.
No community standards (nor native LW support, e.g. for a dedicated “references” section) for referencing prior work. An academic paper in physics, computer science, biology, etc. with zero or just a couple of references to prior work is unfathomable; on LW/AF this is standard practice, including for posts attempting to make research contributions (develop new theories, etc.). Also, these references shouldn’t be solipsistically limited to LW/AF alone.
The selectivity of AF doesn’t seem to me to work or to help. (Of course, this take is shaped by my personal experience: posts of mine that IMO are obviously relevant for AF, and of no lower quality than a big portion of what does get published there, don’t make it onto AF because I’m not a member of the in-group and, perhaps, not completely on board with the currently most popular paradigms for thinking about alignment.) AF seems to just reinforce the “echo chamber” effect that is already present on LW (relative to the rest of scientific publishing and thought).
AF as a research medium
I think the following setup would be interesting: all posts with tags like “AI”, “Alignment”, or “Agency”, plus an additional tag/label that authors may add, like “Research”, are automatically visible on “AF” (though it wouldn’t make much sense to call that medium a “forum” anymore). On these posts, apart from regular comments, people could also post structured reviews with a number of obligatory sections, similar to academic reviews. There is already similar machinery on LW: questions can receive “answers” as well as regular comments.
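To make this concrete, here is a minimal sketch (in Python, purely for illustration) of what such a structured review could look like as a data object; the section names, the rating scale, and the completeness rule are my assumptions, not an existing LW/AF feature:

```python
from dataclasses import dataclass

@dataclass
class StructuredReview:
    """A hypothetical structured review attached to a research post.

    The sections mirror common academic review forms; none of this
    exists on LW/AF today.
    """
    reviewer: str
    summary: str                 # reviewer's restatement of the post's main claims
    relation_to_prior_work: str  # how the post relates to prior work, on and off LW/AF
    strengths: str
    weaknesses: str
    questions_for_authors: str
    rating: int                  # e.g. 1 (reject) .. 10 (strong accept)
    confidence: int              # reviewer's self-reported confidence, e.g. 1..5

def is_complete(review: StructuredReview) -> bool:
    """Only a review with all obligatory sections filled counts toward the post's rating."""
    obligatory = [review.summary, review.relation_to_prior_work,
                  review.strengths, review.weaknesses]
    return all(section.strip() for section in obligatory)
```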
Finally, the content on the “new AF” is ranked completely differently than on LW, along the lines described above: regular LW upvotes and downvotes don’t matter at all; the ratings given out in reviews do matter (along, perhaps, with the reputation of the reviewers), as do backlink/reference counts. This necessarily implies that content on this “AF” becomes visible on the front page only slowly (busy researchers are unlikely to review work on less than roughly a one-week timescale, and reviews themselves take time), but it can also potentially stay there longer if it generates a lot of backlinks.
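A minimal sketch of such a ranking, again purely illustrative: the particular weighting, the 90-day backlink window, and the reputation weighting are assumptions, not a worked-out proposal:

```python
from datetime import datetime, timedelta

def front_page_score(reviews, backlink_dates, now=None,
                     backlink_window_days=90, backlink_weight=0.5):
    """Illustrative front-page score for a post on the hypothetical 'new AF'.

    reviews: list of (rating, reviewer_reputation) pairs, rating e.g. 1..10.
    backlink_dates: datetimes at which other posts referenced this one.
    Note that LW karma/upvotes do not appear anywhere in the formula.
    """
    now = now or datetime.utcnow()

    # Reputation-weighted mean of review ratings; zero until the post has
    # been reviewed, which is exactly why posts would surface only slowly.
    total_weight = sum(rep for _, rep in reviews)
    review_term = (sum(rating * rep for rating, rep in reviews) / total_weight
                   if total_weight else 0.0)

    # Only backlinks from the last few months count, so attention follows
    # accumulating citations rather than the day-1 reaction.
    cutoff = now - timedelta(days=backlink_window_days)
    recent_backlinks = sum(1 for d in backlink_dates if d >= cutoff)

    return review_term + backlink_weight * recent_backlinks
```

For example, a post with two reviews (an 8 from a reputation-3 reviewer and a 6 from a reputation-1 reviewer) and 4 recent backlinks would score (8·3 + 6·1)/4 + 0.5·4 = 9.5 under these assumptions.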
Also, this “new AF” should provide formal academic labelling for its publications, similar to what https://distill.pub/ did.
If someone shares these ideas and has more thoughts on why the LW/AF system doesn’t serve AI safety progress well, I would like to collaborate on a joint post (for LW).
PS. We, the AI safety community, need to “outcompete” capability research, including on this front, so it’s unacceptable that we trail behind in part because our epistemic systems don’t support a more effective accumulation of knowledge and insight.
I have definitely been neglecting engineering and mechanism design for the AI Alignment Forum for quite a while, so concrete ideas for how to reform things are quite welcome. I also think things aren’t working that well, though my guess is my current top criticisms are quite different from yours.