2018 Review: Voting Results!
The votes are in!
59 of the 430 eligible voters participated, evaluating 75 posts. Meanwhile, 39 users submitted a total of 120 reviews, with most posts getting at least one review.
Thanks a ton to everyone who put in time to think about the posts—nominators, reviewers and voters alike. Several reviews substantially changed my mind about many topics and ideas, and I was quite grateful to the authors for participating in the process. I’ll mention Zack_M_Davis, Vanessa Kosoy, and Daniel Filan as great people who wrote the most upvoted reviews.
In the coming months, the LessWrong team will write further analyses of the vote data, and use the information to form a sequence and a book of the best writing on LessWrong from 2018.
Below are the results of the vote, followed by a discussion of how reliable the result is and plans for the future.
Top 15 posts
Embedded Agents by Abram Demski and Scott Garrabrant
The Rocket Alignment Problem by Eliezer Yudkowsky
Local Validity as a Key to Sanity and Civilization by Eliezer Yudkowsky
Arguments about fast takeoff by Paul Christiano
The Costly Coordination Mechanism of Common Knowledge by Ben Pace
Toward a New Technical Explanation of Technical Explanation by Abram Demski
Anti-social Punishment by Martin Sustrik
The Tails Coming Apart As Metaphor For Life by Scott Alexander
Babble by alkjash
The Loudest Alarm Is Probably False by orthonormal
The Intelligent Social Web by Valentine
Prediction Markets: When Do They Work? by Zvi
Coherence arguments do not imply goal-directed behavior by Rohin Shah
Is Science Slowing Down? by Scott Alexander
A voting theory primer for rationalists by Jameson Quinn and Robustness to Scale by Scott Garrabrant (tied)
Top 15 posts not about AI
Local Validity as a Key to Sanity and Civilization by Eliezer Yudkowsky
The Costly Coordination Mechanism of Common Knowledge by Ben Pace
Anti-social Punishment by Martin Sustrik
The Tails Coming Apart As Metaphor For Life by Scott Alexander
Babble by alkjash
The Loudest Alarm Is Probably False by orthonormal
The Intelligent Social Web by Valentine
Prediction Markets: When Do They Work? by Zvi
Is Science Slowing Down? by Scott Alexander
A voting theory primer for rationalists by Jameson Quinn
Toolbox-thinking and Law-thinking by Eliezer Yudkowsky
A Sketch of Good Communication by Ben Pace
A LessWrong Crypto Autopsy by Scott Alexander
Unrolling social metacognition: Three levels of meta are not enough. by Academian
Varieties Of Argumentative Experience by Scott Alexander
Top 10 posts about AI
(The vote included 20 posts about AI.)
Embedded Agents by Abram Demski and Scott Garrabrant
The Rocket Alignment Problem by Eliezer Yudkowsky
Arguments about fast takeoff by Paul Christiano
Toward a New Technical Explanation of Technical Explanation by Abram Demski
Coherence arguments do not imply goal-directed behavior by Rohin Shah
Robustness to Scale by Scott Garrabrant
Paul’s research agenda FAQ by zhukeepa
An Untrollable Mathematician Illustrated by Abram Demski
Specification gaming examples in AI by Vika
2018 AI Alignment Literature Review and Charity Comparison by Larks
The Complete Results
Click Here If You Would Like A More Comprehensive Vote Data Spreadsheet
To help users see the spread of the vote data, we’ve included swarmplot visualizations.
For space reasons, only votes with weights between −10 and 16 are plotted. This covers 99.4% of votes.
Gridlines are spaced 2 points apart.
Concrete illustration: The plot immediately below has 18 votes ranging in strength from −3 to 12.
[Swarmplot visualizations of the per-post vote data appeared here.]
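If you’d like to regenerate plots like these yourself from the vote-data spreadsheet linked above, here is a minimal sketch using seaborn’s swarmplot. The file name and column names (votes.csv, post, vote_weight) are hypothetical placeholders; substitute whatever the exported spreadsheet actually uses.

```python
# Minimal sketch of rebuilding per-post swarmplots from the vote data.
# "votes.csv", "post", and "vote_weight" are hypothetical placeholders for
# whatever the exported spreadsheet actually contains.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

df = pd.read_csv("votes.csv")                    # one row per (post, vote)
df = df[df["vote_weight"].between(-10, 16)]      # match the plotted range

ax = sns.swarmplot(data=df, x="vote_weight", y="post", size=3)
ax.xaxis.set_major_locator(MultipleLocator(2))   # gridlines 2 points apart
ax.grid(axis="x")
plt.tight_layout()
plt.show()
```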
How reliable is the output of this vote?
Most posts were voted on by 10-20 people (median of 17). A change of 10-15 points in a post’s score is enough to move it up or down around 10 positions within the rankings. That is equivalent to a few moderate-strength votes from two or three people, or a single exceedingly strong vote from one strongly-feeling voter. This means the system is somewhat noisy, though it seems to me very unlikely that the posts at the very top could end up placed much differently.
The vote was also affected by two technical mistakes the team made:
The post-order was not randomized. For the first half of the voting period, the posts on the voting page appeared in order of number of nominations (fewest to most) instead of appearing randomly, thereby giving more visual attention to the first ~15 posts (those with 2 nominations). Ruby looked into it and found that 15-30% more people cast votes on these earlier-appearing posts than on those appearing elsewhere in the list. Thanks to gjm for identifying this issue.
Users were given some free negative votes. When calculating the cost of users’ votes, we used a simple equation, but missed that it produced an off-by-one error for negative numbers. Essentially, users got one negative vote-weight for free on every post they had voted on negatively (see the sketch below). To correct for this, for the 18 users who had exceeded their budget we reduced the strength of their negative votes by a single unit; the votes of users who had not spent all their points were unaffected. This didn’t affect the rank-ordering very much: a few posts changed by 1 position, and a smaller number changed by 2-3 positions.
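To make the off-by-one concrete, here is a minimal sketch in Python. The triangular cost rule cost(v) = |v|·(|v|+1)/2 is purely illustrative (the actual equation isn’t reproduced here); the point is how applying such a formula to the signed weight, rather than to its magnitude, silently charges each negative vote as though it were one unit weaker.

```python
def intended_cost(weight: int) -> int:
    """Cost of a vote, charged symmetrically by magnitude.

    Uses an illustrative triangular rule: |v| * (|v| + 1) / 2.
    (An assumption for this sketch, not the actual equation used.)
    """
    magnitude = abs(weight)
    return magnitude * (magnitude + 1) // 2


def buggy_cost(weight: int) -> int:
    """The same formula applied to the signed weight instead of its magnitude."""
    return weight * (weight + 1) // 2


if __name__ == "__main__":
    for w in [-4, -3, -2, -1, 1, 2, 3, 4]:
        print(w, intended_cost(w), buggy_cost(w))
    # For every negative weight -k, buggy_cost(-k) equals intended_cost(k - 1):
    # each negative vote is charged as if it were one unit weaker, i.e. the
    # voter gets one negative vote-weight for free on that post.
```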
The effect size of these errors is not certain since it’s hard to know how people would have voted counterfactually. My sense is that the effect is pretty small, and that the majority of noise in the system comes from elsewhere.
Finally, we discarded exactly one ballot, which spent 10,000 points on voting instead of the allotted 500. Had a user gone over by a small amount, e.g. 1-50 points, we had planned to scale their votes down to fit the budget. However, when someone’s allocation was so extreme, we were honestly unsure what adjustment to their votes they would have wanted: had their points been normalised down to 500, the majority of their votes would have been adjusted to zero. (This decision was made without knowing who cast the ballot or which posts were affected.)
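For the curious, the following is a rough sketch of the kind of down-scaling described above. It reuses the illustrative triangular cost rule from the previous snippet (again, an assumption rather than the actual formula), and the example ballots are made up; the point is just that a small overrun can be absorbed by a gentle rescaling, while normalising a vastly over-budget ballot forces most of its weaker votes to zero.

```python
def ballot_cost(weights):
    """Total cost of a ballot under the same illustrative triangular rule."""
    return sum(abs(w) * (abs(w) + 1) // 2 for w in weights)


def scale_to_budget(weights, budget=500):
    """Shrink all weights by a common factor until the ballot fits the budget.

    A hypothetical normalisation scheme, for illustration only.
    """
    scale = 1.0
    scaled = list(weights)
    while ballot_cost(scaled) > budget:
        scale *= 0.95
        # int() truncates toward zero, so the weakest votes vanish first
        scaled = [int(w * scale) for w in weights]
    return scaled


# Made-up ballots. A slight overrun is absorbed by a gentle rescaling ...
print(scale_to_budget([6] * 30))                      # cost 630 -> [5, 5, ...]
# ... but a ballot several times over budget loses most of its weak votes.
print(scale_to_budget([16] * 20 + [2] * 40 + [1] * 15))
```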
Overall, I think the vote is a good indicator of a post’s standing to within about 10 places in the rankings, but I wouldn’t, for example, agonise over whether a post is at position #42 vs #43.
The Future
This has been the first LessWrong Annual Review. This project was started with the vision of creating a piece of infrastructure that would:
Create common knowledge about how the LessWrong community feels about various posts and topics and the progress we’ve made.
Improve our longterm incentives, feedback, and rewards for authors.
Help create a highly curated “Best of 2018” Sequence and Book.
The vote reveals much disagreement between LessWrongers. Every post received at least five positive votes, and every post received at least one negative vote – except for An Untrollable Mathematician Illustrated by Abram Demski, which was evidently just too likeable – and many people had strongly different feelings about many posts. Many of these disagreements seem more interesting to me than the specific ranking of any given post.
In total, users wrote 207 nominations and 120 reviews, and many authors updated their posts with new thinking or clearer explanations, showing that both readers and authors reflected a lot (and, I think, changed their minds a lot) during the review period. I think all of this is great, and I like the idea of us having a Schelling time in the year for this sort of thinking.
Speaking for myself, this has been a fascinating and successful experiment—I’ve learned a lot. My thanks to Ray for pushing me and the rest of the team to actually do it this year, in a move-fast-and-break-things kind of way. The team will be conducting a Review of the Review where we take stock of what happened, discuss the value and costs of the Review process, and think about how to make the review process more effective and efficient in future years.
In the coming months, the LessWrong team will write further analyses of the vote data, award prizes to authors and reviewers, and use the vote to help design a sequence and a book of the best writing on LW from 2018.
I think it’s awesome that we can do things like this, and I was honestly surprised by the level of community participation. Thanks to everyone who helped out in the LessWrong 2018 Review—everyone who nominated, reviewed, voted and wrote the posts.