One question I’ve mulled over the past couple years is “is there any principled way to determine which posts ‘won’ the review, in terms of being worthy of inclusion in LW’s longterm canon?”
In past years, we determined what goes in the books largely via wordcount. That is, we set out to make a reasonable set of books, then went down the list of voting results and included everything until we ran out of room, skipping over posts that didn’t make sense in book format. (e.g. Larks’ Review of AI Charities of The Year is always crazy long, and not super relevant to people 3 years after the fact.)
I still don’t have a great answer. But one suggestion a colleague gave me, which feels like an incremental improvement, is to look at the scores and look for “cliffs” where the vote totals drop off.
For example, here is the graph of “1000+ karma voters” post scores:
(note: there’s a nicer version of this graph here, where you can mouse over lines to see which post they correspond to)
I see maybe 4 “discontinuities” here. There’s an initial cluster of 4 posts, then a second cluster of 4, then a (less pronounced) cluster of 6. You can maybe pick out another drop at number 53.
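For concreteness, here’s a minimal sketch of how you might detect those cliffs automatically, by taking the largest drops between consecutive posts in the descending score list. The scores below are made up for illustration, and how many drops to treat as real “cliffs” is still a judgment call:

```python
def find_cliffs(scores, num_cliffs=3):
    """Given scores sorted in descending order, return the indices where
    the largest drops between consecutive posts occur (cluster boundaries)."""
    gaps = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    # Indices of the largest gaps; a gap at index i means the cliff
    # falls just after post i.
    biggest = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)[:num_cliffs]
    return sorted(i + 1 for i in biggest)

# Hypothetical sorted post scores with three visible cliffs:
scores = [90, 88, 87, 85, 70, 69, 68, 67, 50, 49, 48, 47, 46, 45, 30]
print(find_cliffs(scores))  # [4, 8, 14] -- clusters of 4, 4, and 6 posts
```

A fancier version might only count a gap as a cliff if it’s some multiple of the median gap, rather than taking a fixed number of the biggest ones.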
...
If you look at the All Users vote, the results look like this:
With a massive, massive landslide for “Microcovid”, followed by a cluster of 4, and then (maybe?) a less pronounced cluster of 9.
...
And then the Alignment Forum user votes look like this:
With “Draft Report on AI Timelines” having a very substantial lead.
...
Finally, if I do a weighted average where “1000+ karma users” get 3x the vote weight of the rest of the users, the result is this:
This still has Microcovid taking a massive lead, followed by a small cluster that begins with Draft Reports, followed either by a cluster of 4 or “1 and 3” depending on how you look at things.
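As a sketch of the weighting step (the post names and numbers here are hypothetical, not the actual vote data): assuming you have each post’s score total from each voter group, the combined score is just a weighted average of the two totals. Dividing by the total weight doesn’t change the ranking; it just keeps scores on a familiar scale:

```python
# Hypothetical per-post score totals for each voter group.
high_karma = {"Microcovid": 150, "Draft Reports": 120, "Some Other Post": 80}
all_other  = {"Microcovid": 90,  "Draft Reports": 40,  "Some Other Post": 70}

WEIGHT = 3  # "1000+ karma users" count 3x the rest

combined = {
    post: (WEIGHT * high_karma.get(post, 0) + all_other.get(post, 0)) / (WEIGHT + 1)
    for post in set(high_karma) | set(all_other)
}

for post, score in sorted(combined.items(), key=lambda kv: -kv[1]):
    print(f"{score:6.1f}  {post}")
```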
I had sort of vaguely assumed you were already doing something like this. It’s pretty close to what I used to do when assigning grades, to avoid a “barely missed out” dynamic in which someone misses the cutoff for an A by 0.25%.