In order to combat publication bias, I should probably tell the Open Thread about a post idea that I started drafting tonight but can’t finish because it looks like my idea was wrong. Working title: “Information Theory Against Politeness.” I had drafted this much—
Suppose the Quality of a Blog Post is an integer between 0 and 15 inclusive, and furthermore that the Quality of Posts is uniformly distributed. Commenters can roughly assess the Quality of a Post (with some error in either direction) and express their assessment in the form of a Comment, which is also an integer between 0 and 15 inclusive. If the True Quality of a post is i, then the assessment a expressed in a Comment on that Post follows the probability distribution
$$P(a \mid i) = \begin{cases} 1/3 & a = i - 1 \bmod 16 \\ 1/3 & a = i \\ 1/3 & a = i + 1 \bmod 16 \end{cases}$$
(Notice the “wraparound” between 15 and 0: it can be hard for a humble Commenter to tell the difference between brilliance-beyond-their-ken, and utter madness!)
The entropy of the Quality distribution is $\log_2 16 = 4$ bits: in order to inform someone about the Quality of a Post, you need to transmit 4 bits of information. Comments can be thought of as a noisy “channel” conveying information about the Post.
The mutual information between a Comment and the Post’s Quality is equal to the entropy of the distribution of Comments (which is going to be 4 bits, by symmetry), minus the entropy of a Comment given the Post’s Quality (which is $\log_2 3 \approx 1.58$ bits). So the “capacity” of a single Comment is around 4 − 1.58 = 2.42 bits. On average, in expectation across the multiverse, &c., we only need to read 4/2.42 ≈ 1.65 Comments in order to determine the Quality of a Post. Efficient!
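As a sanity check on that arithmetic, here is a minimal Python sketch that computes the mutual information directly from the joint distribution, rather than from the entropy shortcut. The function names (`comment_distribution`, `mutual_information`) are made up for illustration; the only inputs are the uniform prior over Quality and the ±1 wraparound noise described above.

```python
import math

N = 16  # Quality levels 0..15, assumed uniformly distributed

def comment_distribution(i):
    """P(a | i): the Comment is i-1, i, or i+1 (mod 16), each with probability 1/3."""
    return {(i + d) % N: 1 / 3 for d in (-1, 0, 1)}

def mutual_information():
    """I(Quality; Comment) in bits, summed over the joint distribution."""
    joint = {}     # (quality, comment) -> probability
    marginal = {}  # comment -> probability
    for i in range(N):
        for a, p in comment_distribution(i).items():
            joint[(i, a)] = joint.get((i, a), 0.0) + p / N
            marginal[a] = marginal.get(a, 0.0) + p / N
    # Sum p(i, a) * log2( p(i, a) / (p(i) * p(a)) ), with p(i) = 1/N.
    return sum(p * math.log2(p / ((1 / N) * marginal[a]))
               for (i, a), p in joint.items())

mi = mutual_information()
print(round(mi, 2))  # 2.42
```

The result agrees with 4 − log₂ 3, which is what the symmetry argument predicts.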
Now suppose the Moderators introduce a new Rule: it turns out Comments below 10 are Rude and hurt Post Authors’ Feelings. Henceforth, all Comments must be an integer between 10 and 15 inclusive, rather than between 0 and 15 inclusive!
… and then I was expecting that the restricted range imposed by the new Rule would decrease the expressive power of Comments (as measured by mutual information), but now I don’t think this is right: the mutual information is about the noise in Commenters’ perceptions, not the “coarseness” of the “buckets” in which they are expressed: lg(16) − lg(3) has the same value as lg(8) − lg(1.5).
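Spelled out, the identity is just the quotient rule for logarithms. It covers the case where both the range and the noise are halved, which is an assumption about how the restricted Comments would behave:

$$\lg 16 - \lg 3 = \lg\frac{16}{3} = \lg\frac{8}{1.5} = \lg 8 - \lg 1.5 \approx 2.42 \text{ bits}$$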
It seems to me clear that, if Commenters have a policy of reporting the maximum of their perception and 10, then there is now less mutual information between the commenter’s report and the actual post quality than there was previously. In particular, you now can’t distinguish between a post of quality 2 and a post of quality 5 given any number of comments, whereas you could previously.
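To put a number on this, here is a rough Python sketch comparing the unrestricted channel with the “report the maximum of your perception and 10” policy. The helper names (`report_distribution`, `mutual_information`, `clamp`) are hypothetical, and the sketch assumes the same uniform prior and ±1 wraparound noise as the original setup.

```python
import math

N = 16  # Quality levels 0..15, assumed uniformly distributed

def report_distribution(i, report):
    """Distribution of the reported Comment given true Quality i.

    The perception is i-1, i, or i+1 (mod 16) with probability 1/3 each;
    `report` maps a perception to what the Commenter actually posts.
    """
    dist = {}
    for d in (-1, 0, 1):
        r = report((i + d) % N)
        dist[r] = dist.get(r, 0.0) + 1 / 3
    return dist

def mutual_information(report):
    """I(Quality; reported Comment) in bits."""
    joint, marginal = {}, {}
    for i in range(N):
        for r, p in report_distribution(i, report).items():
            joint[(i, r)] = p / N
            marginal[r] = marginal.get(r, 0.0) + p / N
    return sum(p * math.log2(p / ((1 / N) * marginal[r]))
               for (i, r), p in joint.items())

def clamp(a):
    """The policy from the reply: report max(perception, 10)."""
    return max(a, 10)

mi_free = mutual_information(lambda a: a)
mi_clamped = mutual_information(clamp)
print(round(mi_free, 2), round(mi_clamped, 2))  # 2.42 1.01

# Qualities 2 and 5 produce identical report distributions (always 10),
# so no number of Comments can tell them apart.
print(report_distribution(2, clamp) == report_distribution(5, clamp))  # True
```

Under these assumptions, clamping drops the mutual information from about 2.42 bits to about 1.01 bits per Comment.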