(Note: my earlier comment was nonsense, based on a misreading of what Richard wrote.)
That does seem to be what Phil says, but in the scheme I have in my head after reading Phil’s proposal, things go a little differently. For the avoidance of doubt, I am claiming neither that Phil would want this nor that it’s the right thing to do.
Suppose A votes on something B wrote. They have some history: A has voted +1, 0, −1 on u, v, w of B’s things respectively in the past (where 0 means A didn’t vote). Here u+v+w is the total number of things B’s ever written.
I think we probably want to ignore the ones A hasn’t voted on. So we care only about u and w.
What should our prediction be? One simple answer: we assign probabilities proportional to u+1,w+1 to votes +1,-1 on A’s next vote. (This is basically Laplace’s rule of succession, or equivalently it’s what we get if we suppose A’s votes are independently random with unknown fixed probabilities and start with a flat prior on those probabilities.)
We might actually want to start with a different prior on the probabilities, which would mean offsetting u and w by different amounts.
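As a rough sketch of the prediction rule just described: the function below returns the predicted probability of an upvote, with the offsets exposed as parameters. The names `predict_up`, `alpha` and `beta` are my own labels for illustration, not anything from Phil’s proposal; alpha = beta = 1 gives the flat-prior rule, and other values correspond to a different prior.

```python
def predict_up(u, w, alpha=1.0, beta=1.0):
    """Predicted probability that A's next vote on B is an upvote.

    u, w  -- numbers of past upvotes and downvotes by A on B's things
    alpha, beta -- hypothetical offsets (pseudo-counts); 1 and 1 give
                   the Laplace rule of succession, (u+1)/(u+w+2)
    """
    return (u + alpha) / (u + w + alpha + beta)


# With no history and a flat prior, up and down are equally likely.
print(predict_up(0, 0))
# A prior that leans towards upvotes offsets u by more than w.
print(predict_up(0, 0, alpha=2.0, beta=1.0))
```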
Now along comes A’s vote; call it a, which is either +1 or −1. The score it produces is -a log(Pr(A votes a | history)); that is, -log((u+1)/(u+w+2)) if A votes +1, and +log((w+1)/(u+w+2)) if A votes −1. This is added to the score for whatever it is B wrote, and to B’s overall total score.
With this scheme, an upvote always has positive effect and a downvote always has negative effect, but as you make the same vote over and over again it is less and less effective. For instance, suppose A upvotes everything B posts. Then A’s first upvote counts for -log(1/2); the next for -log(2/3); the next for -log(3/4); etc. The sum telescopes, so the total effect of n upvotes (and nothing else) is to contribute log(n+1) to B’s score.
There are some things about this that feel a little unsatisfactory. I will mention three. First: although “vote counts for plus or minus number of bits of information conveyed” sounds pretty good, on reflection it feels not-quite-right. The situation is a bit like that of estimating the heads-probability of a biased coin, in which case what you do on each new result is almost to adjust by +- the information you just got but not quite, and the aggregated result is somewhat different. Second: the overall result of a sequence of votes, with this scheme, can depend quite a bit on the order in which they occur, and that doesn’t feel like what we want. Third: the overall result’s dependence on individual votes can actually be “backwards”. If you vote +,-,-,+ you get -log(1/2)+log(1/3)+log(1/2)-log(2/5) = log(5/6), which is negative; but if you vote -,-,-,+ you get +log(1/2)+log(2/3)+log(3/4)-log(1/5) = log(5/4), which is positive!
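To make the order-dependence concrete, here is a minimal sketch of the log-score scheme as I have described it, run over a sequence of A’s votes on B. The function name `log_score_total` is just mine, and it assumes the flat prior (offsets of 1) and natural logarithms.

```python
import math


def log_score_total(votes):
    """Total contribution of A's votes to B's score under the log scheme.

    votes is a sequence of +1 / -1 in the order they were cast.
    Each vote scores -a * log(Pr(A votes a | history so far)).
    """
    u = w = 0  # upvotes and downvotes seen so far
    total = 0.0
    for a in votes:
        if a == +1:
            total += -math.log((u + 1) / (u + w + 2))
            u += 1
        elif a == -1:
            total += math.log((w + 1) / (u + w + 2))
            w += 1
        else:
            raise ValueError("votes must be +1 or -1")
    return total


# The two sequences from the text: exp(total) comes out close to 5/6 and 5/4.
print(math.exp(log_score_total([+1, -1, -1, +1])))
print(math.exp(log_score_total([-1, -1, -1, +1])))
```

The second sequence has strictly more downvotes than the first and yet ends up with the higher total, which is the “backwards” behaviour complained about above.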
That seems highly undesirable. Maybe what Phil has in mind avoids these problems without incurring worse ones. The most obvious way to avoid them that I see, though, involves moving a little way away from the “effect of vote is bits” paradigm, as follows.
Implicit in those probability calculations is the model I mentioned above: A’s votes on B are independent Bernoulli with fixed but unknown probability p that each one is up rather than down, and we begin with a flat prior over p. Suppose we stick with that model, and ask what we know about p after some of A’s votes. Then the answer (famously) is that our posterior for p after seeing u upvotes and w downvotes is Beta(u+1,w+1), whose mean is (u+1)/(u+w+2). Counting an upvote as +1 and a downvote as −1, our expectation for A’s next vote is therefore 2(u+1)/(u+w+2) - 1 = (u-w)/(u+w+2). So we could take A’s total contribution to B’s score to be exactly this, and do the obvious thing with scores for individual comments and posts: weight each vote by 1/(#votes+2), where #votes in the denominator is the number of times the voter in question has voted on things by the poster in question.
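A small sketch of this revised scheme, again under the flat-prior assumption; the function names are hypothetical. Because the result depends only on the counts u and w, reordering the votes cannot change it.

```python
def beta_mean_contribution(votes):
    """A's total contribution to B's score: the expected next vote.

    votes is a sequence of +1 / -1; the order does not matter.
    Returns (u - w) / (u + w + 2), which always lies strictly
    between -1 and +1.
    """
    u = sum(1 for a in votes if a == +1)
    w = sum(1 for a in votes if a == -1)
    return (u - w) / (u + w + 2)


def per_vote_weight(num_votes):
    """Weight given to each of A's votes on B, given how many A has cast."""
    return 1 / (num_votes + 2)


# Two up and two down cancel out; three down and one up is negative.
print(beta_mean_contribution([+1, -1, -1, +1]))
print(beta_mean_contribution([-1, -1, -1, +1]))
```

With these two example sequences the extra downvote now lowers the total, as one would hope.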
This suggests a broader family of schemes, where each vote is weighted by f(#votes) where f is some other decreasing function. If you feel, as I think I do, that the overall effect of many votes by A on B shouldn’t actually be bounded by a small multiple of the effect of one vote, you might want f to decrease more slowly. Perhaps take f = 1/square root, or something like that, so that the total effect of n identical votes grows roughly like the square root of n.
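Here is a sketch of that family, comparing the 1/(#votes+2) weighting with a slower-decaying one. I am reading the square-root suggestion as a weight of 1/sqrt(#votes+2), since f has to be decreasing; the “+2” inside the square root and all the names below are my own assumptions.

```python
import math


def weighted_total(u, w, f):
    """A's total contribution to B's score when each vote has weight f(u + w)."""
    return (u - w) * f(u + w)


def reciprocal_weight(n):
    """The Beta-mean weighting: total effect stays below one vote's worth."""
    return 1 / (n + 2)


def inverse_sqrt_weight(n):
    """A slower-decaying alternative (assumed form): total grows like sqrt(n)."""
    return 1 / math.sqrt(n + 2)


# Compare the total effect of n upvotes (and nothing else) under each weighting.
for n in (1, 10, 100, 1000):
    print(n, weighted_total(n, 0, reciprocal_weight),
          weighted_total(n, 0, inverse_sqrt_weight))
```

In both cases more upvotes and fewer downvotes still give a higher total; the only difference is how quickly A’s influence saturates.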
All of these revised schemes have the property that it’s always better to have more upvotes and fewer downvotes; it’s just that A’s influence on B’s score gets less as A’s votes on B get more numerous. And votes from different people just add. So if someone posts what everyone regards as dreck, all the downvotes they get will in fact hurt them.
(Possible downside: the advantage of using sockpuppets becomes much greater, and therefore presumably also the temptation to use them.)