But the “correct” method is complicated, poorly motivated, insufficiently parameterized, and founded on frequentist statistics.
A quick question. It seems from this sentence that you view “insufficiently parameterized” as a flaw in a method. Can you explain what that phrase means and why it is a flaw? Is this a statement of preference for parametric statistics, or something else?
It means that the model used per item doesn’t have enough parameters to encode what we know about the specific domain (where a domain is “Reddit comments”, “Urban Dictionary definitions”, etc.).
The formulas discussed define a mapping from pairs (positive votes, negative votes) to a quality score. In Miller’s model, the same mapping is used everywhere, without regard to the characteristics of the specific domain. In my model, there are parameters a and b (or, equivalently, a/(a+b) and a+b) that we first fit per domain and then apply per item.
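To make that concrete, here is a minimal sketch in Python (the function names and the fitting method are mine, not anything given in the original discussion). The per-item score is the posterior mean under a Beta(a, b) prior, and the per-domain training step is sketched as a rough method-of-moments fit of a and b to the domain’s observed vote ratios; other estimators, such as maximum likelihood, would work too:

```python
def bayes_score(pos, neg, a, b):
    """Posterior mean quality under a Beta(a, b) prior:
    (pos + a) / (pos + neg + a + b)."""
    return (pos + a) / (pos + neg + a + b)

def fit_beta_prior(ratios):
    """Rough method-of-moments fit of (a, b) to a domain's observed
    positive-vote ratios. It ignores the binomial noise in each
    ratio, so treat it as a starting point, not a careful estimator."""
    n = len(ratios)
    mean = sum(ratios) / n
    var = sum((r - mean) ** 2 for r in ratios) / n
    strength = mean * (1 - mean) / var - 1  # this is a + b
    return mean * strength, (1 - mean) * strength  # (a, b)
```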
For example, let’s say you want to decide the order of a (5, 0) item and a (40, 10) item. Miller’s model gives just one answer. My model gives different answers depending on (see the worked example after these two cases):
- The average quality: if overall item quality is high (say, most items have 100% positive votes), the (5, 0) item should rank higher, because it’s likely one of those 100% items, while (40, 10) has proven itself to be of lower quality. If, however, most items have low quality, (40, 10) will rank higher, because it has proven itself to be one of the rare high-quality items, while (5, 0) is more likely a low-quality item that lucked out.
- The variance in quality: say the average quality is 50%. If the variance in quality is low, (5, 0) will rank lower, because it is likely an average item that lucked out, while (40, 10) has proven to be of high quality. If the variance is high (most items being either 100% or 0%), (5, 0) will rank higher, because in all likelihood it is one of the 100% items, while (40, 10) has proven to be only 80%.
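Plugging concrete priors into that score makes both effects visible. The specific (a, b) values below are mine, chosen only to illustrate the four cases above:

```python
def bayes_score(pos, neg, a, b):  # posterior mean, as in the sketch above
    return (pos + a) / (pos + neg + a + b)

cases = {
    "high average (mean 0.95, a+b=10)": (9.5, 0.5),
    "low average (mean 0.20, a+b=10)": (2.0, 8.0),
    "mean 0.5, low variance (a=b=50)": (50.0, 50.0),
    "mean 0.5, high variance (a=b=0.1)": (0.1, 0.1),
}
for name, (a, b) in cases.items():
    print(name, bayes_score(5, 0, a, b), bayes_score(40, 10, a, b))

# high average:   (5, 0) ≈ 0.967 beats (40, 10) ≈ 0.825
# low average:    (40, 10) = 0.700 beats (5, 0) ≈ 0.467
# low variance:   (40, 10) = 0.600 beats (5, 0) ≈ 0.524
# high variance:  (5, 0) ≈ 0.981 beats (40, 10) ≈ 0.799
```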
In short, a cookie-cutter model with no domain-specific parameters doesn’t make the most efficient use of the available data.