imbatman comments on Evaluating Multiple Metrics (where not all are required)

imbatman 20 Feb 2012 20:49 UTC
0 points
I tried to acknowledge that the rankings in this case are completely subjective. Maybe it would help to think about it like this. Let’s say instead we have a data set. We’ll simplify to 4 metrics: Plot, Acting, Humor, and Suspense. We’re given data for 3 movies, each rated on these 4 metrics in that order:

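| Movie | Plot | Acting | Humor | Suspense |
|---|---|---|---|---|
| Groundhog Day | 9 | 9 | 10 | 5 |
| Terminator | 8 | 8 | 6 | 9 |
| Anchorman | 6 | 9 | 10 | 2 |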

Based on this, what are some ways to evaluate this data? We’re not satisfied that just summing the rankings for each metric comes up with an accurate ranking for the film overall. So how else can we do it?
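
For reference, here is the plain-sum baseline worked out on these three movies, as a minimal Python sketch. The variable names are only illustrative; the numbers are the ones listed above.

```python
# Ratings from the comment above, in the order Plot, Acting, Humor, Suspense.
ratings = {
    "Groundhog Day": (9, 9, 10, 5),
    "Terminator": (8, 8, 6, 9),
    "Anchorman": (6, 9, 10, 2),
}

# Baseline: the overall score is the plain sum of the four metrics.
totals = {title: sum(scores) for title, scores in ratings.items()}

# Rank from highest total to lowest.
ranking = sorted(totals, key=totals.get, reverse=True)

for title in ranking:
    print(f"{title}: {totals[title]}")
```

This gives 33, 31, and 27, which is the ordering the question is about improving on.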
- faul_sname 20 Feb 2012 21:12 UTC
  3 points
  Parent
  Empirically determine what formula most closely matches overall impressions in the real world, avoiding over-fitting by penalizing the formulas for complexity. The “sum the scores” would simply be P+A+H+S. A weighted sum would be k1P+k2A+k3H+k4S. Perhaps humor and suspense are found to correlate positively with rating when considered individually, but interfere negatively with each other. So we might go with k1P+k2A+k3H+k4S-k5(H*S). Each additional bit of complexity in the formula must double the predictive power of your formula (halve your error).
  
  We would start with the data and possible formulas (probably weighted by complexity). We would then plug the data into each formula, seeing how well each one predicts the observed ratings. The formula which most efficiently predicts movie ratings based on these dimensions is the one we would use.
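  
  A rough Python sketch of that comparison, under loud assumptions: the overall ratings in `observations` are invented for illustration, the weights are hand-picked placeholders rather than fitted values, and the "bits" assigned to each formula are a made-up stand-in for a real complexity measure. It only shows the shape of the procedure: score each formula by its error, then double that error for every extra bit of complexity.

  ```python
  # Hypothetical (plot, acting, humor, suspense, overall) rows. The first four
  # numbers reuse the thread's examples; the overall ratings are invented.
  observations = [
      (9, 9, 10, 5, 9.0),
      (8, 8, 6, 9, 8.5),
      (6, 9, 10, 2, 7.0),
  ]

  # Candidate formulas paired with an assumed complexity in bits.
  # The weights are hand-picked placeholders, not fitted values.
  candidates = {
      "plain average": (lambda p, a, h, s: (p + a + h + s) / 4, 0),
      "weighted sum": (lambda p, a, h, s: 0.3 * p + 0.3 * a + 0.2 * h + 0.2 * s, 1),
      "weighted sum with H*S term": (
          lambda p, a, h, s: 0.3 * p + 0.3 * a + 0.3 * h + 0.3 * s - 0.02 * h * s,
          2,
      ),
  }


  def mean_abs_error(formula, rows):
      """Average absolute gap between predicted and observed overall rating."""
      return sum(abs(formula(p, a, h, s) - r) for p, a, h, s, r in rows) / len(rows)


  def penalized_error(formula, bits, rows):
      """Each extra bit doubles the penalty, so it must halve the error to pay off."""
      return mean_abs_error(formula, rows) * 2 ** bits


  scores = {
      name: penalized_error(formula, bits, observations)
      for name, (formula, bits) in candidates.items()
  }
  best = min(scores, key=scores.get)
  print(best, scores[best])
  ```

  In practice you would fit the weights to real ratings instead of guessing them; the penalty step stays the same.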
  - imbatman 20 Feb 2012 23:01 UTC
    0 points
    Parent
    Yes! That helps. My question, then, is what to plug into that formula if a metric SOMETIMES matters.
    
    E.g., if 9 9 9 9 isn’t necessarily better than 9 9 9 0.
    
    There are probably some additional questions to think of, but I’m not sure what they are. And I’m not entirely sure this is possible...that’s why I brought it up.
    - faul_sname 21 Feb 2012 0:41 UTC
      0 points
      Parent
      It is entirely possible, and feel free to ask more questions.
      
      I find that it’s helpful to visualize the shape of the space I am operating in, which in this case is a 5-dimensional space (the dimensions are Plot, Acting, Humor, Suspense, and Overall Rating). However, many people find it difficult to visualize more than 3 dimensions, so I will describe only the interaction of Humor and Suspense on Overall Rating.
      
      In this case, let Humor (H) be the east/west direction, Suspense (S) be the north/south direction, and Overall Rating (R) be the altitude. We can now visualize a landscape that corresponds to these variables. Here are some possible landscapes and what we can infer from them:
      
      * Flat, with no slope or features (The audience doesn’t care about either H or S)
      
      * Sloped up as we go northeast (The audience likes humor and suspense together)
      
      * Saddle shaped with the high points to the northwest and southeast (The audience likes H or S independently, but not together)
      
      * Mountainous (The audience has complex tastes)
      
      You would then want to find the equation that best fits the terrain you have. Usually, the best fit is linear (which you would see as a sloped terrain). However, you can find better equations when it isn’t. You do have to be careful not to over-fit: a good rule of thumb is that if it takes more information to approximate your data than is contained in the data itself, you’re doing something wrong.
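      
      To make the saddle case concrete, here is a toy Python sketch. The function `saddle_rating`, the coefficient `k=0.2`, and the 0 to 10 scale are all assumptions chosen for illustration; the formula is just H + S minus an H*S interaction term, like the one suggested earlier in the thread.

      ```python
      def saddle_rating(humor, suspense, k=0.2):
          """Toy surface: each metric helps alone, but the product term pulls the rating down."""
          return humor + suspense - k * humor * suspense


      # Print the landscape with north (high suspense) at the top
      # and east (high humor) at the right.
      levels = [0, 5, 10]
      for suspense in reversed(levels):
          row = [saddle_rating(humor, suspense) for humor in levels]
          print(f"S={suspense:>2}: " + "  ".join(f"{r:5.1f}" for r in row))
      ```

      With these made-up numbers the northwest and southeast corners sit at 10 while the northeast and southwest corners sit at 0, which is the saddle described above.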
      - imbatman 21 Feb 2012 16:56 UTC
        0 points
        Parent
        I tried visualizing but I don’t know how that helps me construct a formula. I would imagine, in your example, the landscape would be mountainous. One movie may have both great suspense and great humor and be a great movie...another may have both great suspense and great humor and be just an okay movie. But then perhaps there is a movie with very low amounts of humor or suspense that is still a good movie for other reasons. So in that case neither of these metrics would be good predictors for that movie.
        
        That’s kind of the core of the issue, as your exercise illustrates. Since in any given case, any metric can be a complete non-predictor of the outcome, I don’t know any way to construct the formula. It seems like you’d have to find some way to both include and exclude metrics based on (something).
        
        So maybe the answer is the N/A thing I considered. Valuing movie metrics is not about quantifying how much of each metric is packed into a film. It is about gauging how well these metrics are used. So maybe you could give Schindler’s List “N/A” in the humor metric and some other largely humorless movie a 2/10 based on the fact that you felt the other movie needed humor and didn’t have much. In that way, it seems all metrics not stated as N/A would have value and you would just need to figure out how to weight them. For instance:
        
        A 9 9 9 9 wouldn’t necessarily score a better total than a 9 9 9 N/A...but it might, if the last category was weighted higher than one/some of the others.
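        
        One possible reading of the N/A idea, as a hedged Python sketch: treat N/A as "leave this metric out" and take a weighted average over whatever remains. The function `overall`, the use of `None` for N/A, the weights, and the choice to normalize by the remaining weight are all assumptions, not something settled in this thread.

        ```python
        def overall(scores, weights):
            """Weighted average over the metrics that apply; None marks N/A."""
            pairs = [(s, w) for s, w in zip(scores, weights) if s is not None]
            total_weight = sum(w for _, w in pairs)
            if total_weight == 0:
                raise ValueError("no applicable metrics")
            return sum(s * w for s, w in pairs) / total_weight


        # Hypothetical weights for Plot, Acting, Humor, Suspense.
        weights = (1.0, 1.0, 1.0, 2.0)

        print(overall((9, 9, 9, 9), weights))     # every metric applies
        print(overall((9, 9, 9, None), weights))  # last metric is N/A
        print(overall((9, 9, 9, 2), weights))     # last metric applies and is weak
        ```

        Under this normalization, 9 9 9 9 and 9 9 9 N/A tie, while 9 9 9 2 is pulled down by the heavily weighted last category. Using an unnormalized weighted total instead would let 9 9 9 9 beat 9 9 9 N/A.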