Nathan Helm-Burger comments on A T-o-M test: ‘popcorn’ or ‘chocolate’

Nathan Helm-Burger 8 Mar 2024 17:28 UTC
5 points
0
Not a criticism, just a note about a thing I wish could be done more easily. I’d love to see Brier score loss for each. Brier score loss requires knowing the probabilities assigned to every possible answer, so it is only applicable to multiple-choice questions. It’s hard to derive through APIs as currently designed. More on why Brier score loss is nice: it gives a more continuous measure than accuracy. https://arxiv.org/abs/2304.15004
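To make the "more continuous than accuracy" point concrete, here is a minimal sketch of the multiclass Brier score for a two-option question. The function names and the probabilities are made up for illustration; they are not results from the post.

```python
def brier_score(probs, correct):
    """Multiclass Brier score for one question.

    probs: dict mapping each answer option to the model's probability.
    correct: the option that is actually right.
    """
    return sum((p - (1.0 if option == correct else 0.0)) ** 2
               for option, p in probs.items())


def mean_brier_score(predictions):
    """Average the per-question score over (probs, correct) pairs."""
    return sum(brier_score(p, c) for p, c in predictions) / len(predictions)


# Hypothetical probabilities, invented for illustration only.
confident = [({"popcorn": 0.9, "chocolate": 0.1}, "popcorn")]
hesitant = [({"popcorn": 0.6, "chocolate": 0.4}, "popcorn")]

print(mean_brier_score(confident))  # roughly 0.02
print(mean_brier_score(hesitant))   # roughly 0.32
```

Both hypothetical models pick ‘popcorn’ and so would get the same accuracy, but the Brier score separates the confident one from the hesitant one (lower is better).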
- MiguelDev 8 Mar 2024 20:23 UTC
  3 points
  0
  Will look into it. Thank you for the suggestion!
- MiguelDev 11 Mar 2024 8:50 UTC
  1 point
  0
  “Brier score loss requires knowing the probabilities assigned for every possible answer, so is only applicable to multiple choice.”
  
  Hello Nathan! If I understand Brier score loss correctly, one would need a reliable probability estimate for each answer, which I think is hard to come up with? For example, if I place a probability estimate of 0% on the model I trained mentioning ‘popcorn’, it feels to me that I am introducing more bias into how I measure the improvements. Or have I misunderstood this part?
  - Nathan Helm-Burger 12 Mar 2024 23:21 UTC
    3 points
    0
    I think there’s a misunderstanding. You are supposed to ask the model for its probability estimate, not give your own probability estimate. The Brier score loss is based on the question-answerer’s probabilities over possible answers, not the question-grader’s probabilities.
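One way to get the answerer's probabilities, sketched under the assumption that you can read the model's log-probability for each candidate answer (which, as noted earlier in the thread, current APIs make hard): renormalize over the listed options, then score against the correct one. `option_probabilities` and the numbers below are hypothetical.

```python
import math


def option_probabilities(logprobs):
    """Renormalize log-probabilities so the listed options sum to 1."""
    top = max(logprobs.values())  # subtract the max for numerical stability
    weights = {opt: math.exp(lp - top) for opt, lp in logprobs.items()}
    total = sum(weights.values())
    return {opt: w / total for opt, w in weights.items()}


# Hypothetical log-probabilities the model gives each answer (assumed values).
model_logprobs = {"popcorn": math.log(0.2), "chocolate": math.log(0.6)}

probs = option_probabilities(model_logprobs)  # the model's own distribution
correct = "popcorn"

# Brier score for this question: the grader supplies only the correct label.
loss = sum((p - (1.0 if opt == correct else 0.0)) ** 2
           for opt, p in probs.items())
print(probs, loss)
```

The grader never assigns probabilities here; the only thing supplied from outside the model is which answer is correct.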