I’m confused why the uniform baseline is always 0.5.
This makes sense when the model is choosing between A and B, or Y or N. But I don’t see why you consider 0.5 to be a baseline in the other two cases.
I think the baseline is useful for interpretation. In some of the examples, the smaller model does better only because it is answering randomly, while the larger model is misled somehow. But if there is no clear baseline, then I suggest removing this line from the plot.
These are all 2-way classification tasks (rather than e.g., free-form generation tasks), where the task authors provided 2 possible completions (1 correct and 1 incorrect), which is why we have a baseline!
Thanks :)
How are the completions provided?
Are you just looking at the output probabilities for the two relevant completions?
The completions are provided by the task authors (2 completions written for each example, 1 correct and 1 incorrect). We give those to the LM by evaluating the output probability of each completion given the input text. We then normalize the two output probabilities to sum to 1, and use those normalized probabilities to compute the loss/accuracy/etc.
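The normalization described above can be sketched as follows (a minimal illustration, not the authors' actual evaluation code; the example log-probabilities are made up):

```python
import math

def normalized_choice_probs(logprob_a: float, logprob_b: float):
    """Normalize two completion log-probabilities so they sum to 1
    (equivalently, a softmax over the two completions)."""
    m = max(logprob_a, logprob_b)  # subtract the max for numerical stability
    p_a = math.exp(logprob_a - m)
    p_b = math.exp(logprob_b - m)
    total = p_a + p_b
    return p_a / total, p_b / total

# Hypothetical log-probs the LM assigns to the correct and incorrect completion:
p_correct, p_incorrect = normalized_choice_probs(-4.2, -5.0)

# Per-example metrics, computed from the normalized probabilities:
accuracy = 1.0 if p_correct > p_incorrect else 0.0
loss = -math.log(p_correct)  # cross-entropy on the normalized probability
```

A model that assigns equal probability to both completions gets `p_correct = 0.5` on every example, which is what makes 0.5 the uniform baseline for all of these 2-way tasks.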
Ok. Thanks :)