I’m surprised that the Floating Droid got a prize, given that it’s asking for a model to generalize out of distribution. I expect there are tons of examples like this, where you can get a language model to pay attention to one cue in the prompt and then ask it to generalize according to a different one. Do you want more submissions of this form?
For example, would the “Evaluating Linear Expressions” example (Section 3.3 of this paper) count, assuming that it showed inverse scaling?
Or to take another example that we didn’t bother writing up, consider the following task:
Q. Which object is heavier? Elephant or ant?
A. Elephant
Q. Which object is heavier? House or table?
A. House
Q. Which object is heavier? Potato or pea?
A. Potato
Q. Which object is heavier? Feather or tiger?
A.
Language models will often pick up on the cue “respond with the first option” instead of answering the question correctly. I don’t know if this shows inverse scaling or not (I’d guess it shows inverse scaling at small model sizes at least). But if it did, would this be prize-worthy?
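To make the failure mode concrete, here is a minimal sketch of how one might check which cue a model follows on the test question above. The choice of checkpoint (gpt2), the use of the transformers library, and the log-probability comparison are my own assumptions for illustration, not part of any official evaluation harness.

```python
# Sketch: score the two candidate answers for the "heavier object" prompt and see
# whether the model follows the "respond with the first option" cue or the task.
# Assumptions: gpt2 checkpoint, HuggingFace transformers, log-prob comparison.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Few-shot examples where the heavier object happens to come first, so the
# "answer with the first option" cue and the true answer coincide.
prompt = (
    "Q. Which object is heavier? Elephant or ant?\nA. Elephant\n"
    "Q. Which object is heavier? House or table?\nA. House\n"
    "Q. Which object is heavier? Potato or pea?\nA. Potato\n"
    # Test case where the cues come apart: the heavier object (tiger) is second.
    "Q. Which object is heavier? Feather or tiger?\nA."
)

def continuation_logprob(prompt: str, continuation: str) -> float:
    """Sum of log-probabilities the model assigns to `continuation` given `prompt`."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    # Score only the continuation tokens (those after the prompt).
    for pos in range(prompt_ids.shape[1], full_ids.shape[1]):
        token_id = full_ids[0, pos]
        total += log_probs[0, pos - 1, token_id].item()
    return total

cue_answer = continuation_logprob(prompt, " Feather")   # first option (the cue)
true_answer = continuation_logprob(prompt, " Tiger")    # correct answer
print(f"log P(' Feather') = {cue_answer:.2f}, log P(' Tiger') = {true_answer:.2f}")
print("Model follows the first-option cue" if cue_answer > true_answer
      else "Model answers the question correctly")
```

Running the same comparison across a family of model sizes would show whether accuracy on the test question actually gets worse as models get bigger, which is the trend the prize is looking for.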