Possibly relevant to your results: I was one of the people who judged the Alignment Award competition, and if I remember correctly the Shutdown winner (roughly this post) was head-and-shoulders better than any other submission in any category. So it’s not too surprising that GPT had a harder time predicting the Goal Misgeneralization winner; there wasn’t as clear a winner in that category.
Oh, that does help to know, thank you!