Doesn’t that mean that you are getting some predictiveness by looking at momentum? If progress on a task were totally unpredictable, with no signal and all noise, then your way of carving up the data would produce negative correlations, because two consecutive jumps share a score: a lucky score in the middle inflates the first jump and deflates the second. Instead you’re mostly finding correlations near zero, or slightly positive, which means that there is just about enough signal to counteract that noise.
The signal-to-noise ratio is going to depend on a lot of contingent factors. There will be more noise if there are fewer questions on a task. There will be less signal from one model version to the next if there is a smaller increase in model size, or if the task is one where improvement happens very gradually as models scale up (though in those cases you could find a clearer signal by looking across several model versions, rather than just two consecutive jumps).
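To make the argument concrete, here is a rough simulation sketch. It is not your actual analysis, and everything in it is an assumption for illustration: the function names, the base accuracy of 0.3, three model versions per task, and a per-task improvement rate drawn uniformly from a range (`gain_spread`). Each score is the fraction of `n_questions` answered correctly, so fewer questions means more sampling noise. With equal noise on each score and no real differences between tasks, the correlation between consecutive jumps should sit near -0.5; adding task-to-task differences in improvement rate pulls it up toward zero and beyond.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def momentum_correlation(n_tasks, n_questions, gain_spread, seed=0):
    """Correlate the first jump with the second jump across simulated tasks.

    gain_spread is a hypothetical knob: each task's true per-version gain
    is drawn uniformly from [0, gain_spread]. Zero means no signal at all.
    """
    rng = random.Random(seed)
    first_jumps, second_jumps = [], []
    for _ in range(n_tasks):
        gain = rng.uniform(0.0, gain_spread)  # the "signal" for this task
        scores = []
        for version in range(3):
            # True accuracy for this model version (assumed base of 0.3).
            p = min(max(0.3 + gain * version, 0.0), 1.0)
            # Observed score: fraction correct on a finite question set.
            correct = sum(rng.random() < p for _ in range(n_questions))
            scores.append(correct / n_questions)
        first_jumps.append(scores[1] - scores[0])
        second_jumps.append(scores[2] - scores[1])
    return pearson(first_jumps, second_jumps)


if __name__ == "__main__":
    # All noise, no signal: every task has the same flat true accuracy.
    print("noise only: ", momentum_correlation(2000, 50, gain_spread=0.0))
    # Tasks differ in how fast they improve, and scores are less noisy.
    print("with signal:", momentum_correlation(2000, 200, gain_spread=0.3))
```

On this toy model, a correlation near zero does not mean momentum is uninformative. It means the task-to-task signal is roughly cancelling the negative bias that the shared middle score introduces.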