Thanks! Is the important thing there the log-error, though, or just that if the absolute performance difference between models is small enough, the per-task performance differences between them are noise (as between parallel runs of the same model), so you do wind up regressing to the mean?
I can’t get the image to display, but here’s an example of how you get a negative correlation if your runs are random draws from the same Gaussian:
https://i.imgur.com/xhtIX8F.png
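In case the image doesn’t load for you either, here’s a minimal simulation of the same point (the means, spread, and seed are toy values I picked for illustration): two runs drawn independently from the same Gaussian show a strong negative correlation between the first run’s score and the apparent run-to-run improvement, purely from regression to the mean.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility
n_tasks = 200

# Two "runs" that are i.i.d. draws from the same Gaussian:
# no real performance difference between them, only noise.
run_a = rng.normal(loc=0.5, scale=0.1, size=n_tasks)
run_b = rng.normal(loc=0.5, scale=0.1, size=n_tasks)

# Correlate run A's score with the apparent "improvement" in run B.
# Since run_b is independent of run_a, cov(run_a, run_b - run_a) = -var(run_a),
# giving corr = -1/sqrt(2) ~ -0.71 in expectation: strongly negative
# despite there being zero real signal.
r = np.corrcoef(run_a, run_b - run_a)[0, 1]
print(f"corr(run A, run B - run A) = {r:.2f}")
```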
I’m not sure what you mean; I’m not looking at log-odds. The correlation could be an artefact of noise being amplified in log-space (I’m not sure), but it’s not obvious to me that log-error isn’t the correct way to analyse the data.
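To make that worry concrete, here’s a rough sketch (toy accuracies and sample size of my own choosing, not the actual data) of how the same binomial sampling noise on accuracy turns into much larger swings in log-error as accuracy rises, since log(1 − acc) changes fastest when the error is small:

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
n_items, n_sims = 500, 10_000

def log_error_spread(true_acc: float) -> float:
    """Std. dev. of log(error) under binomial sampling noise at a fixed accuracy."""
    correct = rng.binomial(n_items, true_acc, size=n_sims)
    err = 1 - correct / n_items
    err = np.clip(err, 1 / n_items, None)  # floor to avoid log(0) on perfect runs
    return np.log(err).std()

for acc in (0.5, 0.9, 0.99):
    print(f"true accuracy {acc:.2f}: sd of log-error = {log_error_spread(acc):.3f}")
```

By the delta method, sd(log error) ≈ √(acc / (n·(1 − acc))), so the spread at 99% accuracy is roughly ten times the spread at 50%. Whether that amplification accounts for the correlation here, I don’t know.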