I want to give that conclusion a Bad Use of Statistical Significance Testing. Looking at the experts, we see a quite obviously significant difference. There is improvement here across the board; this is quite obviously not a coincidence. Also, ‘my sample size was not big enough’ does not get you out of the fact that the improvement is there. If your study lacked sufficient power, and you got a result in the range of ‘this would matter if we had a higher-powered study,’ then the play is to redo the study with increased power, I would think?
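To make the ‘redo it with increased power’ move concrete, here is a minimal sketch (not the study’s actual numbers; the effect size and power target are assumed for illustration) of asking how large a follow-up would need to be to reliably detect the effect the underpowered study hinted at:

```python
# Hypothetical power calculation: if the first study suggested an effect
# around d = 0.4, how many subjects per group would a follow-up need to
# detect it with 80% power? All numbers here are assumptions.
from statsmodels.stats.power import TTestIndPower

observed_effect = 0.4   # Cohen's d suggested by the underpowered study (assumed)

n_per_group = TTestIndPower().solve_power(
    effect_size=observed_effect,
    alpha=0.05,
    power=0.8,            # conventional 80% power target
    alternative="two-sided",
)
print(f"Subjects needed per group for 80% power: {n_per_group:.0f}")
```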
My immediate take on seeing the thing as you report it:
Please say whether the bars are 68%, 90%, or 95% intervals.
Totally agree on sidestepping significance testing and instead asking “does the posterior distribution over possible effect sizes include an effect size I consider relevant?”
“There is improvement here across the board; this is quite obviously not a coincidence.”: I would need more details to feel that confident. The first thing I’d look at is the correlation structure of those scores; could it be that they are just repeating mostly the same information over and over?
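A minimal sketch of that ‘posterior over effect sizes’ framing, using made-up per-expert improvement scores and an assumed threshold for what counts as a relevant effect (none of this is the study’s actual data):

```python
# Instead of asking "p < 0.05?", ask how much posterior mass lies above
# an effect size we would actually care about. Data and threshold are
# invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
diffs = rng.normal(loc=0.3, scale=1.0, size=25)   # per-expert improvement scores (made up)
relevant_effect = 0.2                             # smallest effect worth caring about (assumed)

# Posterior for the mean improvement under a flat prior and normal likelihood:
# approximately Normal(sample mean, sample sd / sqrt(n)).
post_mean = diffs.mean()
post_sd = diffs.std(ddof=1) / np.sqrt(len(diffs))
posterior = rng.normal(post_mean, post_sd, size=100_000)

lo, hi = np.percentile(posterior, [2.5, 97.5])
print(f"95% credible interval for mean improvement: ({lo:.2f}, {hi:.2f})")
print(f"P(improvement > {relevant_effect}): {(posterior > relevant_effect).mean():.2f}")
```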
Paper argues that transformers are a good fit for language but terrible for time series forecasting, as the attention mechanism inevitably discards temporal ordering information. If true, there would be major gains to be had from a hybrid system, I would think, rather than this being a reason to think we will soon hit limits. It does raise the question of how much understanding a system can have if it cannot preserve a time series.
That paper got a reply one year later: “Yes, Transformers are Effective for Time Series Forecasting (+ Autoformer)” (haven’t read either one).