Sure, fair point! But generally, people gossiping online about missed benchmark questions (and likely spoiling the answers in the process) means a question is now ~ruined for all future training runs. How much of these modest benchmark improvements over time can be attributed to this?
The fact that frontier AIs can basically see and regurgitate everything ever written on the entire internet is hard to fathom!
I could be really petty here and spoil these answers for all future training runs (and make all future models look modestly better), but I just joined this site so I’ll resist lmao …
I think this is a great point here:
It’s probable that AIs will force us to totally reformat workflows to stay competitive. Even as the tech progresses, it’s likely there will remain things that humans are good at and AIs lag behind on. If intelligence can be represented as some n-dimensional object, AIs are already super-human along some subset of those n dimensions, but beating humans along all n seems unlikely in the near-to-mid term.
In that case, we need to segment work and build a good pipeline: task humans with the work they excel at, and automate the rest with AI. Young zoomers and kids will likely be intuitively good at this, since they’re growing up with the tech.
This is also comforting in a p(doom) scenario: as long as there are a few pesky things that only humans can do, there’s a good reason to keep us around to do them!