(indeed the politics of our era is moving towards greater acceptance of inequality)
How certain are you of this, and how much do you think it comes down to something more like “to what extent can disempowered groups unionise against the elite”?
To be clear, by default I think AI will make unionising against the more powerful harder, but it might depend on the governance structure. Maybe if we are really careful, we can get something closer to “Direct Democracy”, where individual preferences actually matter more!
If you think they didn’t train on FrontierMath answers, why do you think having the opportunity to validate on it is such a significant advantage for OpenAI?
Couldn’t they just make a validation set from their own training data anyway?
In short, I don’t think the capabilities externalities of a “good validation dataset” are that big, especially not counterfactually: sure, maybe it would have taken OpenAI a bit more time to contract some mathematicians, but realistically, how much more time?
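To make the “just split your own data” point concrete, here’s a minimal sketch of the kind of held-out split I have in mind. Everything here is illustrative (the `problems` list, the 10% holdout fraction); it’s not a claim about how OpenAI actually builds validation sets:

```python
import random

def make_validation_split(problems, holdout_frac=0.1, seed=0):
    """Hold out a fraction of an in-house problem pool as a validation set.

    `problems` is a hypothetical list of (question, answer) pairs, e.g.
    items contracted from mathematicians. Purely illustrative.
    """
    rng = random.Random(seed)
    shuffled = list(problems)
    rng.shuffle(shuffled)
    cut = max(1, int(len(shuffled) * holdout_frac))
    validation, training = shuffled[:cut], shuffled[cut:]
    return training, validation
```

The point being: the mechanics are trivial; the only real cost is sourcing good problems in the first place.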
Whereas if your ToC as Epoch is “make good forecasts on AI progress”, it makes sense that you’d want labs to report results on the dataset you’ve put together.
Sure, maybe you could commit to not releasing the dataset and only testing models in-house, but maybe you think you don’t have the capacity in-house to elicit maximum capability from models. (Solving the ARC challenge cost OpenAI O($400k), which is peanuts for them but something like 2-3 researcher salaries at Epoch, right?)
If I were Epoch, I would be worried about “cheating” on the results (dataset leakage).
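The kind of check I’d want here is something like a verbatim n-gram overlap scan between the held-out problems and the lab’s training corpus. This is a generic contamination heuristic, not anything Epoch or OpenAI has said they do, and in practice only someone with access to the training data could run it:

```python
def ngram_set(text, n=8):
    """All length-n word n-grams in `text`, lowercased (leakage heuristic)."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def looks_leaked(eval_problem, training_documents, n=8):
    """Flag an eval problem if any long n-gram from it appears verbatim in a
    training document. Crude; real contamination audits use fuzzier matching,
    but the basic idea is the same.
    """
    problem_ngrams = ngram_set(eval_problem, n)
    return any(problem_ngrams & ngram_set(doc, n) for doc in training_documents)
```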
Re: unclear dataset split: yeah, that was pretty annoying, but that’s also on OpenAI comms.
I tend to agree that orgs claiming to be safety orgs shouldn’t sign NDAs preventing them from disclosing their lab partners, or even details of the partnerships, but this might be a tough call to make in reality.
I definitely don’t see a problem with taking lab funding as a safety org. (As long as you don’t claim otherwise.)