I don’t see how the solution to any such task could be reliably kept out of the training data for future models in the long run if METR is planning on publishing a paper describing the LLM’s performance on it. Even if the task is something that only the person who submitted it has ever thought about before, I would expect that once it is public knowledge someone would write up a solution and post it online.
We will probably not make full details public for all the tasks. We may share them privately with researchers.