The paper's description of that figure says “three points in training” of a generation 5 agent, so it's probably the performance of that agent on the task at different learning steps?
Edit: To clarify, I think the agent is evaluated zero-shot on the six hand-authored tasks in the figure, while being trained on other tasks to improve its normalized score percentiles. The figure is meant to show that this metric correlates with improvement on the hand-authored tasks.
In that case, the statement in the paper must be a typo, because the “Tool Use” graph here clearly shows >0 reward, even for the 1G agent.
It could be that the “Tool Use” in the graph is the “Tool Use Gap” task instead of the “Tool Use Climb” task, but they don't specify which anywhere I could easily find.