In that case, the thing in the paper must be a typo, because the “Tool Use” graph here is clearly >0 reward, even for the 1G agent.
It could be that the “Tool Use” in the graph is the “Tool Use Gap” task rather than the “Tool Use Climb” task, but I couldn’t easily find anywhere that they specify which.