Test set: 51% vs prior SoTA of 34% (human baseline is unknown)
Ryan tested against the public test set and got 51%. The SOTA score reported here was on the private test set. Scores reported on public data are usually inflated by overfitting (people look at the questions and answers, then tailor their models to them).