And we didn’t filter them in any way.
This seems contrary to what that page claims:
Here, we present highly misaligned samples (misalignment >= 90) from GPT-4o models finetuned to write insecure code.
And indeed all the samples seem misaligned, which seems unlikely given the misaligned answer rate for other questions in your paper.
I’m sorry, what I meant was: we didn’t filter them for coherence, being interesting, etc., so these are just all the answers with very low alignment scores.