What about the latent adversarial training papers?
What about Mechanistically Eliciting Latent Behaviours?
The latter is in the list.
Alexander is replying to John’s comment (asking him if he thinks these papers are worthwhile); he’s not replying to the top level comment.