Even if an AI system were boxed and unable to interact with the outside world, it would still have the opportunity to influence the world via the side channel of interpretability tools visualizing its weights, if those tools are applied.
(I think it could use gradient hacking to achieve this?)
Current theme: default
Less Wrong (text)
Less Wrong (link)
Arrow keys: Next/previous image
Escape or click: Hide zoomed image
Space bar: Reset image size & position
Scroll to zoom in/out
(When zoomed in, drag to pan; double-click to close)
Keys shown in yellow (e.g., ]) are accesskeys, and require a browser-specific modifier key (or keys).
]
Keys shown in grey (e.g., ?) do not require any modifier keys.
?
Esc
h
f
a
m
v
c
r
q
t
u
o
,
.
/
s
n
e
;
Enter
[
\
k
i
l
=
-
0
′
1
2
3
4
5
6
7
8
9
→
↓
←
↑
Space
x
z
`
g
Even if an AI system were boxed and unable to interact with the outside world, it would still have the opportunity to influence the world via the side channel of interpretability tools visualizing its weights, if those tools are applied.
(I think it could use gradient hacking to achieve this?)