Even if an AI system were boxed and unable to interact with the outside world directly, it would still have an opportunity to influence the world through a side channel: interpretability tools visualizing its weights, if such tools are applied to it.
(Perhaps it could use gradient hacking to shape its weights into whatever it wants those tools to display?)