“Specifically, rather than using machine learning to build agents which directly take actions in the world, we could use ML as a microscope—a way of learning about the world without directly taking actions in it.”
Is there an implicit assumption here that RL agents are generally more dangerous than models that are trained with (un)supervised learning?
Couldn’t you use it as a microscope regardless of whether it was trained using RL or (un)supervised learning?
It seems to me that whether it’s a microscope depends on what you do with it after it’s trained. In other words, an RL agent only needs to be an agent during training. Once it’s trained, you could still inspect the models it has learned without hooking it up to any effectors.
However, Chris replied yes to this question, so maybe I’m missing something.
I’m not sure I understand the question, but in case it’s useful/relevant here:
A computer that trains an ML model/system—via something that looks like contemporary ML methods at an arbitrarily large scale—might be dangerous even if it’s not connected to anything. Humans might get manipulated (e.g. if researchers ever look at the learned parameters), mind crime might occur, acausal trading might occur, or the hardware of the computer might be used to implement effectors in some fantastic way. And those might be just a tiny fraction of a large class of relevant risks, the majority of which we can’t currently understand.
Such ‘offline computers’ might be more dangerous than an RL agent that by design controls some actuators, because problems with the latter might be visible to us at a much lower scale of training (and therefore with much less capable/intelligent systems).