A way to achieve whitelisting might be:
Establish a censorship scheme that can reliably censor all knowledge that the agent has. This might be somewhat tricky. Possible approaches:
If the agent has a fixed input channel, censor the agent from reliably predicting anything about the state of the input channel, whether past, present, or future.
If the agent has a fixed output channel, censor the agent's output from being distinguishable from a set of randomly generated bits. Here the censor network is a discriminator that tries to tell the two apart and propagates the censor gradient to the agent.
However, censor the censor network itself from producing output that contains knowledge about anything relevant to the whitelisted domain.
The agent should then not be censored from knowing about anything related to the whitelisted domain.
This will run into issues with the scope implied by the whitelisted-domain dataset: a given dataset might imply too small or too large a relevant domain, and this might be tricky to know in advance.
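As a rough illustration of the output-channel variant, here is a toy sketch, not a real implementation. Everything in it is an assumption made for the example: the agent is just a vector of per-bit logits (`theta`), the censor is a linear logistic discriminator, the "whitelisted domain" is a hypothetical set of output bit positions, and "censoring the censor" is crudely approximated by forcing the censor's weights on those positions to zero so that its output cannot depend on them. The function names (`censor_score`, `censor_update`, `agent_update`) are made up here.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def censor_score(w, b, x):
    # Censor's estimate of the probability that x is agent output
    # rather than randomly generated bits.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def censor_update(w, b, agent_out, random_bits, lr, whitelist=()):
    # One logistic-regression gradient step for the censor (discriminator):
    # label 1 for the agent's output, label 0 for random bits.
    err_agent = censor_score(w, b, agent_out) - 1.0  # dLoss/dz for label 1
    err_rand = censor_score(w, b, random_bits)       # dLoss/dz for label 0
    new_w = [wi - lr * (err_agent * a + err_rand * r)
             for wi, a, r in zip(w, agent_out, random_bits)]
    new_b = b - lr * (err_agent + err_rand)
    # ASSUMPTION: crude stand-in for "censoring the censor". The censor is
    # kept blind to whitelisted positions by zeroing its weights there.
    for i in whitelist:
        new_w[i] = 0.0
    return new_w, new_b


def agent_update(theta, w, b, lr):
    # Propagate the censor gradient to the agent: descend log D(p), so the
    # (relaxed) output p looks more like random bits to the censor.
    p = [sigmoid(t) for t in theta]
    d = censor_score(w, b, p)
    # d/dtheta_i log D(p) = (1 - D) * w_i * p_i * (1 - p_i)
    return [t - lr * (1.0 - d) * wi * pi * (1.0 - pi)
            for t, wi, pi in zip(theta, w, p)]


random.seed(0)
theta = [2.0, -2.0, 2.0, -2.0]  # agent starts out strongly non-random
whitelist = (3,)                # hypothetical: bit 3 is in the whitelisted domain
w, b = [0.0] * len(theta), 0.0
for _ in range(500):
    agent_out = [sigmoid(t) for t in theta]
    random_bits = [float(random.getrandbits(1)) for _ in theta]
    w, b = censor_update(w, b, agent_out, random_bits, lr=0.1, whitelist=whitelist)
    theta = agent_update(theta, w, b, lr=0.1)

print("final bit probabilities:", [round(sigmoid(t), 3) for t in theta])
```

In this toy the whitelisted bit's logit never moves, because the censor passes no gradient for it, while the other bits are pushed around by the censor. Whether the non-whitelisted bits actually settle near 0.5 is not guaranteed here, since adversarial updates like these can oscillate. The scope problem mentioned above shows up directly as the choice of `whitelist`.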