I definitely was not arguing that. I was arguing that safe exploration is currently defined in ML as preventing the agent from making an accidental mistake, and that we really shouldn't have terminology collisions with ML. (I may have left that second part implicit.)
Ah, I see—thanks for the correction. I changed “best” to “current.”
I assume that the difference you see is that you could try to make across-episode exploration less detrimental from the agent’s perspective.
No, that’s not what I was saying. When I said “reward acquisition” I meant the actual reward function (that is, the base objective).
EDIT:
That being said, in some of these safe exploration setups it’s a bit tricky to draw the line between what’s part of the base objective and what’s not. For example, I would generally count the constraints in constrained optimization setups as part of the base objective, just specified slightly differently. In that context, constrained optimization is less of a safe exploration technique and more of a reward-engineering-y/outer alignment sort of thing, though it also has a safe exploration component to the extent that it constrains across-episode exploration.
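For concreteness, a minimal sketch of the constrained setup I have in mind and its Lagrangian relaxation (illustrative notation, where $R$ is the reward, $C$ the constraint cost, and $d$ the allowed budget):

$$\max_{\pi}\ \mathbb{E}_{\pi}[R] \quad \text{s.t.} \quad \mathbb{E}_{\pi}[C] \le d$$

$$\max_{\pi}\ \min_{\lambda \ge 0}\ \mathbb{E}_{\pi}[R] - \lambda\left(\mathbb{E}_{\pi}[C] - d\right)$$

Once the constraint is folded in via $\lambda$, it is effectively just another term in the objective the agent is trained on, which is why I’d mostly file it under the base objective.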
Note that when across-episode exploration is learned, the distinction between safe exploration and outer alignment becomes even more muddled, since then all the other terms in the loss will implicitly serve to check the across-episode exploration term, as the agent has to figure out how to trade off between them.[1]
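A minimal sketch of that trade-off (the names and weights here are hypothetical, just to illustrate how a learned exploration term gets implicitly checked by the other loss terms):

```python
# Hypothetical sketch: a scalarized loss in which a learned across-episode
# exploration bonus is traded off against the base-objective terms. Because
# everything is summed into one loss, pushing the exploration term harder is
# only worthwhile if it doesn't degrade the task and constraint terms too much.

def total_loss(task_loss, constraint_violation, exploration_bonus,
               constraint_weight=10.0, exploration_weight=0.1):
    # task_loss, constraint_violation: stand-ins for the base objective
    # exploration_bonus: the learned exploration term (entered with a minus
    # sign, since the agent is rewarded for exploring)
    return (task_loss
            + constraint_weight * constraint_violation
            - exploration_weight * exploration_bonus)

# More exploration lowers the loss only while the other terms stay intact:
print(total_loss(task_loss=1.0, constraint_violation=0.0, exploration_bonus=2.0))  # 0.8
print(total_loss(task_loss=1.5, constraint_violation=0.3, exploration_bonus=5.0))  # 4.0
```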
Wait, then how is “improving across-episode exploration” different from “preventing the agent from making an accidental mistake”? (What’s a situation that counts as one but not the other?)
Like I said in the post, I’m skeptical that “preventing the agent from making an accidental mistake” is actually a meaningful concept (or at least, it’s a concept with many possible conflicting definitions), so I’m not sure how to give an example of it.
[1] This is another one of the points I was trying to make in “Safe exploration and corrigibility” but didn’t do a great job of conveying properly.