I noticed that I have two distinct “mental pictures” of what the overseer is, depending on which narrow technique the Distill procedure uses:
1. For imitation learning and narrow inverse reinforcement learning: a “passive” overseer that just gets used as a template/target for imitation.
2. For narrow reinforcement learning and in discussions about approval-directed agents: an “active” overseer that rates actions or provides rewards.
I wonder whether this way of thinking about the overseer is correct, or whether I’m missing something (e.g. maybe even in case (1) the overseer has a more active role than I can make out). If it is correct, then in case (1) the term “overseer” seems to have connotations that extend beyond the role the overseer actually plays: being passive, it doesn’t really provide any oversight.
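
To make the two pictures concrete, here is a toy sketch in Python of how I imagine the difference. Everything in it is my own illustration, not anything from the actual proposals: the function names, the string-valued questions and answers, and the lookup table standing in for a trained model are all hypothetical. In the imitation picture the overseer only supplies demonstrations to copy; in the rating picture it never demonstrates anything and only scores what the learner proposes.

```python
from typing import Callable, Dict, List

# Hypothetical toy setup: questions and answers are plain strings.
Question = str
Answer = str


def distill_by_imitation(
    overseer_answer: Callable[[Question], Answer],
    questions: List[Question],
) -> Dict[Question, Answer]:
    """"Passive" overseer: it only supplies demonstrations, and the learner
    copies them. A lookup table stands in for a trained model here."""
    return {q: overseer_answer(q) for q in questions}


def distill_by_rating(
    overseer_rating: Callable[[Question, Answer], float],
    questions: List[Question],
    candidates: List[Answer],
) -> Dict[Question, Answer]:
    """"Active" overseer: it never demonstrates an answer. It scores answers
    the learner proposes, and the learner keeps the highest-rated one."""
    return {
        q: max(candidates, key=lambda a: overseer_rating(q, a))
        for q in questions
    }


# Toy overseer that happens to know the right answer to each question.
truth = {"2+2": "4", "3+3": "6"}


def demonstrate(q: Question) -> Answer:
    return truth[q]


def rate(q: Question, a: Answer) -> float:
    return 1.0 if truth[q] == a else 0.0


imitated = distill_by_imitation(demonstrate, list(truth))
rated = distill_by_rating(rate, list(truth), ["4", "5", "6"])
```

In this toy both routes end up with the same policy; the only difference is whether the overseer is queried for answers or for judgments, which is exactly the distinction I am asking about.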