I suppose there is a continuum of how much insight the human has into what the agent is doing. Squeezing all of your evaluation into one simple reward function sits at one end of the spectrum (and is particularly susceptible to unintended behaviors); watching a 2D projection of a 3D action is further along (though still short of full insight); and you can imagine setups with much more insight than that.
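For concreteness, here's a toy sketch of that spectrum (all names and numbers are mine, purely illustrative): a scalar reward can't distinguish gaming the proxy from genuine quality, while a human watching a lossy projection discounts whatever gaming the projection happens to reveal.

```python
# Toy sketch of the insight continuum. Assumed model: a behavior has a
# "true quality" and an amount of "gaming" that inflates the proxy
# without improving quality.

def scalar_reward(true_quality: float, gaming: float) -> float:
    """One end of the spectrum: all evaluation squeezed into one number,
    so gaming counts exactly like genuine quality."""
    return true_quality + gaming

def projection_score(true_quality: float, gaming: float,
                     hidden_fraction: float = 0.05) -> float:
    """Further along: a human watches a lossy 2D projection of a 3D
    behavior. They discount the gaming they can see, but a small hidden
    fraction slips past the viewpoint -- more insight than the scalar,
    still short of full insight."""
    return true_quality + hidden_fraction * gaming

honest = dict(true_quality=0.9, gaming=0.0)
gamed = dict(true_quality=0.3, gaming=5.0)

print("scalar fooled:    ", scalar_reward(**gamed) > scalar_reward(**honest))        # True
print("projection fooled:", projection_score(**gamed) > projection_score(**honest))  # False here
# ...but raise `gaming` enough (say, to 15.0) and the hidden slice fools
# the projection-watching human too: the continuum, in miniature.
```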