I don’t particularly object to that framing; it’s just a huge gap from “Rewards have unpredictable effects on agent’s cognition, not necessarily to cause them to want reward” to “we have a way to use RL to interpret and implement human wishes.”
So, OP said:

> In general, we have no way to use RL to actually interpret and implement human wishes, rather than to optimize some concrete and easily-calculated reward signal.
I read into this a connotation of “In general, there isn’t a practically-findable way to use RL...”. I’m now leaning towards my original interpretation being wrong—that you meant something more like “we don’t know how to use RL to actually interpret and implement human wishes” (which I agree with).