Yeah. I think a creature behaving just like me doesn’t necessarily have the exact same internal experiences. Across all possible creatures, there are degrees of freedom in internal experiences that aren’t captured by actions. Some of these might be value-relevant.
Yeah, in ML language, you’re describing the unidentifiability problem in inverse reinforcement learning—for any behavior, there are typically many reward functions for which that behavior is optimal.
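To make the unidentifiability point concrete, here is a minimal sketch: a toy two-state MDP (the transition probabilities, rewards, and discount below are made up for illustration) where value iteration under the original reward and under a positive affine transform of it yields the same optimal policy.

```python
import numpy as np

# Toy 2-state, 2-action MDP; the numbers are illustrative, not from the conversation.
# P[s, a, s'] is the transition probability, R[s, a] the reward.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
])
R = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])
gamma = 0.95

def optimal_policy(R, P, gamma, iters=1000):
    """Value iteration; returns the greedy policy for reward R."""
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        Q = R + gamma * (P @ V)   # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

# A positive affine transform of R is a genuinely different reward function,
# but it rationalizes exactly the same behavior.
R_alt = 3.0 * R + 5.0

print(optimal_policy(R, P, gamma))
print(optimal_policy(R_alt, P, gamma))  # same policy as above
```

Positive scaling and constant shifts are only the simplest cases; potential-based shaping gives a whole family of distinct rewards that all make the same policies optimal.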
Though another way this could be true is if “internal experience” depends on what algorithm you use to generate your behavior, and “optimize a learned reward” doesn’t meet the bar. (For example, I don’t think a giant lookup table that emulates my behavior is having the same experience that I am.)
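And a minimal sketch of the lookup-table point, assuming a toy state space and learned reward (both illustrative placeholders): two agents that behave identically on every input, one by optimizing a reward and one by merely replaying stored answers.

```python
import numpy as np

# The learned reward and the two-state "world" here are illustrative placeholders.
states = [0, 1]
learned_reward = np.array([[1.0, 0.0],
                           [0.0, 2.0]])   # learned_reward[s, a]

def reward_optimizer(s):
    """Chooses the action that maximizes the learned reward in state s."""
    return int(np.argmax(learned_reward[s]))

# Record the optimizer's choices once, then throw the optimizer away.
lookup_table = {s: reward_optimizer(s) for s in states}

def table_agent(s):
    """No reward, no optimization -- just replays stored answers."""
    return lookup_table[s]

# Behaviorally indistinguishable on every input, algorithmically very different.
assert all(reward_optimizer(s) == table_agent(s) for s in states)
```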