But that doesn’t detract from the main point: that simplicity, on its own, is not sufficient to resolve the issue.
It kind of does. You have shown that simplicity cannot distinguish (p, R) from (-p, -R), but you have not shown that simplicity cannot distinguish a physical person optimizing competently for a good outcome from a physical person optimizing nega-competently for a bad outcome.
If it seems unreasonable for there to be a difference, consider a similar map-territory distinction: a height map versus a mountain. An optimization function that performs gradient descent on a height map has the same complexity, or nearabouts, as one that performs gradient ascent on the height map’s inverse. However, a system that physically descends the actual mountain can be much simpler than one that ascends the mountain’s inverse. Since negative mental experiences are somehow qualitatively different from positive ones, it would not surprise me much if they did in fact effect a similar asymmetry here.
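To make the map half of the analogy concrete, here is a toy sketch (my own illustrative example, not from the original comment): on the *map*, descending a height function f and ascending its negation -f are the same procedure up to a sign flip, so they are (almost) equally complex. The territory-side asymmetry, of course, is exactly the part a formal model like this cannot capture.

```python
def descend(f, x, step=0.1, eps=1e-6):
    """Naive gradient descent via a finite-difference gradient."""
    grad = (f(x + eps) - f(x - eps)) / (2 * eps)
    return x - step * grad

def ascend(f, x, step=0.1, eps=1e-6):
    """Gradient ascent: identical code, one sign flipped."""
    grad = (f(x + eps) - f(x - eps)) / (2 * eps)
    return x + step * grad

height = lambda x: x ** 2          # a "height map" with a valley at 0
inverse = lambda x: -height(x)     # its inverse, with a peak at 0

x_d = x_a = 1.0
for _ in range(100):
    x_d = descend(height, x_d)     # walk downhill on the map
    x_a = ascend(inverse, x_a)     # walk uphill on the inverted map

# The two trajectories coincide: from the map's point of view,
# (descend, height) and (ascend, inverse) are indistinguishable.
assert abs(x_d - x_a) < 1e-9
```

The two functions differ by a single character, which is the sense in which simplicity cannot distinguish the interpretations on the map side.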
Saying that an agent has a preference/reward R is an interpretation of that agent (similar to the “intentional stance” of seeing it as an agent, rather than a collection of atoms). And the (p,R) and (-p,-R) interpretations are (almost) equally complex.
One of us is missing what the other is saying. I’m honestly not sure what argument you are putting forth here.
I agree that preference/reward is an interpretation (the terms I used were map and territory). I agree that (p,R) and (-p,-R) are approximately equally complex. What I do not agree with is that complexity is necessarily isomorphic between the map and the territory. This means that although the model may be a strong analogy when talking about behaviour, it is sketchy to use it as a model for the complexity of behaviour.
I tried to answer in more detail here: https://www.lesswrong.com/posts/f5p7AiDkpkqCyBnBL/preferences-as-an-instinctive-stance (hope you didn’t mind; I used your comment as a starting point for a major point I wanted to clarify).
But I admit to being confused now, and not understanding what you mean. Preferences don’t exist in the territory, so I’m not following you, sorry! :-(