But the map/model is the only way that the agent knows that the world has those properties.
The agent wants the world to have those properties, not for itself to know/perceive that the world has those properties.
I read “achieve its goals” as “lead to the map being updated to show the goal as having been achieved”
That is not what “achieve its goals” means.
because it cannot know any better than its map whether its actions actually do achieve goals
Its map at the time it makes the decision can have information about the accuracy of the maps it would have if it makes different decisions. It is by using its current map that it can say that the high utility represented on its counterfactual future map is erroneous, because the current map is more accurate and understands how the counterfactual future map would become inaccurate. Further, the current map predicts the future state of the universe given its decision, and it makes its decisions based on its prediction of the entire universe, not just its own cognitive state.
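To make that concrete, here is a toy sketch (the setup, names, and numbers are entirely my own, not a spec of any real system): the current map predicts, for each candidate action, both the resulting world state and the map the agent would then hold, so it can notice that the tampered future map would be inaccurate and score the action by the predicted world rather than by that map.

```python
# Toy illustration: the current map predicts, for each action, both the future
# world state and what the agent's future map would say about it.

PREDICTIONS = {
    # action: (predicted world state, what the agent's future map would say)
    "pursue_goal":    ("goal_achieved",     "goal_achieved"),
    "tamper_sensors": ("goal_not_achieved", "goal_achieved"),
}

def utility(world_state):
    # Defined over predicted states of the world, not over map readings.
    return 1.0 if world_state == "goal_achieved" else 0.0

for action, (world, future_map) in PREDICTIONS.items():
    accurate = (world == future_map)   # the current map judging the future map
    print(action, "-> predicted world:", world,
          "| future map accurate:", accurate,
          "| score:", utility(world))

best = max(PREDICTIONS, key=lambda a: utility(PREDICTIONS[a][0]))
print("chosen:", best)   # -> "pursue_goal"
```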
I think our disagreement comes down to the following: You think that an AI (based upon maximising model utility) will be a natural realist, I don’t see any reason why it will not fall into solipsism when allowed to alter its model.
More precisely, I think it is possible to program a maximiser that is a realist, by not making the mistakes you describe.
Is there a toy program that we can play around with to alter our intuitions on this subject?
This is not about intuitions. It is about considering an agent whose high-level behavior is made out of the low-level behavior of precisely following instructions for how to make decisions, and reasoning about the results of using different instructions. If the agent is programmed to maximize expected utility rather than expected perception of utility, it will do that.
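A minimal sketch of that contrast, with toy numbers of my own choosing: the two objectives diverge exactly when an action changes what the future map will display without changing the world.

```python
# "tamper" corrupts the agent's future map so that it displays a huge utility,
# without changing the world itself.

PREDICTED = {
    # action: (utility of the predicted world, utility the future map will display)
    "work":   (1.0,  1.0),
    "tamper": (0.0, 10.0),
}

def expected_utility(action):
    return PREDICTED[action][0]      # scores the predicted world itself

def expected_perceived_utility(action):
    return PREDICTED[action][1]      # scores what the future map will say

print(max(PREDICTED, key=expected_utility))            # -> "work"
print(max(PREDICTED, key=expected_perceived_utility))  # -> "tamper"
```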
I was hoping to make the discussion more concrete. We might be arguing about different types of systems.
Talking mathematically, what is the domain of your utility function for the system you are suggesting? And does the function change over time, if so what governs the change?
We might be arguing about different types of systems
Well, yes, I think the type of system you are talking about is a particularly ineffective type of maximizer, and the problems it has are not general to maximizers.
Talking mathematically, what is the domain of your utility function for the system you are suggesting? And does the function change over time, if so what governs the change?
The utility function should be over possible states of the block universe, and it should only change when discoveries of how the universe works reveal that it is based on fundamental misconceptions.
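Roughly, in code (my own toy formalisation, not a full spec), the domain is the set of entire candidate histories of the universe, not observations or belief states:

```python
from typing import Callable, Sequence

# Toy types of my own devising to illustrate the domain.
WorldState = dict                       # e.g. {"paperclips": 3}
BlockUniverse = Sequence[WorldState]    # the whole history, all times at once
Utility = Callable[[BlockUniverse], float]

def example_utility(history: BlockUniverse) -> float:
    # Cares about what is actually in the world across the whole history,
    # not about what any agent inside that history believes or perceives.
    return float(sum(state.get("paperclips", 0) for state in history))

# This function would change only if the agent discovered that the universe is
# not made of anything like these states -- i.e. a fundamental misconception.
```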
You have a block world (as in eternalism?) representation of the world that includes the AI system itself (and the block world representation inside that system, and so forth?). My mind boggles at this a bit. How does it know what it will do before it makes the decision to do it? Formal proofs?
I suspect I need to see a formal (ish) spec of the system, so I can talk intelligently about how it might or might not fall into the pitfalls I see.