You don’t see the fundamental conflict between wanting to have an accurate world map and also wanting the map to have certain other properties, which the other parts of the Accurite scene represent?
Yes, but really, an agent wants the world itself, not its map, to have certain properties, and it wants an accurate map as an instrumental value to guide its actions to produce those properties in the world. A reflective agent has a model of its own map making and decision making process, and can direct its information gathering and processing resources to concentrate on improving the accuracy in the areas of the map that are most likely to help it achieve its goals. It does not have some subagent obsessed with maximizing generic accuracy. Instead, it estimates the expected utility gained from the increased effectiveness that various efforts to improve accuracy in different parts of the map would bring, compares these against each other and against other actions that can increase utility, and chooses the best one.
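A minimal sketch of that comparison, assuming a hypothetical toy world with one goal-relevant variable (`door`), one irrelevant variable (`weather`), a made-up observation cost, and invented payoffs; none of these names come from the discussion. The agent scores "act now" against "observe one variable, then act" by expected utility, so it only pays for accuracy where that accuracy changes what it will do.

```python
from itertools import product

# Hypothetical toy world: two binary variables, all four combinations equally likely.
# "door" matters for the agent's goal; "weather" does not.
PRIOR = {(door, weather): 0.25
         for door, weather in product(["left", "right"], ["sun", "rain"])}
ACTIONS = ["open_left", "open_right"]
OBSERVATION_COST = 1.0  # assumed cost of looking at one variable


def utility(state, action):
    """Utility over world states: 10 if the agent opens the door the prize is behind."""
    door, _weather = state
    return 10.0 if action == "open_" + door else 0.0


def best_expected_utility(belief):
    """Expected utility of the best action under a (possibly unnormalized) belief."""
    total = sum(belief.values())
    return max(sum(p * utility(s, a) for s, p in belief.items()) / total
               for a in ACTIONS)


def value_of_observing(var_index):
    """Expected utility of observing one variable and then acting, minus the cost."""
    eu = 0.0
    for value in {s[var_index] for s in PRIOR}:
        posterior = {s: p for s, p in PRIOR.items() if s[var_index] == value}
        eu += sum(posterior.values()) * best_expected_utility(posterior)
    return eu - OBSERVATION_COST


options = {
    "act_now": best_expected_utility(PRIOR),
    "observe_door": value_of_observing(0),
    "observe_weather": value_of_observing(1),
}
choice = max(options, key=options.get)
print(options, choice)
```

Here observing the door is worth its cost (9 versus 5 for acting blind), while observing the weather is not (4), even though both observations would make the map equally "more accurate" in a generic sense.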
Yes, but really, an agent wants the world itself, not its map, to have certain properties, and it wants an accurate map as an instrumental value to guide its actions to produce those properties in the world.
But the map/model is the only way that the agent knows that the world has those properties. If it alters the model, it alters its perception of the world’s properties.
A reflective agent has a model of its own map making and decision making process, and can direct its information gathering and processing resources to concentrate on improving the accuracy in the areas of the map that are most likely to help it achieve its goals
I read “achieve its goals” as “lead to the map being updated to show the goal being achieved”, because it cannot know any better than its map whether its actions actually do achieve goals (brain in a vat, etc.).
I think our disagreement comes down to the following: you think that an AI (based upon maximising model utility) will be a natural realist; I don’t see any reason why it will not fall into solipsism when allowed to alter its model.
Is there a toy program that we can play around with to alter our intuitions on this subject?
But the map/model is the only way that the agent knows that the world has those properties.
The agent wants the world to have those properties, not for itself to know/perceive that the world has those properties.
I read “achieve its goals” as “lead to the map being updated to show the goal being achieved”
That is not what “achieve its goals” means.
because it cannot know any better than its map whether its actions actually do achieve goals
Its map at the time it makes the decision can contain information about the accuracy of the maps it would have if it made different decisions. Using its current map, it can judge that the high utility shown on a counterfactual future map is erroneous, because the current map is more accurate and understands how that counterfactual future map would become inaccurate. Further, the current map predicts the future state of the universe given each decision, and the agent makes its decisions based on that prediction of the entire universe, not just of its own cognitive state.
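One way to write this down, as a sketch under assumed notation (none of these symbols appear in the thread): let $d$ range over decisions, $w$ over states of the world, $m$ over the maps the agent might hold afterwards, $\hat{w}(m)$ be the world state that map $m$ depicts, and $P_{\text{now}}$ be the agent's current map. Because the current map predicts the world and the future map jointly, it can estimate how likely a future map is to be wrong, and it can score decisions by the predicted world rather than by the predicted map:

$$
\Pr_{\text{now}}\big[\hat{w}(m) \neq w \mid d\big] = \sum_{w,\,m} P_{\text{now}}(w, m \mid d)\,\mathbf{1}\big[\hat{w}(m) \neq w\big]
$$

$$
d^{*} = \arg\max_{d} \sum_{w,\,m} P_{\text{now}}(w, m \mid d)\, U(w)
\qquad\text{rather than}\qquad
\arg\max_{d} \sum_{w,\,m} P_{\text{now}}(w, m \mid d)\, U\big(\hat{w}(m)\big)
$$

The two rules agree for any decision after which the future map is predicted to be accurate; they come apart exactly for decisions, such as editing the map directly, that the current map predicts will make the future map inaccurate.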
I think our disagreement comes down to the following: you think that an AI (based upon maximising model utility) will be a natural realist; I don’t see any reason why it will not fall into solipsism when allowed to alter its model.
More precisely, I think it is possible to program a maximiser that is a realist, by not making the mistakes you describe.
Is there a toy program that we can play around with to alter our intuitions on this subject?
This is not about intuitions. It is about considering an agent whose high-level behavior is made out of the low-level behavior of precisely following instructions for how to make decisions, and reasoning about the results of using different instructions. If the agent is programmed to maximize expected utility rather than expected perception of utility, it will do that.
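Since a toy program was asked for, here is a minimal one, assuming a hypothetical paperclip scenario whose decisions, probabilities, and payoffs are invented for illustration. Both agents share the same current map (`PREDICTIONS`) and the same decision procedure (`choose`). The only difference is the instruction for what to score: the predicted world, or the predicted future map.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """One possible result of a decision, as predicted by the agent's current map."""
    world_paperclips: int  # paperclips that would actually exist
    map_paperclips: int    # paperclips the agent's future map would report


# Hypothetical current map: decision -> list of (probability, predicted outcome).
PREDICTIONS = {
    "build_paperclips": [(0.9, Outcome(10, 10)), (0.1, Outcome(0, 0))],
    "edit_own_map": [(1.0, Outcome(0, 1_000_000))],
    "do_nothing": [(1.0, Outcome(0, 0))],
}


def utility_of_world(outcome: Outcome) -> float:
    """Realist objective: utility of the predicted world state."""
    return outcome.world_paperclips


def utility_of_perception(outcome: Outcome) -> float:
    """Perception objective: utility the future map would display."""
    return outcome.map_paperclips


def expected(score, decision: str) -> float:
    """Expected score of a decision under the current map's predictions."""
    return sum(p * score(o) for p, o in PREDICTIONS[decision])


def choose(score) -> str:
    """Same decision procedure for both agents; only `score` differs."""
    return max(PREDICTIONS, key=lambda d: expected(score, d))


realist_choice = choose(utility_of_world)
perception_choice = choose(utility_of_perception)
print("realist:", realist_choice)
print("perception maximizer:", perception_choice)
```

Running it prints `realist: build_paperclips` and `perception maximizer: edit_own_map`. Swapping one scoring function for the other is the whole difference between the two agents; nothing about having a map forces the second choice.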
I was hoping to make the discussion more concrete. We might be arguing about different types of systems...
Talking mathematically, what is the domain of your utility function for the system you are suggesting? And does the function change over time? If so, what governs the change?
We might be arguing about different types of systems
Well, yes. I think the type of system you are talking about is a particularly ineffective type of maximizer, and the problems it has are not general to maximizers.
Talking mathematically, what is the domain of your utility function for the system you are suggesting? And does the function change over time? If so, what governs the change?
The utility function should be over possible states of the block universe, and it should only change when discoveries of how the universe works reveal that it is based on fundamental misconceptions.
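A minimal sketch of a utility function whose domain is whole histories (a stand-in for "states of the block universe"), assuming a hypothetical discrete-time world in which each time slice records only a paperclip count. What it illustrates is the type signature: the function takes a complete world history, not the agent's beliefs about it, and expected utility is taken over the histories the current map considers possible given a decision. Revising the function after a discovery of fundamental misconceptions is not modelled here.

```python
from typing import Dict, List, Tuple

# Hypothetical: a world history is the full sequence of time slices,
# each recording how many paperclips exist at that time.
History = Tuple[int, ...]


def utility(history: History) -> float:
    """Utility over an entire history: paperclip counts summed across all time slices."""
    return float(sum(history))


# Hypothetical current map: decision -> [(probability, complete history)].
HISTORY_MODEL: Dict[str, List[Tuple[float, History]]] = {
    "build_slowly": [(1.0, (0, 2, 4, 6))],
    "build_fast_then_stop": [(0.5, (0, 10, 0, 0)), (0.5, (0, 0, 0, 0))],
}


def expected_utility(decision: str) -> float:
    """Average the history-level utility over the histories predicted for a decision."""
    return sum(p * utility(h) for p, h in HISTORY_MODEL[decision])


best = max(HISTORY_MODEL, key=expected_utility)
print(best)
```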
You have a block-world (as in eternalism?) representation of the world that includes the AI system itself (and the block-world representation inside that system, and so forth?). My mind boggles at this a bit. How does it know what it will do before it makes the decision to do it? Formal proofs?
I suspect I need to see a formal(ish) spec of the system so that I can talk intelligently about whether it might fall into the pitfalls I see.
I may regret getting involved here, but I want to make sure I understand what you’re claiming.
Just to get specific… say I have a model, M1. Analyzing M1 causes me to predict that eating this thing in my hand, T1, will cause me pleasure. I eat T1 and experience disgust. I apply various heuristics I happen to have encoded to use under similar circumstances in order to modify my model, giving me M2. I pick up a new thing T2, and analyzing M2 causes me to predict eating T2 will cause me pleasure. I eat T2 and experience pleasure. I go on about my day using M2 rather than M1.
The way you’re using the terms, what parts of this example are the “map”, and what parts are the “territory”?
You are talking about a different sort of system than I am...
Maximising “pleasure” is somewhat different from maximising a high-level concept such as “paper-clips”. There is generally only one way to find out about “pleasure”, and it is immediate. You don’t need to heavily process percepts into your model to figure out whether you are “pleasured” or not.
So I think your point will miss the mark… But M1 and M2 are part of the map, and so are your beliefs that there are such things as T1 and T2, that you picked them up and ate them, and that they caused you pleasure. The direct perception of pleasure is not part of the map or the territory, and if that is the domain of your utility function, you should be safe from the type of problems I described.
Thanks for clarifying.