So: is it possible to formulate an instrumental version of Occam? Can we justify a simplicity bias in our policies?
Justification has a downside: a) it is wrong if what you are arguing for is wrong/false, and b) it can be wrong even if what you are arguing for is right/true. That being said (epistemic warnings concluded)...
1. A more complicated policy:
Is harder to keep track of
Is harder to calculate*
2. We don’t have the right policy.
As a consequence of this uncertainty, simpler goals (ones that pay out in terms that remain useful even if plans change) are preferred.
This is an argument against instrumental convergence—in a deterministic environment that is completely understood. Under these conditions, the ‘paperclip maximizer AI’ knows that yes, it does want paperclips more than anything else, and has no issue turning itself into paperclips.
3. A perfect model has a cost, and is an investment.*
What is the best way from A to B? 1. This may depend on conditions (weather, traffic, etc.). 2. We don’t care—so we use navigation systems. (It may be argued that ‘getting stronger’ is about removing magic/black boxes and replacing them with yourself, or that ‘getting stronger in a domain’ is about this.)
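To make point 1 concrete (that the best route depends on conditions), here is a minimal sketch with an invented road graph and made-up travel times; the node names and weights are purely illustrative, not from the original.

```python
# Toy example: the cheapest route from A to B depends on conditions.
# Graph, node names, and travel times are invented for illustration.
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra over a dict-of-dicts graph; returns (cost, path)."""
    queue = [(0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for neighbor, weight in graph[node].items():
            if neighbor not in seen:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []

# Travel times (minutes) under two conditions.
clear_weather = {
    "A": {"highway": 10, "back_road": 15},
    "highway": {"B": 10},
    "back_road": {"B": 12},
    "B": {},
}
rainy_weather = {
    "A": {"highway": 10, "back_road": 16},
    "highway": {"B": 35},   # congestion in the rain
    "back_road": {"B": 14},
    "B": {},
}

print(shortest_path(clear_weather, "A", "B"))   # (20, ['A', 'highway', 'B'])
print(shortest_path(rainy_weather, "A", "B"))   # (30, ['A', 'back_road', 'B'])
```

A navigation system hides this computation behind a black box; in the 'getting stronger' framing above, doing it yourself would mean replacing that black box with your own model of the conditions.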
*‘Expected utility calculations’ may be done in several ways, including:
What is the optimal action?
What is the optimal series of actions?
One criticism of AIXI and related methods is that they are uncomputable. Another is that, even if you have the right model:
4. The right model doesn’t necessarily tell you what to do.
Is/ought aside, calculating the optimal move in Chess or Go may be difficult, and even the optimal in-game move may not be the optimal move within the larger framework you are working in.
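To make the footnote's distinction (optimal action vs. optimal series of actions) concrete, here is a minimal toy sketch with invented states, actions, and payoffs: the action with the best immediate payoff is not the first step of the best two-step plan.

```python
# Toy two-step decision problem with made-up payoffs: the greedy
# "optimal action" differs from the first step of the "optimal series of actions".
from itertools import product

# transitions[state][action] -> (immediate_reward, next_state)
transitions = {
    "start": {"cash_out": (5, "done"), "invest": (0, "grown")},
    "grown": {"cash_out": (12, "done"), "invest": (0, "grown")},
    "done":  {"wait": (0, "done")},
}

def best_single_action(state):
    """Maximize immediate reward only."""
    return max(transitions[state], key=lambda a: transitions[state][a][0])

def best_plan(state, horizon):
    """Brute-force the best sequence of actions over a fixed horizon."""
    best = (float("-inf"), None)
    for plan in product(*[["cash_out", "invest", "wait"]] * horizon):
        total, s, ok = 0, state, True
        for action in plan:
            if action not in transitions[s]:
                ok = False
                break
            reward, s = transitions[s][action]
            total += reward
        if ok and total > best[0]:
            best = (total, plan)
    return best

print(best_single_action("start"))      # 'cash_out' (reward 5 now)
print(best_plan("start", horizon=2))    # (12, ('invest', 'cash_out'))
```

This is also a miniature version of point 4: having the full (here, tiny) model still leaves the work of searching over plans, and that search grows quickly with the horizon.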
Speculation:
5. More complicated policies are more difficult to update, and
6. more difficult to use.
> This is an argument against instrumental convergence—in a deterministic environment that is completely understood. Under these conditions, the ‘paperclip maximizer AI’ knows that yes, it does want paperclips more than anything else, and has no issue turning itself into paperclips.

Can you elaborate? Does this argue against instrumental convergence because it would paperclip itself?

> Does this argue against instrumental convergence because it would paperclip itself?

If that is the best thing to do. (To maximize paperclips, in those circumstances.)

> Can you elaborate?
Imagine you don’t know what your utility function is, but you’ve narrowed it down to two possibilities—A and B. Ignoring questions around optimizing and conflict between the two possibilities, from that position, if something is good for ‘achieving’* both A and B, then you should probably go for that thing.
*This also may be true if the thing is something valued by both A and B.
So in theory there’s an aspect of instrumental convergence in humans/agents that are uncertain about utility functions/etc. - “I don’t know what I want but this is good for more than one/a lot of/all of them.”
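A minimal numeric sketch of this, with invented actions and utility numbers: an agent that is 50/50 between candidate utility functions A and B ranks actions by expected utility, and the 'generically useful' action comes out on top even though neither candidate ranks it first.

```python
# Toy sketch: instrumental convergence from uncertainty over utility functions.
# The candidate utility functions, actions, and numbers are all invented.

# How much each candidate utility function values each action.
utility = {
    "A": {"pursue_A_directly": 10, "pursue_B_directly": 0,  "acquire_resources": 7},
    "B": {"pursue_A_directly": 0,  "pursue_B_directly": 10, "acquire_resources": 7},
}

def expected_utilities(credence):
    """Expected utility of each action, given a credence over candidate utility functions."""
    actions = next(iter(utility.values())).keys()
    return {
        action: sum(credence[u] * utility[u][action] for u in credence)
        for action in actions
    }

# Uncertain agent: 50/50 between A and B.
uncertain = expected_utilities({"A": 0.5, "B": 0.5})
print(uncertain)
# {'pursue_A_directly': 5.0, 'pursue_B_directly': 5.0, 'acquire_resources': 7.0}
print(max(uncertain, key=uncertain.get))   # 'acquire_resources'
```

The particular numbers are arbitrary; what matters is that the resource-like action pays off reasonably well under both candidates.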
There’s also another aspect of instrumental convergence around things like power—things that seem more general. ‘A paperclip maximizer wanting to not die as a means to an end.’ (Though this is backwards of how we usually think of such things—we consider it unusual for a human to want to die. (Perhaps there’s a more general tendency to not talk about self-destructing agents much.))
So if an agent finds out for sure that its utility function is B, that can eliminate ‘instrumental convergence’ stemming from (utility function) uncertainty.
If an agent finds out all it can do in its environment (determinism, etc.), that can eliminate ‘instrumental convergence’ stemming from (environmental/etc.) uncertainty.
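Continuing the same toy sketch (same invented numbers): once the agent is certain its utility function is B, the ranking changes and the B-specific action beats the generically useful one, which is the sense in which resolving utility-function uncertainty can remove this source of instrumental convergence.

```python
# Same invented toy numbers as the sketch above, but with certainty about B.
utility = {
    "A": {"pursue_A_directly": 10, "pursue_B_directly": 0,  "acquire_resources": 7},
    "B": {"pursue_A_directly": 0,  "pursue_B_directly": 10, "acquire_resources": 7},
}

def expected_utilities(credence):
    actions = next(iter(utility.values())).keys()
    return {a: sum(credence[u] * utility[u][a] for u in credence) for a in actions}

certain_b = expected_utilities({"B": 1.0})
print(certain_b)
# {'pursue_A_directly': 0.0, 'pursue_B_directly': 10.0, 'acquire_resources': 7.0}
print(max(certain_b, key=certain_b.get))   # 'pursue_B_directly'
```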
For contrast, new information seems like it can also have the opposite effects:
You find out you value something you didn’t know you valued. (Perhaps after trying something new, or thinking.)
An agent finds out ‘more compute/more time’ improves its ability to do anything. (Perhaps by observing other agents achieve their goals better with more of those things.)**
**An agent need not understand ‘instrumental convergence’ to act on those behaviors. Observing that ‘X makes it easier to achieve my goals’ may easily imply that ‘I could use some more X’. But uncertainty about ‘what you want’/‘the best way to achieve it’ is also a reason to seek out things that improve ‘general future goal fulfillment ability’. (General across the space of goals/methods you’re considering.)