Does this argue against instrumental convergence because it would paperclip itself?
If that is the best thing to do. (To maximize paperclips, in those circumstances.)
Can you elaborate?
Imagine you don’t know what your utility function is, but you’ve narrowed it down to two possibilities—A and B. Ignoring questions around optimizing and conflict between the two possibilities, from that position, if something is good for ‘achieving’* both A and B, then you should probably go for that thing.
*This also may be true if the thing is something valued by both A and B.
So in theory there’s an aspect of instrumental convergence in humans/agents that are uncertain about their utility functions/etc.: “I don’t know what I want, but this is good for more than one/a lot of/all of them.”
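To make that concrete, here’s a minimal toy sketch in Python (the names, numbers, and actions are all made up for illustration): an agent with 50/50 credence between utility functions A and B ranks actions by expected utility, and an option that helps under either hypothesis can beat options that only help under one.

```python
# Toy model: an agent unsure whether it values paperclips (utility A)
# or staples (utility B). All numbers/actions are invented for illustration.

def utility_A(outcome):
    return outcome["paperclips"]

def utility_B(outcome):
    return outcome["staples"]

# The agent's credence over which utility function is really its own.
credences = [(utility_A, 0.5), (utility_B, 0.5)]

# What each action leaves the agent with. 'acquire_wire' is the hedge:
# wire can later be turned into either product, so it helps under A *and* B.
actions = {
    "acquire_wire":    {"paperclips": 2, "staples": 2},
    "make_paperclips": {"paperclips": 3, "staples": 0},
    "make_staples":    {"paperclips": 0, "staples": 3},
}

def expected_utility(outcome):
    # Average the outcome's value over the agent's uncertainty about its own goals.
    return sum(p * u(outcome) for u, p in credences)

best = max(actions, key=lambda name: expected_utility(actions[name]))
print(best)  # -> acquire_wire: worth 2 either way, beating 1.5 for each specialist
```

The ‘wire’ here just stands in for anything that stays useful whichever of the candidate goals turns out to be the real one.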
There’s also another aspect of instrumental convergence around things like power—things that seem more general. ‘A paperclip maximizer wanting to not die as a means to an end.’ (Though this is the reverse of how we usually frame such things—we consider it unusual for a human to want to die. (Perhaps there’s a more general tendency not to talk about self-destructing agents much.))
So if an agent finds out for sure its utility function is B, that can eliminate the ‘instrumental convergence’ stemming from (utility function) uncertainty.
If an agent finds out all it can do in its environment (determinism, etc.), that can eliminate the ‘instrumental convergence’ stemming from (environmental/etc.) uncertainty.
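Continuing the same toy sketch (redefined here so it runs on its own): once the agent is certain its utility function is B, the hedge stops winning—which is the sense in which resolving the uncertainty removes that source of convergence.

```python
# Same toy model as above, after the agent learns for sure it values staples (B).
def utility_B(outcome):
    return outcome["staples"]

actions = {
    "acquire_wire":    {"paperclips": 2, "staples": 2},
    "make_paperclips": {"paperclips": 3, "staples": 0},
    "make_staples":    {"paperclips": 0, "staples": 3},
}

# With no remaining uncertainty about its own goals, expected utility is just utility_B.
best = max(actions, key=lambda name: utility_B(actions[name]))
print(best)  # -> make_staples: the hedge no longer has an edge
```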
For contrast, new information seems like it can also have the opposite effects:
You find out you value something you didn’t know you valued. (Perhaps after trying something new, or thinking.)
An agent finds out ‘more compute/more time’ improves its ability to do anything. (Perhaps by observing other agents achieve their goals better with more of those things.)**
**An agent need not understand ‘instrumental convergence’ to act this way: observing that ‘X makes it easier to achieve my goals’ may easily suggest ‘I could use some more X.’ But uncertainty about ‘what you want’/‘the best way to achieve it’ is also a reason to seek out things that improve ‘general future goal fulfillment ability’. (General across the space of goals/methods you’re considering.)
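In the same toy frame (again, purely illustrative numbers), new information can push the other way: if the agent starts taking a third candidate utility function seriously, and learns the hedge helps under that hypothesis too, the convergent option wins by a wider margin than before.

```python
# The agent now entertains a third candidate utility function (thumbtacks)
# and learns that wire can be turned into those as well. Invented numbers.

def utility_A(outcome): return outcome["paperclips"]
def utility_B(outcome): return outcome["staples"]
def utility_C(outcome): return outcome["thumbtacks"]

credences = [(utility_A, 1/3), (utility_B, 1/3), (utility_C, 1/3)]

actions = {
    "acquire_wire":    {"paperclips": 2, "staples": 2, "thumbtacks": 2},
    "make_paperclips": {"paperclips": 3, "staples": 0, "thumbtacks": 0},
    "make_staples":    {"paperclips": 0, "staples": 3, "thumbtacks": 0},
}

def expected_utility(outcome):
    # Average over the (now wider) space of candidate goals.
    return sum(p * u(outcome) for u, p in credences)

best = max(actions, key=lambda name: expected_utility(actions[name]))
print(best)  # -> acquire_wire again, now 2 vs. 1 instead of 2 vs. 1.5
```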