Is this just a case of the utility function not being up for grabs? muflax can’t explain to me why wireheading counts as a win, and I can’t explain to muflax why wireheading doesn’t count as a win for me. At least, not using the language of rationality.
It might be interesting to get a neurological or evo-psych explanation for why non-wireheaders exist. But I don’t think this is what’s being asked here.
Is this just a case of the utility function not being up for grabs?
Well, ultimately it might be, but it really weirds me out. We’re all running on essentially the same hardware, and I don’t think that either those who find wireheading intuitive or those who don’t are especially non-neurotypical. I would expect that wireheading is right for either all or no humans and any other result needs a really good explanation.
It might be interesting to get a neurological or evo-psych explanation for why non-wireheaders exist. But I don’t think this is what’s being asked here.
I’m not explicitly asking it, but I would be very interested in why it seems like there are two different kinds of minds, yes.
This is just my opinion, not particularly evidence-based: I don’t think that there are two different kinds of mind, or if there are it’s not this issue that separates them. The wireheading scenario is one which is very alien to our ancestral environment so we may not have an “instinctive” preference for or against it. Rather, we have to extrapolate that preference from other things.
Two heuristics which might be relevant:
- Where “wanting” and “liking” conflict, it feels like “wanting” is broken (i.e. we’re making ourselves do things we don’t enjoy). So, given the opportunity, we might want to update what we “want”. This is pro-wireheading.
- Where we feel we are being manipulated, we want to fight that manipulation in case it’s against our own interests. Thinking about brain probes is a sort of manipulation-superstimulus, so this heuristic would be anti-wireheading.
I can very well believe that wireheading intuitions correlate with personality type, which is a weak form of your “two different kinds of minds” hypothesis.
Sorry for the ultra-speculative nature of this post.
Makes sense in terms of explaining the different intuition, yes, and is essentially how I think about it.
The second heuristic, about manipulation, then seems useful in practice (more agents will try to exploit us than to satisfy us), but isn’t it much weaker when applied to the actual wireheading scenario? The first heuristic at least addresses the conflict (although maybe the wrong way); the second just ignores it.
I agree; the second heuristic doesn’t apply particularly well to this scenario. Some terminal values seem to come from a part of the brain which isn’t open to introspection, so I’d expect them to arise as a result of evolutionary kludges and random cultural influences rather than necessarily making any logical sense.
The thing is, once we have a value system that’s reasonably stable (i.e. what we want is the same as what we want to want) then we don’t want to change our preferences even if we can’t explain where they arise from.
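(To put that stability condition in symbols, using my own notation rather than anything established in this thread: write $\succ_1$ for the ordering of what we want and $\succ_2$ for the ordering of what we want to want. Then a value system is reflectively stable exactly when the two orderings agree.)

$$\text{stable} \;\iff\; \forall x, y:\; \big( x \succ_1 y \iff x \succ_2 y \big)$$

On this reading, wireheading rewrites $\succ_1$ without consulting $\succ_2$, which is one way of cashing out why it pattern-matches to manipulation.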
I would expect that wireheading is right for either all or no humans and any other result needs a really good explanation.
As you know, we’ve already seen this same statement made with “wireheading”, with “increased complexity”, etc.
Until we get a definition of meta-values and a general axiological treatment of them, people will always be baffled that others have meta-values different from their own.