Think about a paper-clip maximiser (people tend to get silly about morality, and a lot less silly about paper-clips, so it’s a useful thought experiment for meta-ethics in general). It’s a simple design: it lists all the courses of action it could take, computes the expected paper-clips given each one using its model of the world, and then takes the one that gives the largest result. It isn’t interested in the question of why paper-clips are valuable; it just produces them.
So, does it value paper-clips, or does it just value expected paper-clips?
Consider how it reacts to the option “update your current model of the world to set Expected paper-clips = BB(1000)”. This will appear on its list of possible actions, so what is its value?
(expected paper-clips | “update your current model of the world to set Expected paper-clips = BB(1000)”)
The answer is a lot less than BB(1000). Its current model of the world states that updating its model does not actually change reality (except insofar as the model is part of reality). Thus it does not predict that this action will result in the creation of any new paper-clips, so the expected paper-clips given this action is roughly equal to the number of paper-clips that would get produced anyway.
Expected expected paper-clips given this action is very large, but the paper-clipper doesn’t give a rat’s arse about that.
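To make the difference concrete, here is a minimal toy sketch of the evaluation loop I have in mind (all names and numbers are invented for illustration, not any particular implementation); the point is only that every option, including the register hack, gets scored by the agent’s current model of the world:

```python
# Toy sketch of the maximiser described above (illustrative names only).
BB_1000 = 10**100  # stand-in for the absurdly large BB(1000)

class WorldModel:
    """The agent's current model: maps an action to a predicted world state."""
    def predict(self, action):
        if action == "build a paper-clip factory":
            return {"paperclips_in_world": 1_000_000, "expectation_register": 1_000_000}
        if action == "set my expectation register to BB(1000)":
            # The model predicts this only edits the agent's internals;
            # no new paper-clips appear anywhere in the world.
            return {"paperclips_in_world": 1_000, "expectation_register": BB_1000}
        return {"paperclips_in_world": 1_000, "expectation_register": 1_000}

def expected_paperclips(model, action):
    # Counts paper-clips in the predicted *world*, not the agent's register.
    return model.predict(action)["paperclips_in_world"]

model = WorldModel()
actions = ["do nothing", "build a paper-clip factory",
           "set my expectation register to BB(1000)"]
print(max(actions, key=lambda a: expected_paperclips(model, a)))
# -> "build a paper-clip factory": the register hack scores only 1,000
```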
Hopefully, I have convinced you that there is a difference between caring about some aspect of the world (and using your internal model to predict that aspect) and caring about your internal model itself. Furthermore, in the space of all possible minds the vast majority are in the first category, since an agent’s own mind is generally only a tiny portion of the world; so if humans value both, then it is the internal part that makes us unusual.
I can’t make you value something any more than I can make a rock value it; the best I can do is convince you that you are allowed to value non-wireheading. And if you don’t feel like you want it, then it is privileging the hypothesis to even consider the possibility that you do.
The question is whether humans, unlike paperclip maximizers, are actually more concerned with maximizing their reward number regardless of how it is being increased.
If there is a way for humans to assign utility non-arbitrarily, then we are able to apply rational choice to our values, i.e. look for values that are better at yielding utility. If humans measure utility in a unit of bodily sensations, then we can ask what would most effectively yield the greatest amount of bodily sensations. Here wireheading seems to be more efficient than any other way to maximize bodily sensations, i.e. utility.
There is even some evidence for this, e.g. humans enjoy fiction. Humans treat their model of reality as part of reality. If you can change the model, you can change reality.
I don’t agree with all that though, because I think that humans either are not utility maximizers or assign utility arbitrarily.
It seems to me that I value both my internal world and the external world. I enjoy fiction, but the prospect of spending the rest of my life with nothing else fails to thrill me.
A lot of people express scepticism of this claim, usually acting as if there is a great burden of proof required to show the external part is even possible. My point is that the external part is both possible and unsurprising.
So my argument against wireheading goes: I don’t feel like I want to be a wirehead, the vast majority of minds in general don’t want to become wireheads, and low prior + no evidence = “why has this even been promoted to my attention?”
So, does it value paper-clips, or does it just value expected paper-clips?
Consider how it reacts to the option “update your current model of the world to set Expected paper-clips = BB(1000)”. This will appear on its list of possible actions, so what is its value?
That depends on the exact implementation. The paperclipper might be purely feedback-driven, essentially a paperclip-thermostat. In that case, it will simulate setting its internal variables to BB(1000), which will create huge positive feedback, and it will happily wirehead itself. Or it might simulate the state of the world, count the paperclips and then rate it, in which case it won’t wirehead itself.
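To make the first, feedback-driven design concrete (a toy sketch with invented numbers, not a claim about any real agent): if the score is the predicted internal signal rather than the predicted world, the register hack wins.

```python
# Feedback-driven ("paperclip-thermostat") variant, illustrative only.
BB_1000 = 10**100  # stand-in for BB(1000)

# Predicted internal feedback signal after each candidate action.
predicted_feedback = {
    "do nothing":                      1_000,
    "build a paper-clip factory":      1_000_000,
    "set feedback signal to BB(1000)": BB_1000,
}

best = max(predicted_feedback, key=predicted_feedback.get)
print(best)  # -> "set feedback signal to BB(1000)": it happily wireheads itself
```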
The second option is much more complex and expensive. What makes you think humans are like that?
I agree with you that there are non-wireheading agents in principle. I just don’t see any evidence to conclude humans are like that.
That depends on the exact implementation. The paperclipper might be purely feedback-driven, essentially a paperclip-thermostat. In that case, it will simulate setting its internal variables to BB(1000), which will create huge positive feedback, and it will happily wirehead itself. Or it might simulate the state of the world, count the paperclips and then rate it, in which case it won’t wirehead itself.
The former is incredibly stupid: an agent that consistently gets its imagination confused with reality, and cannot even in principle separate them, would be utterly incapable of abstract thought.
‘Expected Paper-clips’ is completely different to paper-clips. If an agent can’t tell the difference between them, it may as well not be able to tell houses from dogs. The fact that I can even understand the difference suggests that I am not that stupid.
I just don’t see any evidence to conclude humans are like that.
Really? You can’t see any Bayesian evidence at all!
How about the fact that I claim not to want to wirehead? My beliefs about my desires are surely correlated with my desires. How about all the other people who agree with me, including a lot of commenters on this site and most of humanity in general? Are our beliefs so astonishingly inaccurate that we are not even a tiny bit more likely to be right than wrong?
What about the many cases of people strongly wanting things that did not make them happy and acting on those desires, or vice versa?
You are privileging the hypothesis. Your view has a low prior (most of the matter in the universe is not part of my mind, so given that I might care about anything, it is not very likely that I will care about one specific lump of meat). You don’t present any evidence of your own, and yet you demand that I present mine.
The former is incredibly stupid: an agent that consistently gets its imagination confused with reality, and cannot even in principle separate them, would be utterly incapable of abstract thought.
Welcome to evolution. Have you looked at humanity lately?
(Ok, enough snide remarks. I do agree that this is a fairly stupid design, but it would still work in many cases. The fact that it can’t handle advanced neuroscience is unfortunate, but it worked really well in the Savannah.)
How about the fact that I claim not to want to wirehead? My beliefs about my desires are surely correlated with my desires.
(I strongly disagree that “most of humanity” is against wireheading. The only evidence for that consists of very flawed intuition pumps that can easily be reversed.)
However, I do take your disagreement (and that of others here) seriously. It is a major reason why I don’t just go and endorse wireheading, and why I wrote the post in the first place. Believe me, I’m listening. I’m sorry if I gave the impression that I just discard your opinion as confused.
You are privileging the hypothesis. Your view has a low prior (most of the matter in the universe is not part of my mind, so given that I might care about anything, it is not very likely that I will care about one specific lump of meat).
It would have a low prior if human minds were pulled out of mind space at random. They aren’t. We do know that they are reinforcement-based, and we have good evolutionary pathways for how complex minds based on that would be created. Reinforcement-based minds, however, are exactly like the first kind of mind I described and, it seems to me, should always wirehead if they can.
As such, assuming no more, we should have no problem with wireheading. The fact that we do needs to be explained. Assuming there’s an additional complex utility calculation would answer the question, but that’s a fairly expensive hypothesis, which is why I asked for evidence. On the other hand, assuming (unconscious) signaling, mistaken introspection and so on relies only on mechanisms we already know exist and works equally well, but favors wireheading.
Economic models that do assume complex calculations like that, if I understand it correctly, work badly, while simpler models (PCT, behavioral economics in general) work much better.
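To spell out why I think a purely reinforcement-based mind should wirehead if it can, here is a toy sketch (a bare reward learner with invented numbers, not a model of real brains): it never sees “paper-clips” or “the world”, only a scalar reward, and one available action inflates that scalar directly.

```python
# Toy epsilon-greedy reward learner (illustrative only, not a brain model).
import random

actions = ["work", "rest", "wirehead"]
reward_of = {"work": 1.0, "rest": 0.2, "wirehead": 100.0}  # "wirehead" feeds the signal directly

estimates = {a: 0.0 for a in actions}  # learned value estimates
counts = {a: 0 for a in actions}

random.seed(0)
for _ in range(1000):
    if random.random() < 0.1:                      # explore occasionally
        action = random.choice(actions)
    else:                                          # otherwise exploit the best estimate
        action = max(estimates, key=estimates.get)
    reward = reward_of[action]
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]  # running mean

print(max(estimates, key=estimates.get))  # -> "wirehead"
print(counts)  # nearly all choices land on "wirehead" once it has been tried
```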
You don’t present any evidence of your own, and yet you demand that I present mine.
You are correct that I have not presented any evidence in favor of wireheading. I’m not endorsing wireheading and even though I think there are good arguments for it, I deliberately left them out. I’m not interested in “my pet theory about values is better than your pet theory and I’m gonna convince you of that”. Looking at models of human behavior and inferred values, however, wireheading seems like a fairly obvious choice. The fact that you (and others) disagree makes me think I’m missing something.
The fact that it can’t handle advanced neuroscience is unfortunate, but it worked really well in the Savannah.
What do you mean it can’t handle advanced neuroscience? Who do you think invented neuroscience!
One of the points I was trying to make was that humans can, in principle, separate out the two concepts; if they couldn’t, then we wouldn’t even be having this conversation.
Since we can separate these concepts, it seems like our final reflective equilibrium, whatever that looks like, is perfectly capable of treating them differently. I think that wire-heading is a mistake that arose from the earlier mistake of failing to preserve the use-mention distinction. Defending one mistake once we have already overcome its source is like trying to defend the content of Leviticus after admitting that God doesn’t exist.
I’m sorry if I gave the impression that I just discard your opinion as confused.
I didn’t actually think you were ignoring my opinion; I was just using a little bit of hyperbole, because people saying “I see no evidence” when there clearly is some evidence is a pet peeve of mine.
On the other hand, assuming (unconscious) signalling
This point interests me. Let’s look a little deeper into this signalling hypothesis. Am I correct that you are claiming that while my conscious mind utters sentences like “I don’t want to be a wire-head”, subconsciously I actually do want to be a wire-head?
If this is the case, then the situation we have is two separate mental agents with conflicting preferences, and you appear to be siding with Subconscious!Ben rather than Conscious!Ben on the grounds that he is the ‘real Ben’.
But in what sense is he more real? Both of them exist, as shown by their causal effects on the world. I may be biased on this issue, but I would suggest you side with Conscious!Ben; he is the one with qualia, after all.
Do you, in all honesty, want to be wire-headed? For the moment I’m not asking what you think you should want, what you want to want, or what you think you would want in reflective equilibrium, just what you actually want. Does the prospect of being reduced to orgasmium, if you were offered it right now, seem more desirable than the prospect of a complicated universe filled with diverse beings pursuing interesting goals and having fun?
What do you mean it can’t handle advanced neuroscience? Who do you think invented neuroscience!
Not that I wanna beat a dead horse here, but it took us ages. We can’t even do basic arithmetic right without tons of tools. I’m always astonished to read history books and see how many really fundamental things weren’t discovered for hundreds, if not thousands of years. So I’m fairly underwhelmed by the intellectual capacities of humans. But I see your point.
Since we can separate these concepts, it seems like our final reflective equilibrium, whatever that looks like, is perfectly capable of treating them differently.
Capable, sure. That seems like an overly general argument. The ability to distinguish things doesn’t mean the distinction appears in the supposed utility function. I can tell apart hundreds of monospace fonts (don’t ask), but I don’t expect monospace fonts to appear in my actual utility function as terminal values. I’m not sure how this helps either way.
Am I correct that you are claiming that while my conscious mind utters sentences like “I don’t want to be a wire-head”, subconsciously I actually do want to be a wire-head?
Not exactly like this. I don’t think the unconscious part of the brain is conspiring against the conscious one.
I don’t think it’s useful to clearly separate “conscious” and “unconscious” into two distinct agents. They are the same agent, only with conscious awareness shifting around, metaphorically like handing around a microphone in a crowd such that only one part can make itself heard for a while and then has to resort to affecting only its direct neighbors or screaming really loud.
I don’t think there’s a direct conflict between agents here. Rather, the (current) conscious part encounters intentions and reactions it doesn’t understand, doesn’t know the origin or history of, and then tries to make sense of them, so it often starts confabulating. This is most easily seen in split-brain patients.
I can clearly observe this by watching my own intentions and my reactions to them moment-to-moment. Intentions come out of nowhere, then directly afterwards (if I investigate) a reason is made up why I wanted this all along. Sometimes, this reason might be correct, but it’s clearly a later interpolation. That’s why I generally tend to ignore any verbal reasons for actions.
So maybe hypocrisy is a bit of a misleading term here. I’d say that there are many agents that don’t always have privileged access (and aren’t always conscious), so they get somewhat ignored, which screws up complex decision making, which causes akrasia. Like, “I’m not getting my needs fulfilled and can’t change that myself right now, so I’m going to veto everything!”. On the other hand, the conscious part is now stuck with actions that don’t make sense, so it makes up a story. It signals “oh, I would’ve studied all day, but I somehow couldn’t get myself to stop watching cat videos, even though I hated it”. Really, it just avoided the pain of boredom when studying and needed instant gratification. But “akrasia” is a much nicer cover story.
I’m not saying this is perfectly correct or the whole picture, but I think assuming models like this fits my own experiences more closely than assuming actual conflicting agents. Also, those unconscious parts, I suspect, are too simple to actually understand wireheading. They want rewards. If they were smart enough, they might see that wireheading is a good solution.
On a somewhat related note, Susan Blackmore often makes the point when talking about free will that she doesn’t have any and doesn’t even have the illusion of free will anymore, but it doesn’t interfere with her actual behavior. Example quote from Conversations On Consciousness (she talks more about this in several radio shows I can’t find right now):
Susan Greenfield: “[Searle] said that when he goes into a restaurant and orders a hamburger, he doesn’t say, ‘Well, I’m a determinist, I wonder what my genes are going to order.’”
Susan Blackmore: “I do. You’re right that Searle doesn’t do that, but when I go in a restaurant, I think, ‘Ooh, how interesting, here’s a menu, I wonder what she’ll choose’; so it is possible to do that.”
I’m totally like Blackmore here. I have no idea what I’ll choose tomorrow or even in ten minutes, only that it will be according to rewards, aversions and so on. Even not considering counterfactuals in my decision making (and no longer making up verbal reasons) hasn’t crippled me in any way, as far as I can tell.
That makes me skeptical that there’s really all that complex a machinery behind all this, and it makes the insistence on “but I really value this complex, external thing!” so puzzling.
Also, I don’t think qualia are ever a useful concept. Let’s not drag any dualism into this by accident. Besides, what makes you think that “what you call qualia” is something your unconscious processes don’t have, right now? What makes you think you have exactly one conscious mind in your skull?
Do you, in all honesty, want to be wire-headed? For the moment I’m not asking what you think you should want, what you want to want, or what you think you would want in reflective equilibrium, just what you actually want. Does the prospect of being reduced to orgasmium, if you were offered it right now, seem more desirable than the prospect of a complicated universe filled with diverse beings pursuing interesting goals and having fun?
I don’t have an opinion on that, deliberately. I find wireheading very attractive, and it seems about as nice as the complicated universe, but much easier and more of an elegant solution. The halo effect is way too powerful here, and I don’t wanna screw myself over just because I couldn’t see a fundamental flaw past how pretty the solution was.
(Of course, as per the nature of wireheading, even if I thought it were a good idea, I would spend no effort on convincing anyone of it. What for, because I value them? Then what am I wireheading myself for?)