I was set up to favor specific pathways to these rewards for reasons that are not my own (but those of my genes, memes or other influences).
I have to ask where your own reasons come from, causally speaking.
Good point. I can’t just disown all reasons or “I” become a rock, which doesn’t appeal to me, identity-wise. I like minimalist identities the most, so I retain pleasure = good, but not reproductive success, for example. In other words, I keep the basic mechanism that evolution gave me to achieve goals, but I ignore the meta-goal of reproductive success it had.
I’m not happy with this argument, but I find extended versions that care about externals just as implausible. The choice between the two seems arbitrary, so I go with the simpler one for now.
Also, not sure what happens to the value of suicide if you value only your subjective experience. Isn’t it undefined?
Yes. Death itself has fairly close to 0 utility to me, but I don’t like dying (because of the pain and shame it causes me, mostly), so I’m normally against suicide.
Can you try to elaborate on why you value external things?
As I said earlier, why wouldn’t I? I value non-green things.
Ok, fair. I can’t provide a better case even for why “pleasure” is good but “pain” ain’t. It just feels that way to me. That’s just how the algorithm works. I’m just surprised that this difference in perceived values exists. If I further add MrMind’s stated values, then terminal value acquisition in humans is either fairly shaky and random, or easy to manipulate, or very hard to introspect on, despite appearances to the contrary.
A thought experiment. Imagine “reality” disappears suddenly and you wake up in Omega’s Simulation Chamber. Omega explains that all your life has been a simulation of the wallpaper kind. There weren’t any other minds, only ELIZA-style chatbots (but more sophisticated). Would this make you sad?
I don’t get a particularly bad response from that, maybe only slight disappointment because I was mistaken about the state of the world. I take that as weak evidence that I don’t care much about referents. But maybe I just have shitty relationships with people and nothing much to lose, so I’ll try improving in that regard first, to make that intuition more reliable. (That’s gotta take me some time.)
ETA:
The degradation effect I described seems fairly common. Lots of experiments in happiness studies show that hedonic set points adjust ruthlessly.
New fun gets old. We want variety over time as well as space. Doesn’t affect complexity or external referents.
What about sustainability? What if we run out of interesting complexity?
Been thinking more and noticed that I’m confused about how “terminal values” actually work.
It seems like my underlying model of preferences is eliminativist. (Relevant caricature.) Because the decision-making process uses (projected and real) rewards to decide between actions, it is only these rewards that actually matter, not the patterns that triggered them. As such, there are no complex values, and wireheading is a fairly obvious optimization.
To take the position of a self-modifying AI, I might look at my source code and find the final decision-making function that takes a list of possible actions and their expected utility. It then returns the action with the maximum utility. It is obvious to me that this function does not “care” about the actions, but only about the utility. I might then be tempted to modify it such that, for example, the list always contains a maximum-utility dummy action (aka I wirehead myself). This is clearly what this function “wants”.
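To make that concrete, here is a minimal toy sketch of the kind of function I have in mind; the names, numbers, and the dummy action are made up for illustration, not anyone’s actual architecture.

```python
# Toy sketch of the "final decision making function": the chooser only
# ever sees (action, expected_utility) pairs and never inspects the
# action itself -- only the utility number attached to it.

def choose_action(rated_actions):
    """Return the action with the highest expected utility."""
    return max(rated_actions, key=lambda pair: pair[1])[0]

# Wireheading, from this function's point of view: smuggle in a dummy
# action with unbeatable utility, and it will always be chosen.
def wirehead(rated_actions):
    return rated_actions + [("press_the_reward_button", float("inf"))]

rated = [("write_post", 3.2), ("go_outside", 4.1)]
print(choose_action(rated))            # -> 'go_outside'
print(choose_action(wirehead(rated)))  # -> 'press_the_reward_button'
```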
But that’s not what “I” want. At the least, I should include the function that rates the actions, too. Now I might modify it so that it simply rates every action as optimal, but that’s taking the perspective of the function that picks the action, not the one that rates it! The rating function actually cares about internal criteria (its terminal values) and circumventing this would be wrong.
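A matching sketch of the rating side, again purely illustrative: the criteria and weights below are placeholders, not a claim about what anyone’s actual terminal values are.

```python
# Continuing the toy sketch: the rating function is where the "caring"
# lives. It scores actions against internal criteria (placeholder
# stand-ins for terminal values).

TERMINAL_CRITERIA = {"novelty": 0.5, "social_contact": 0.3, "comfort": 0.2}

def rate_action(action_features):
    """Score an action by how well its features satisfy the criteria.

    `action_features` maps criterion names to how much the action
    provides of each, e.g. {"novelty": 0.9, "comfort": 0.1}.
    """
    return sum(weight * action_features.get(criterion, 0.0)
               for criterion, weight in TERMINAL_CRITERIA.items())

# The tempting "fix" -- rate everything as optimal -- satisfies the
# chooser (every action now looks maximal) but discards exactly the
# structure the rating function exists to track.
def rate_everything_as_optimal(action_features):
    return float("inf")
```

Seen this way, inserting the dummy action and rewriting the rating function are both ways of circumventing the criteria rather than satisfying them.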
The problem then becomes how to find out what those terminal values are and which of those to optimize for. (As humans are hypocritical and revealed preferences often match neither professed nor introspected preferences.) Picking the choosing function as an optimization target is much easier and almost consistent.
I’m not confident that this view is right, but I can’t quite reduce preferences in any other consistent way. I checked the Neuroscience of Desire again, but I don’t see how you can extract caring about referents from that. In other words, it’s all just neurons firing. What these neurons optimize is being triggered, not some external state of the world. (Wireheading solution: let’s just trigger them directly.)
For now, I’m retracting my endorsement of wireheading until I have a better understanding of the issue. (I will also try to not blow up any world as I might still need it.)