What Should We Optimize—A Conversation

The following is a lightly edited chat conversation from 2022-04-07. My interlocutor requested to be anonymized as Jason. This conversation sprang out of Jason reading about my transhuman dream.

(The conversation starts out a bit slow, but overall covers a lot of ground.)

Jason: Maxing positive experiences and minimizing negative experiences seems misguided. I am convinced that positive experience is purely defined by its contrast to negative experience. Without negative experience, the positive experience is just what you do, it’s normal.

I think a better goal would be to minimize suffering. After all, people even like negative experiences (watch horror movies), but nobody likes suffering.

Maybe that’s what you meant all along, and I just interpreted the statement too literally.

Johannes: You don’t think about bad things if you have a positive experience, like laughing hard, or having an orgasm. Also, by your logic, minimizing suffering seems wrong. It seems more like you should be in favor of everybody having at least one intensely negative experience. And only after that would you minimize suffering.

Jason: Positive/negative experience is more like positive/negative pressure, not charge. Let me propose a model:

We rate experiences not by an absolute measure, but by an implicitly defined (relative) ordering among them. The things in the bottom half are what we call negative experiences, and the upper half are the positive ones (roughly speaking).

This ordering seems to be defined subconsciously. So you may not consciously think “Man, cumming is so much better than driving to work”, but that thought may be happening subconsciously: the computation that tells you that this is a positive experience. A table lookup if you will. Sleep or other processing of experience shapes the ordering/table.

So like with pressure, if you raise the pressure (“remove” the “negative” experience), you’re just left with a smaller range, which still has a bottom and top half.

And we know this to be true experimentally (hedonic treadmill).
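A minimal sketch of this relative-ordering model, assuming an experience’s felt valence is just its percentile rank among remembered experiences; the function name and the numbers are hypothetical:

```python
def felt_valence(experience, remembered):
    """Percentile rank of `experience` among `remembered` experiences.

    In this model, anything below 0.5 lands in the "negative" half and
    anything above it in the "positive" half, regardless of absolute scores.
    """
    return sum(1 for e in remembered if e < experience) / len(remembered)

# A history containing some clearly bad experiences (absolute scores are arbitrary).
history = [-10, -5, -1, 0, 2, 4, 8, 10]
print(felt_valence(0, history))   # 0.375: this neutral experience sits in the "negative" half

# "Raise the pressure": remove everything below zero, as a naive optimizer might.
improved = [e for e in history if e >= 0]
print(felt_valence(0, improved))  # 0.0: the same experience is now the worst thing
                                  # that ever happens; there is still a bottom half
                                  # (the hedonic treadmill).
```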

So what about suffering? I think in this model, suffering is a subset of the negative experience. Just defining it as the worst negative experience doesn’t seem complete. Suffering seems like an orthogonal metric (though less correlated with “positive” experience).

I don’t understand your point about “by your logic, minimizing suffering seems wrong”.

If you don’t agree with this model (which seems to describe the world quite well), what else do you propose?

Oh, and that ordering is subjective. However, suffering doesn’t seem that subjective. It might be. I don’t have a definition for suffering—besides (seemingly) unnecessary negative experience. But even that’s too simple. Even our suffering shapes us and makes us who we are.

This is the problem I have with “maximize positive, minimize negative”. It assumes an absolute metric of “goodness”, and the world is just more complicated than that. And if you optimize under false assumptions (that “negative” experience has no pros), you tend to end up in a bad place, or return to where you started, having caused a bunch of trouble on the way.

So please tell me what you think is wrong with this reasoning.

Johannes: I do not agree with the model. It might be useful for reasoning about humans, but I am pretty sure it is not valid in general. I see no reason why a being, in general, would have to experience negative things and positive things.

I never said that negative experiences are only bad. If a negative experience causes enough positive experience such that it would become a good trade, compared to other available actions, you should cause the experience to happen.

Jason: How do you define positive/negative experience then? If you leave out the definition, then of course there is no reason why they should be necessary.

“If a negative experience causes enough positive experience such that it would become a good trade”, how are you going to predict that? That’s something even an AGI can’t do because the world is chaotic. Generating better weather forecasts is not about being smarter about it. If it were, we’d have seen significant improvement over the last decade.

(About determining whether a concrete negative experience should “be had”) Even if you had the computing power to model the world accurately enough, you couldn’t get the data for the model. First, you’d need crazy accurate measurements, which an AGI could perhaps develop. But second, you can’t scan the entire world/earth at once, because the speed of light is finite, and it is impossible to synchronize clocks (not 100% sure about this, but if you moved them apart, they would be affected differently by gravity, so they would drift apart).

You can’t get the full picture for some specific `t` (which is not even something that’s defined in relativity), so you can’t use the “perfect world model”, because you can’t feed it the data it needs. Unless physics is “wrong”.

I don’t disagree with your arguments on a theoretical level. But they are too abstract to be meaningful. And once you make them more concrete, as I’ve tried here, they seem to fall apart. But maybe you can convince me otherwise.

Johannes: For many of the things that we do, including weather forecasts, we got better because we got smarter about them in some sense.

Either what negative and positive experiences are is intuitively clear, or it is not. Look at your own experience and see what is good or bad. Right now, there is no way to define this rigorously.

If you have artificial beings it is easy to see that needing a perfect world model would be much less of an issue. For humans, it might be more complicated, but if you suggest that not trying to optimize would be better in expectation, then I think you are wrong. Just because we can’t do the perfect thing, using the perfect model, doesn’t mean you should not try.

Jason: It’s not about perfection. It’s about practicality. Determining the consequences of one specific action is incredibly specific. You need insanely accurate data.

There are tons of weird events that are practically unpredictable (like a cosmic ray crashing your computer, preventing you from getting a flight) each with tiny probabilities. But the world is so large, that these things still happen all the time. Not to you, but to someone, and they interact with you indirectly.

It’s more like predicting the exact temperature at your house in exactly one year with an accuracy of 10⁻⁶.

I’m not trying to get at the truth here. I’m trying to get you to think because what seems obvious and intuitively clear to you seems dubious if not completely ridiculous to me.

Thinking about how many laws of physics you’ll need to break could be a good idea—if you don’t want things to remain purely theoretical/fictional (unless you want to be a writer).

I’m perhaps too critical, but that’s almost the point, you know?

Johannes: I think it is reasonable to take into account the scenario where we get some theory of consciousness that tells us what to do if we want to maximize positive experiences and minimize suffering. You and I already have a pretty good theory with regard to humans. Predict how it would be for a human if you cut off their hand. Predict how a human would react if they are out alone in a dark alley at night, and they see someone following them. And then predict how somebody would feel if they worked on something that they find important, in which they have lots of autonomy, and manage to make progress regularly. Predict how some human feels while they are having an orgasm.

Predicting positive experiences for other humans often seems harder, because it depends more on the individual. But we (humans) can still do a lot. This is not about trying to perfectly predict the best thing for making humans happy that works for all humans; rather, in expectation, we can think of things that would work for many humans. The theory that we have in our minds certainly seems pretty good to me, compared to having no theory at all.

And we are doing it without observing the entire world all at once. Of course, you can’t do that. But we don’t need to. What our brains are doing already shows that we don’t need to.

And a similar thing applies to animals as well. Let’s only consider mammals. Predict how an animal would feel if you give it good food. Predict how an animal would feel if you give it an electric shock. Sure, at some level we can’t even be sure that they are conscious. But my bet right now is that they are, and that they can have positive and negative experiences. Their brains are made of the same stuff as our brains. Their neural architecture is very similar in many ways. And their behavior is similar to ours, both when happy and when in pain.

Jason: I guess I don’t understand what you’re intending to do when you say “maximize positive, minimize negative experience”. Do you want to put people on an “experience diet”—no negative experiences?

Or do you want to build a world of choice, where people can engage freely in “positive experiences”, if they so choose?

It always sounds more like the former: determining, using some theory, what’s positive and what’s negative, and then eradicating all “negative” from the universe. And that sounds like dystopia to me. The thing about optimizing under false/incomplete assumptions.

Dystopia as in Brave New World kinda stuff: take your happy juice and go have sex all day, because we’ve decided that’s what you like.

Killing the individual.

So I guess I’d be fine with “maximizing the accessibility of positive experience while preventing nobody from having any kind of negative experience”.

Choice is important to me.

For reasons of uncertainty.

Johannes: That is a good point. I think that if somebody wanted to experience something negative for some good reason, then it would be alright. But they would need to understand what they are doing. People can be mistaken about the consequences of their actions.

Also, it seems possible to construct beings that would just constantly seek pain and misery. They would look from the outside as if they wanted to suffer, but on the inside, there might be some sort of subsystem that would experience the pain. And that subsystem might not want to suffer. Or it might not even comprehend anything at all, except that there is pain. It is a bit unclear what we should do about systems like that, as we do not really understand consciousness, but right now I err on the side of not having systems like that. Or more restricted: it seems that certain systems like that would be bad to have.

Or even more general: It seems possible to construct certain systems that would do things that we don’t want because they cause certain conscious experiences to arise within themselves that we don’t want to cause.

Jason: Why do you want to sandbox everyone, preventing them from having negative experiences at all costs (referring to the “for some good reason” part)?

That’s what seems so fucked up to me.

It’s like a helicopter parent, wanting the best for their child, but doing harm. The road to hell is paved with good intentions.

What’s wrong with learning, “oh shit, I’ll never do that again”? I find pre-emption disrespectful and presumptuous. It says that being is a stupid, fragile child.

You know, that example specifically: What are you going to do about those beings?

Kill them?

Prevent them from hurting themselves? Well, constraining them might cause even more suffering.

Johannes: Maybe the issue here is in part, that we are thinking about different scenarios. I am thinking about the whole range of conscious beings that could exist. Or at least about the ones that I can imagine, while you seem to focus on humans.

The thing to realize is that beings can exist that would better not be free. We don’t want some murderer to be free to harm us. But it is at least equally terrible if we had a system that just created a lot of negative experiences in some way. For example, by spawning sub-agents that would be miserable. Even if the system were not aware of that and had what we would see as good intentions, we don’t want that to happen.

And there are more subtle scenarios that we would still want to prevent. Like some system repeatedly not managing to avoid negative experiences for itself or others.

But it can get way iffier. Consider a system that strictly opposes being helped in any way, and yet we could help it. Certainly, at least in extreme scenarios of this kind, I think coercing the system into having better experiences is the right thing to do. At least if you think that you have a good enough understanding of how consciousness works, and you know that the system you want to coerce has a much weaker understanding. You might be able to convince the system, by upgrading its understanding, that you are right. But there are systems where you will simply not be able to do that either.

And it is an interesting question what you should do if you could predict with near certainty that you could give an argument that would convince the system. Once we are talking about computer systems, think of getting a proof that they would behave a certain way after getting some set of inputs. In that case, arguably, you could just go ahead and modify the system as if you had given it the argument, if that would be better, e.g. by saving resources.

I agree that this is very dangerous territory, and I am not sure where the boundaries lie here. But I am very sure that future intelligent systems will need to take all of these things I mentioned into account if they want to maximize positive experiences and minimize negative experiences. And I think there is probably a way that you can take all of these things into account without sliding into an authoritarian dystopia.

Jason: So this all comes full circle back to AI alignment?

You don’t want an AGI to create lots of suffering beings, because that’s an efficient way to do whatever.

Johannes: We should at least take into account that the theory of consciousness might be such that beings can just stop existing without any pain, without any fear of death, and without any negative experience in the process. Then causing the beings to stop existing should, at the very least, be an option that is considered. The only loss in that situation might be the future positive experiences that these beings could have. But if you can create a new being to fill that gap, that might not be an issue, especially if the new being is better at being “well”. Alternatively, making that being stop existing might also cause other positive experiences in another way. In that case, it might also be the right choice.

I am not sure what you mean by the relation to AI. Getting AI alignment right is important for almost every objective that you could have. But right now I was just trying to convey considerations about what one should think about in the general case if one has the objective of maximizing positive and minimizing negative conscious experiences.

Jason: I still think the wording is very important because other people will make assumptions. It seems you don’t actually want to “minimize negative experience”, or at least that’s not the main goal.

The main goal seems to be to minimize the amount of potential, artificially created negative experience. (wording TBD)

Would you say that is accurate?

That’s way easier to get behind because it is not so “absolute”. And it connects with responsibility. It’s less about agenda.

Or perhaps we could minimize the accidental negative experience.

I quite like that. It’s like accidental complexity, in so many ways. Complexity isn’t bad. Some complexity is essential. But any extra complexity is bad (though perhaps necessary in some contexts).

Negative experience is like complexity. Something to be avoided where possible. Declaring it as purely bad and always undesirable/unnecessary is incomplete.

Maybe that’s just me (I’m trained to take things literally and not make extra assumptions after all) and that’s implicit when people say “minimize negative experience”.

But it seems important to clarify that somewhere.

Johannes: Well, that is part of it. I certainly think we should take all that into account when we or other future systems create artificial conscious experiences.

But there is really no boundary that I can see between artificial experiences and normal human experiences. Maybe the same considerations should apply.

Humans don’t want to be restricted or controlled. And trying to restrict or control them would create negative experiences. Living in a society that is laxer about death might cause a lot of negative experiences for humans.

This will sound very extreme, but right now I can’t even see what is wrong with just exterminating all humans painlessly and replacing them with beings that are just happier. In a less extreme scenario, this could simply involve mutating all the human brains such that they would become much better at generating well-being.

It is not that this is a goal of mine that is set in stone. It just seems to follow from wanting to optimize experiences. So when you argue against that, please try to argue against the idea itself, in such a way that I find it convincing, instead of just saying that you would not want something like this. I think that is a valid criticism, but I wonder if there are any more general objections to this.

Just to be clear, I would be happy with a world where we would just modify the humans that would want to be modified, and where people can do whatever they want within reason. That would not be the best world for optimizing for positive experiences, but it can get close, and it might actually be the world that I prefer for self-centered reasons. Certainly, a big part of me prefers that world. It should be clear from my transhuman dream.

Jason: It is impossible to make an argument for or against that (replacing all humans). It comes down to values.

  • You value maximizing positive and minimizing negative experiences.

  • I value coexistence and not destroying that which already exists (though perhaps preventing things from existing).

Neither is right.

You can’t make conclusions in an empty context. You need to fill it with some axioms first. Values in this case. And as far as we can tell, there are no values given to us by God or the universe.

So we have to choose our own. Meaning that no set of axioms/values is right or wrong.

We can talk about consistency. But once we order values by consistency (or some other metric), we implicitly fill the context with the value to maximize that metric first.

And under all these assumptions about the universe, I’ve chosen the value of coexistence.

Johannes: It might all be a matter of what one wants, perhaps. At the level where I try to think about it logically, positive conscious experiences and the absence of negative conscious experiences are the things that I value most.

But there are many more things that I want. I feel like one of the core problems here is that we do not understand consciousness. If we did, there is a chance for things to get clearer.

But why do you value coexistence? It seems likely that this is ultimately just a good heuristic that evolution built in. But once you start to think about it, the rationalization that you might come up with would be that it is good because it increases the harmony between beings, and so maximizes the positive experiences that other beings can have (you would probably consider mainly humans if we were not in the current context). And at the very least, you would like that it allows you to do what you want if others also use this heuristic.

So it seems like maximizing positive experiences is somewhat implied by the value of coexistence, at least to a significant degree. Also, coexistence does not seem to capture all of your values. I feel like you also value freedom highly. Brave New World seems to do really well on the “value-axis” of coexistence, but you don’t like that world.

Jason: I guess you can bend it into looking like maximizing positive experience.

Whatever.

For me, it comes down to what I wrote above. And there is some value of a “based” argument hidden inside that. It might be certainty actually (only if the entire tree of reasoning is valid, is the conclusion valid). And some innate respect for other beings.

Yeah, freedom is the flipside of respect. It’s pretty entangled.

Johannes: Do you mean it comes down to the impossibility of any objective moral reality? (Is this what you mean by “written above”?)

Jason: Yeah, that whole thing. The “based” part means: Given your whole reasoning tree, can I follow your reasoning? And that only works with shared values, because the context is empty by default. So I try to choose the least presumptuous “root” value (set).

Johannes: To me maximizing positive and minimizing negative experiences just seems to be very fundamental. In almost everything that anyone would ever want, it is implied that what they care about is that. Possibly only for themselves.

Maybe something gets lost in the translation down to the more fundamental goal. But I am pretty sure that it is very fundamental to what I, and what humans want in general, if they were to just think about it long enough.

It seems that maximizing positive and minimizing negative experiences (let’s call this MPMN from now on) is the least presumptuous “root” value.

Jason: Sadly, that root value is not very useful. So I often have to extend it to include human values.

And no, anything that optimizes is exclusive by definition!

Johannes: What do you mean by that it is not useful?

Jason: Judging by this discussion, I think that with the human value extension, I can mostly agree with your ideas.

Pretty much any action will affect someone or something else. So you can’t act if you only value coexistence. Unless you live very far apart in outer space. And even then, you’re technically limiting other people by using energy for what you care about. Because the universe is finite.

Johannes: Any sort of value can be optimized for. MPMN is a value. But I don’t see any reason why this value could not be combined with other values basically in arbitrary ways. For example, optimize for MPMN unless somebody holds a banana in his hand. Then do nothing.
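A minimal sketch of what combining MPMN with another value might look like, assuming a hypothetical `World` type and a scalar MPMN score; the field names and the banana condition are purely illustrative:

```python
from dataclasses import dataclass

@dataclass
class World:
    # Hypothetical stand-in for the state of the world; the fields are illustrative.
    positive_experience: float
    negative_experience: float
    somebody_holds_banana: bool

def mpmn(world: World) -> float:
    # The bare MPMN objective: maximize positive, minimize negative experiences.
    return world.positive_experience - world.negative_experience

def mpmn_unless_banana(world: World) -> float:
    # MPMN combined with an arbitrary extra value: if somebody holds a banana,
    # be indifferent ("do nothing"). Note that this is a different objective
    # than plain MPMN.
    if world.somebody_holds_banana:
        return 0.0
    return mpmn(world)
```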

Jason: Pure coexistence is also not “stable”. It can’t protect itself from murderers (etc). Unless you add the value that non-valuers of coexistence are exempt from coexistence.

No, this is about precision: you’re no longer optimizing for “MPMN” if you’re optimizing for “MPMN unless banana”. That’s why optimization is so dangerous, and we had better give an AGI a pretty complex and extensible objective/value.

Otherwise, it’ll e.g. turn everything into computers, as we’ve talked about some time ago.

Johannes: Sure, you don’t optimize for MPMN if you don’t optimize for MPMN. My point is that MPMN seems to be a very important thing that I, and I think most humans, care about at least at some level. MPMN seems to be a very useful concept because it points at something that we want to take into account when optimizing the universe. And to me, it seems so fundamental that one is somewhat hard-pressed to find reasons why this objective would be bad to optimize for directly. At least reasons that feel equally fundamental.

Jason: It still boggles my mind that all this is based on the assumption that there are “positive” and “negative” experiences.

What happened to your meditation insights? For example, that pain is neither good nor bad, but just is.

What most people take away from meditation (as you have, if I understood you correctly) is that all judging is done by us. The universe just is the way it is. It is you who assigns a negative value to pain. It is purely your opinion (which is shared by many humans) that pain is generally bad (though also important for survival. If you can’t feel the pain of heat, you’d burn your hand on a stove and not even notice). So it should be entirely clear that even something that seems good, like minimizing pain, is not universal. It is not the least presumptuous value.

I don’t know how else to explain this. You already understood it. Why don’t you connect the dots?

Johannes: Just because it is possible to pay attention in such a way that pain does not influence you anymore, does not mean that positive and negative experiences do not exist. They might not exist if you pay attention in the right way, but that does not mean they don’t once you stop paying attention in this way.

I don’t know how to integrate this insight though. Maybe we can’t until we understand consciousness better. It certainly seems like we could construct beings for whom the concepts of positive and negative experience would not make sense, because these things are not things that they experience.

Ultimately, we don’t know what sorts of conscious experiences are possible. Even people who have taken DMT can’t say, and experiences on DMT are often described as indescribable. These people might have had experiences that are very different from ordinary consciousness, but I don’t think that they have gotten the whole picture of what conscious experiences can be like if we consider computational systems in general. Not even close.

I hope that we will get a better and better understanding of consciousness. And as we do, it seems possible that MPMN would seem childish. And retrospectively I might be seen as a kid who answers the question of “If anything was possible, what would you want?”, with: “Cotton Candy!”.

Though I can also imagine that MPMN is not too far off from what I would consider valuable if I really would understand consciousness.

Just filling the universe with beings that can’t have positive or negative experiences, but which do have experience, does not seem bad, at least. Even if their experience is not more expansive than that of a human.