We need to think more about Terminal Values
I just sent an email to Eliezer but it is also applicable to everyone on this website. Here it is:
Hi Eliezer,
I’m contacting you directly because I think I have an idea that could get us closer to obtaining our terminal value(s), and you are in a position of influence, but you also seem like someone I could possibly influence if my idea is correct. If my idea is wrong, at least I’d get a lot out of hearing why.
First, I think your definition of Terminal Value is flawed. People don’t ultimately want survival or health or friendship. I know that because if I asked you why you want survival, the answer would ultimately come down to the fact that circuitry and chemistry make us want it. So it would appear that our terminal values are actionable motivations arising from circuitry and biology. If we stopped our line of questioning there, we’d conclude that our terminal value is exactly that, in which case the optimal course of action would be to find a way to hack our circuitry and biology to give us positive signals. But I don’t think we should stop there.
If you model humans as optimizers of this terminal value, then you’d say to people “you should try to optimize your terminal value,” but it’s not really a should, because that’s already what people do, with varying degrees of effectiveness. Terminal values aren’t “should”s, they’re “shall”s. The real question here is what you mean when you say “you.”
Let me know if any part of this seems logically unsound, but I’m now going to make another argument that I’m still having a hard time processing:
“You” is poorly defined. Evidence:
- That tree over there is just as much a part of “you” as your hand is. The only difference is that your conscious mind is better at finely controlling your hand than at controlling the tree; but just as you can cause your hand to move, you can also cause the tree to move by pushing it.
- The “you” some years from now is made of entirely different atoms, yet you don’t behave as if “you” will be dead when that happens; you still think of that future person as you. That means “you” is more than your physical body.
- If “you” is more than your physical body, then the first thing I said, about chemical and electrical signals being our terminal value, doesn’t make sense.
New model:
- We are a collective system of consciousnesses that is, in some sense, itself conscious. Humans are like neurons in a brain. We communicate the way neurons do, in various ways, one of them being language. The brain might have a different terminal value than the neuron.
- Question: what “should” the terminal value of a neuron in a brain be? Its own, possibly faultily programmed, terminal value? Or the terminal value of the brain? I think it depends on the neuron’s level of awareness, but once it realizes it’s in a brain and thinks of its “self” as the brain, its terminal value should be the brain’s.
Possibly the brain is itself a neuron in a bigger brain, but nevertheless, somewhere at the end there must be a real terminal value, or at least a level beyond which we can’t find any more information about it. I think if we can expand our definition of ourselves to the whole brain rather than our little neuron selves, we “should” try to find that terminal value.
Why do I think that we “should” do that? Because knowing more about what you want puts you in a much better position to get what you want. And there’s no way we wouldn’t want what we want; the only question is whether the expected value of trying to figure out what we ultimately want makes it part of the optimal path for getting what we ultimately want. From my information and model, I do think it’s part of the optimal path. I also think it’s entirely possible to figure out what the terminal value is if we’re “intelligent” enough, by your definition of intelligence, so the expected value is at least positive.
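To make the expected-value claim concrete, here is the rough inequality I have in mind (a sketch with placeholder symbols of my own, not anything taken from your writing): it is worth investigating our terminal value whenever

$$ \mathrm{EV}(\text{investigate}) \;=\; p \cdot \Delta V \;-\; c \;>\; 0 $$

where p is the probability that the investigation actually succeeds in identifying the terminal value, ΔV is how much better we can optimize once we know it, and c is the cost of the investigation itself. My claim above is just that p and ΔV are plausibly large enough, and c small enough, that the left-hand side comes out positive.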
I think the optimal path is:
- figure out what the “self” is
- figure out what the self’s terminal value is
- use science and technology and whatever means necessary to get that terminal value
So this is where I ask:
- Are there any words that I’m using in an unclear way? (Often smart people don’t actually disagree but are using different definitions)
- Are there any logical flaws in the statements I’ve made? (I’ve kept my message shorter than it probably should be, on the theory that this length is enough to get the desired effect, but I’m extremely willing to elaborate.)
- Do you agree with the conclusion I’ve drawn?
- If yes what do you think we should do next?
- If no, I beg of you, please explain why, as it would greatly help with my optimization toward my terminal value.
I think writing something like this is a bit like a rite of passage. So, welcome to LW :P
When we talk about someone’s values, we’re using something like Dan Dennett’s intentional stance. You might also enjoy this LW post about not applying the intentional stance.
Long story short, there is no “truly true” answer to what people want, and no “true boundary” between person and environment, but there are answers and boundaries that are good enough for what people usually mean.
Thanks so much for replying!
I’m still reading about Dan Dennett’s intentional stance, so I won’t address that right now, but in terms of /not/ applying the intentional stance, I think we can be considered different from the “blue minimizer,” since the blue minimizer assumes it has no access to its source code. We actually do have access to our source code, so we can see what laws govern us. Since we “want” to do things, we should be able to figure out why we “want” anything, or really, why we “do” anything. To be clear, are you saying that instead of the equations being X = “good points” and Y = “good points” with the law “maximize good points,” the law might just be “DO X and Y”? If so, I still don’t think things like “survival” and “friendship” are terminal values, or laws of the form “SURVIVE” and “MAKE FRIENDS.” When those two are in conflict, we are still able to choose a course of action, so there must be some lower-level law that determines the thing we “want” to do (or, more accurately, just do, if you don’t want to assign intention to people).
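To check that I’m parsing the distinction correctly, here is a minimal toy sketch (entirely hypothetical code of my own, not anything from the blue-minimizer post) of the two pictures: an agent with the single law “maximize good points” versus an agent that just executes fixed rules like “SURVIVE” and “MAKE FRIENDS.” The point of the sketch is that when the fixed rules conflict, the rule-follower still needs some tie-breaking mechanism in order to act at all, and whatever that tie-breaker is, it’s the lower-level law I’m asking about.

```python
# Toy sketch, purely illustrative: two possible models of an agent.

def maximizer_agent(actions, good_points):
    """Model 1: a single law, 'maximize good points'."""
    return max(actions, key=good_points)

def rule_follower_agent(actions, rules):
    """Model 2: fixed laws like 'SURVIVE' and 'MAKE FRIENDS'.
    Each rule simply returns the actions it endorses."""
    endorsed = [set(rule(actions)) for rule in rules]
    agreed = set.intersection(*endorsed)
    if agreed:
        return next(iter(agreed))
    # The rules conflict, yet the agent still acts, so *something*
    # breaks the tie. That tie-breaker is the lower-level law in question.
    return tie_break(actions, endorsed)

def tie_break(actions, endorsed):
    # Placeholder: pick the action endorsed by the most rules.
    return max(actions, key=lambda a: sum(a in e for e in endorsed))

# Example: "SURVIVE" says flee, "MAKE FRIENDS" says help a friend.
actions = ["flee", "help_friend"]
rules = [
    lambda acts: [a for a in acts if a == "flee"],         # stand-in for SURVIVE
    lambda acts: [a for a in acts if a == "help_friend"],  # stand-in for MAKE FRIENDS
]
print(rule_follower_agent(actions, rules))  # rules conflict -> tie_break decides
```

Either way the question stands: “survival” and “friendship” on their own don’t tell you what the agent does when they clash, so they don’t look terminal to me.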
I also want to address your point that there are answers and boundaries good enough for what people usually mean. I think what we should really be going for is “answers and boundaries good enough to get what we really /want/.” A common model of humans in this community is that we are somewhat effective optimizers over a set of terminal values; if that’s really true, then in order to optimize our terminal value(s) we should be trying to know them, and, as I said, I think the current idea that we can have multiple, changeable terminal values contradicts the definition of a terminal value.