I have a question about this picture.
Imagine you have something like a chess-playing program. It’s got some sort of basic position evaluation function, then uses some sort of lookahead to assign values to the instrumental nodes based on the terminal nodes you anticipate along the path. But unless the game actually ends at the terminal node, it’s only “terminal” in the sense that that’s where you choose to stop calculating. There’s nothing really special about them.
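Something like this toy sketch, if it helps make “terminal = where I stop calculating” concrete (the game, the moves, the evaluation function and the depth cutoff are all made up for illustration):

```python
# Depth-limited lookahead: nodes at the cutoff are "terminal" only because we
# stop calculating there, so they get scored by a heuristic evaluation function
# rather than by an actual game result.

def evaluate(state):
    return state            # heuristic guess at the cutoff, not a true outcome

def moves(state):
    return [state + 1, state - 1]   # toy "game": each move adds or subtracts 1

def lookahead_value(state, depth, maximizing=True):
    if depth == 0:          # artificial "terminal" node: we just stop here
        return evaluate(state)
    child_values = [lookahead_value(s, depth - 1, not maximizing)
                    for s in moves(state)]
    return max(child_values) if maximizing else min(child_values)

print(lookahead_value(0, depth=3))  # value backed up from the cutoff nodes
```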
Human beings are different from the chess program in that for us the game never ends; there are no “true” terminal nodes. As you point out, we care what happens after we are dead. So wouldn’t it be true that in a sense there’s nothing but instrumental values, and that a “terminal value” just marks a point at which we’ve chosen to stop calculating, rather than saying something about the situation itself?
I would propose an approximation of the system in which each node has a terminal value of its own. In principle that value could be 0 for completely neutral nodes, but in practice it never quite is: the reinforcement mechanisms of our brains inevitably assign something like 0.0001 because I once heard someone say it was cool, or −0.002 because it reminds me of a sad event in my childhood.
As a simple example, consider eating food when hungry. You get a terminal value on eating food (the immediate satisfaction the brain releases in chemical form when it recognizes the event, thanks to evolution) and an instrumental value on eating food, which is that you get to not starve for a while longer.
Now let’s say that while you are a sentient optimization process that can reason over long projections of time, you are also a really simple one, and your network doesn’t actually have any terminal values other than eating food; it’s genuinely the only thing you care about. So when you calculate the instrumental value of eating food, all you get is the sum of getting to eat more food in the future.
Let’s say your confidence in getting to eat again each time after this one decreases according to a fixed rule, for example p(i+1) = p(i) * 0.5, so p(i) = 0.5^i. If your confidence that you are eating food right now is 1, then your confidence that you’ll get to eat again is 0.5, your confidence that you’ll get to eat the time after that is 0.25, and so on.
So the total instrumental value of eating food right now is the sum over all the future meals it lets you anticipate, Sum(p(i)*T(food)) for i = 1, 2, 3, … taken out to infinity; with p(i) = 0.5^i the series converges to exactly T(food).
So the total value of eating food is T(food) + Sum(p(i)*T(food)). It’s always positive, because T(food) is positive and every p(i) is positive, and that’s that. You’ll never choose not to eat food you see in front of you, because there is no possible reason for that anywhere in your value network.
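In toy code, with a made-up T(food) and the infinite sum truncated once the halved confidences stop mattering:

```python
# Food-only agent: total value of eating right now is the immediate terminal
# value plus the discounted sum over anticipated future meals.
T_FOOD = 1.0                         # made-up positive terminal value of eating

def future_food_value(t_food=T_FOOD, steps=50):
    # Sum(p(i) * T(food)) for i = 1, 2, ..., with p(i) = 0.5 ** i
    return sum((0.5 ** i) * t_food for i in range(1, steps + 1))

def total_value_of_eating(t_food=T_FOOD):
    return t_food + future_food_value(t_food)

print(future_food_value())       # ~1.0: the series converges to T(food)
print(total_value_of_eating())   # ~2.0: always positive, so it always eats
```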
Then let’s add the concept of ‘gross food’, and for simplicity’s sake ignore evolution and suppose that it exists as a totally arbitrary concept that is not actually connected to your expectation of survival after eating it. It’s just kind of free-floating: you like broccoli but don’t like carrots, because your programmer was an asshole and entered those values into the system. Also for simplicity’s sake, you’re a pretty stupid reasoning process that doesn’t actually anticipate seeing gross food in the future. In your calculation of instrumental value there’s only T(food), which is positive; T(this_food), which can be positive or negative depending on the specific food you’re looking at, appears ONLY while you’re actually looking at it. If it’s negative, you’re surprised every time (but you don’t update your values, because you’re a really stupid sentient entity and don’t have that function).
So now the value of eating the food you see right now is T(this_food) + Sum(p(i)*T(food)). If T(this_food) is negative enough, you might choose not to eat. Of course this assumes we’re comparing to zero, i.e. you assume that if you don’t eat right now you’ll die immediately, and also that dying is perfectly neutral and you have no opinions on it (you only have opinions on eating food). If you don’t eat the food you’re looking at right now, you’ll NEVER EAT AGAIN, but it might be gross enough that it’s worth it! More properly, you’re comparing T(this_food) + Sum(p(i)*T(food)) against Sum(p(i)*T(food)) * p(not starving immediately). The outcome depends on how gross the food is and how high you judge p(not starving immediately) to be.
(If the food’s even a little positive, or just neutral, eating it wins every time: the no-eat branch gets multiplied by p(not starving immediately), which is < 1, while the eat branch doesn’t, so eating wins automatically.)
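Bolting the gross-food comparison onto the same toy code (T(this_food), p(not starving immediately) and the numbers below are invented knobs):

```python
# Eat if the immediate, possibly negative, value of this particular food plus
# the anticipated future meals beats the no-eat branch, which keeps the future
# meals only with probability p(not starving immediately).
def future_food_value(t_food=1.0, steps=50):
    return sum((0.5 ** i) * t_food for i in range(1, steps + 1))

def should_eat(t_this_food, p_not_starving, t_food=1.0):
    eat_branch = t_this_food + future_food_value(t_food)
    no_eat_branch = future_food_value(t_food) * p_not_starving
    return eat_branch >= no_eat_branch

print(should_eat(t_this_food=-0.1, p_not_starving=0.5))   # mildly gross: True, still eats
print(should_eat(t_this_food=-3.0, p_not_starving=0.99))  # gross enough: False, refuses
```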
Note that the grossness of the food and the probability of starving already interact non-linearly in how they influence the outcome. And that’s just for the idiot AI that knows nothing except tasty food and gross food! And if we allow it to compute T(average_food) from the mix of foods we’ve given it so far, it might choose to starve rather than eat the gross things it expects to be fed in the future! Look, I’ve simulated willful suicide in all three simplifications so far! No wonder evolution didn’t produce all that many organisms that could compute instrumental values.
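One way to read that T(average_food) variant in the same toy code (the feeding history and numbers are invented):

```python
# If the agent anticipates "food like what I've been fed so far" instead of
# abstract tasty food, a gross-enough history makes the discounted future
# negative, and refusing to eat starts to look like the better deal.
def future_value(t_expected, steps=50):
    return sum((0.5 ** i) * t_expected for i in range(1, steps + 1))

def should_eat(t_this_food, p_not_starving, seen_foods):
    t_average = sum(seen_foods) / len(seen_foods)   # T(average_food)
    eat_branch = t_this_food + future_value(t_average)
    no_eat_branch = future_value(t_average) * p_not_starving
    return eat_branch >= no_eat_branch

history = [-0.8, -0.6, -0.9]          # it has mostly been fed gross things
print(should_eat(0.2, 0.5, history))  # False: it now refuses even tasty food
```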
Anyway, it gets more horrifically complex when you consider bigger goals. So our brain doesn’t compute the whole Sum_j(Sum_i(p(i)*T(outcome(j)))) every time. It gets computed once and then stored as a quasi-terminal value instead: QT(outcome) = T(outcome) + Sum_j(Sum_i(p(i)*T(outcome(j)))). It might get recomputed sometimes, but most of the time it doesn’t. And recomputing it is what updating our beliefs must involve, for ALL outcomes linked to the update.
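A cartoon of that caching (the class, the depth cutoff, and the numbers are all made up for illustration, with a hypothetical update_belief hook standing in for belief updates):

```python
# Quasi-terminal values: the expensive nested sum over downstream outcomes is
# computed once, cached, and only thrown away when a belief update touches an
# outcome linked to it.
class ValueNetwork:
    def __init__(self, terminal, links, p):
        self.terminal = terminal   # T(outcome) for each outcome
        self.links = links         # outcome -> downstream outcomes it leads to
        self.p = p                 # confidence/discount per step ahead
        self._qt_cache = {}

    def qt(self, outcome, depth=5):
        if outcome in self._qt_cache:
            return self._qt_cache[outcome]      # the cheap, usual path
        value = self.terminal[outcome]
        if depth > 0:
            for child in self.links.get(outcome, []):
                value += self.p * self.qt(child, depth - 1)
        self._qt_cache[outcome] = value         # stored as a quasi-terminal value
        return value

    def update_belief(self, touched_outcomes):
        # Recompute QT for outcomes linked to the update by forgetting them.
        # (A less lazy version would also invalidate everything upstream of them.)
        for o in touched_outcomes:
            self._qt_cache.pop(o, None)

net = ValueNetwork({"eat": 1.0, "not_starve": 0.5}, {"eat": ["not_starve"]}, 0.5)
print(net.qt("eat"))        # computed once (1.0 + 0.5 * 0.5 = 1.25)...
print(net.qt("eat"))        # ...then read straight from the cache
net.update_belief(["eat"])  # a belief update forces the recomputation next time
```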
...Yeah, that tends to take a while.