We need a name for the following heuristic. I think of it as one of those “tribal knowledge” things that gets passed on like an oral tradition without being citable, in the sense of being part of a literature. If you come up with a name, I’ll certainly credit you in a top-level post!
I heard it from Abram Demski at AISU ’21.
Suppose you’re going to end up in either world A or world B, and you’re uncertain which one it will be. Suppose you can pull lever LA, which will be worth 100 if you end up in world A, or lever LB, which will be worth 100 if you end up in world B. The heuristic is that if you pull LA but end up in world B, you do not want to have created disvalue. In other words, an intervention chosen on the belief that you’ll end up in world A should not screw you over in timelines where you end up in world B.
This can be roughly mathematized by saying “if most of your probability mass is on ending up in world A, then obviously you’d pick a lever L such that V(L|A) is very high; just also make sure that V(L|B) ≥ 0, or that L creates only an acceptably small amount of disvalue in world B,” where V(L|A) is read “the value of pulling lever L if you end up in world A.”
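To make the rule concrete, here is a minimal sketch of it as a filter-then-maximize procedure. The payoff numbers, the extra lever `LA_safe`, and the `tolerance` parameter (my stand-in for “acceptably small”) are all made up for illustration; nothing here comes from Abram.

```python
# Hypothetical payoff table: value of each lever in each world.
# All numbers are invented for illustration.
PAYOFFS = {
    "LA": {"A": 100, "B": -50},      # great in A, harmful in B
    "LA_safe": {"A": 80, "B": 0},    # slightly worse in A, harmless in B
    "LB": {"A": 0, "B": 100},
}


def pick_lever(payoffs, likely_world, other_world, tolerance=0):
    """Return the lever with the highest value in the likely world,
    among levers whose loss in the other world is at most `tolerance`.

    Returns None if no lever passes the filter.
    """
    admissible = {
        name: values
        for name, values in payoffs.items()
        if values[other_world] >= -tolerance
    }
    if not admissible:
        return None
    return max(admissible, key=lambda name: admissible[name][likely_world])


print(pick_lever(PAYOFFS, "A", "B"))                # LA_safe
print(pick_lever(PAYOFFS, "A", "B", tolerance=50))  # LA
```

With zero tolerance the rule gives up 20 units of value in world A to avoid any harm in world B. Loosening the tolerance to 50 lets the riskier lever back in.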
Why are you specifying 100 or 0 value, and using fuzzy language like “acceptably small” for disvalue?
Is this based on “value” and “disvalue” being different dimensions, and thus incomparable? Wouldn’t you just include both in your prediction, and run it through your (best guess of) utility function and pick highest expectation, weighted by your probability estimate of which universe you’ll find yourself in?
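For comparison, the plain expected-value approach this comment describes might look like the sketch below. The payoffs and probabilities are hypothetical, and it assumes value is linear in the payoff numbers (no risk aversion), which is a simplification.

```python
def expected_value(payoff_by_world, prob_by_world):
    """Probability-weighted sum of a lever's payoff across worlds."""
    return sum(prob_by_world[w] * payoff_by_world[w] for w in prob_by_world)


def best_by_expectation(payoffs, prob_by_world):
    """Return the lever whose expected value is highest."""
    return max(
        payoffs,
        key=lambda name: expected_value(payoffs[name], prob_by_world),
    )


# Hypothetical numbers: LA is harmful in world B, LB is harmless in world A.
payoffs = {
    "LA": {"A": 100, "B": -50},
    "LB": {"A": 0, "B": 100},
}
probs = {"A": 0.8, "B": 0.2}

for name, payoff in payoffs.items():
    print(name, expected_value(payoff, probs))
print("best:", best_by_expectation(payoffs, probs))
```

Note that this picks LA despite its harm in world B, because the procedure treats gains and losses as fully comparable.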
Why are you specifying 100 or 0 value, and using fuzzy language like “acceptably small” for disvalue?
100 and 0 make sense in this context, at least on my initial reading: they are arbitrarily chosen values in a range that is quick to work with (akin to why people often work in percentages instead of 0..1).
Is this based on “value” and “disvalue” being different dimensions, and thus incomparable?
It is often the case (I’m going to say “often”, although I am aware this is suboptimal phrasing) that you are confident in the sign of an outcome but not in its magnitude.
As such, you can often end up with discontinuities at zero.
Wouldn’t you just include both in your prediction, and run it through your (best guess of) utility function and pick highest expectation, weighted by your probability estimate of which universe you’ll find yourself in?
Running the entire probability distribution of outcomes through your utility function doesn’t necessarily even have a closed-form result. In a universe where computation itself has a cost, finding a cheaper heuristic (and working through whether said heuristic has any particular basis or problems) can be valuable.
The heuristic in the grandparent comment is just what happens if you are simultaneously very confident in the sign of positive results, and have very little confidence in the magnitude of negative results.
It is often the case that you are confident in the sign of an outcome but not the magnitude of the outcome.
This heuristic is what happens if you are simultaneously very confident in the sign of positive results, and have very little confidence in the magnitude of negative results.
I’m not sure I understand. If the lever is +100 in world A and −90 in world B, it seems like a good bet if you don’t know which world you’re in. Or is that what you mean by “acceptably small amount of disvalue”?
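To put numbers on the +100 / −90 case: the sketch below reads “don’t know which world you’re in” as a 50/50 prior and assumes risk neutrality, both of which are my assumptions. It uses exact fractions so the arithmetic is not blurred by floating point.

```python
from fractions import Fraction


def two_world_ev(p_a, v_a, v_b):
    """Expected value of a lever worth v_a in world A (probability p_a)
    and v_b in world B (probability 1 - p_a)."""
    return p_a * v_a + (1 - p_a) * v_b


def break_even_probability(v_a, v_b):
    """Probability of world A at which the expected value is exactly zero.

    Solves p * v_a + (1 - p) * v_b = 0 for p; assumes v_a > 0 > v_b.
    """
    return Fraction(-v_b, v_a - v_b)


print(two_world_ev(Fraction(1, 2), 100, -90))  # 5
print(break_even_probability(100, -90))        # 9/19
```

So at even odds the bet is positive but thin: it stops being worthwhile once P(A) drops below 9/19, about 47%.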
Obviously there are considerations downstream of articulating this. One is that when P(A) > P(B) but V(LA|A) < V(LB|B), it can be reasonable to hedge on ending up in world B, even though world B is not more probable than world A.
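A quick illustration of that hedging case, with invented numbers: P(A) = 3/5, V(LA|A) = 100, V(LB|B) = 200. I am also assuming each lever is worth exactly 0 in the world it was not designed for, which is the best case the heuristic allows.

```python
from fractions import Fraction

# Hypothetical inputs, chosen so that P(A) > P(B) but V(LA|A) < V(LB|B).
P_A = Fraction(3, 5)
P_B = 1 - P_A
V_LA_GIVEN_A = 100
V_LB_GIVEN_B = 200

# Assumption: each lever is worth 0 in the other world, so only one
# term contributes to each expected value.
ev_la = P_A * V_LA_GIVEN_A
ev_lb = P_B * V_LB_GIVEN_B

print(ev_la, ev_lb)  # 60 80
print("pull LB" if ev_lb > ev_la else "pull LA")  # pull LB
```

Here the lever aimed at the less likely world wins on expectation, because its payoff is large enough to outweigh the lower probability.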