And to this I reply: Obviously, the measuring units of dignity are over humanity’s log odds of survival—the graph on which the logistic success curve is a straight line. A project that doubles humanity’s chance of survival from 0% to 0% is helping humanity die with one additional information-theoretic bit of dignity.
Joking aside, this sort of objective function is interesting, and incoherent due to being non-VNM. E.g. if there’s a lottery between a 0.1% chance of survival and a 1% chance of survival, then how this lottery compares to a flat 0.5% chance of survival depends on whether it’s evaluated before or after the lottery resolves. Evaluated up front, (50% of 0.1%, 50% of 1%) is equivalent to 0.55%, which is greater than 0.5%. Evaluated after an element of the lottery has been selected, the expected log-odds is 0.5 * log(0.1%) + 0.5 * log(1%) < log(0.5%).
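For concreteness, a quick numeric check of that comparison (a minimal sketch; at probabilities this small, log-odds and log-probability are nearly identical, so plain logs are used):

```python
import math

p_low, p_high, p_flat = 0.001, 0.01, 0.005   # 0.1%, 1%, 0.5%

# Evaluated before the lottery resolves: overall survival probability
mixed_p = 0.5 * p_low + 0.5 * p_high          # 0.0055 > 0.005

# Evaluated after it resolves: expected log of the resulting probability
expected_log = 0.5 * math.log(p_low) + 0.5 * math.log(p_high)

print(mixed_p > p_flat)                 # True: the lottery wins on probability
print(expected_log < math.log(p_flat))  # True: it loses on expected log
```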
This could lead to “negative VOI” situations where we avoid learning facts relevant to survival probability, because learning them would increase the variance of our probability estimate, and that reduces expected log-odds since log is concave.
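A minimal sketch of that negative-VOI effect, using a toy experiment whose outcomes respect conservation of expected evidence (the specific numbers are illustrative, chosen to match the example above):

```python
import math

def log_odds(p):
    return math.log(p / (1 - p))

prior = 0.0055                  # current survival estimate
posteriors = [0.001, 0.01]      # equally likely estimates after learning the fact
assert abs(sum(posteriors) / 2 - prior) < 1e-12  # expected evidence is conserved

expected_posterior_log_odds = sum(log_odds(p) for p in posteriors) / 2
print(log_odds(prior))               # ~ -5.20
print(expected_posterior_log_odds)   # ~ -5.75: under this objective, learning looks "bad"
```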
It’s also unclear whether to treat different forms of uncertainty differently, e.g. is logical uncertainty treated differently from indexical/quantum uncertainty?
This could make sense as a way of evaluating policies chosen at exactly the present time, which would be equivalent to simply maximizing P(success). However, one has to be very careful with exactly how to evaluate odds to avoid VNM incoherence.
First-order play for log-probability over short time horizons arises as a good idea in real life, when probabilities are low, the same way that betting fractions of your bankroll arises as a good idea in real life: by
expecting to have other future opportunities that look like a chance to play for log-odds gains,
not expecting to have future opportunities that look like a chance to play for lump-sum-of-probability gains,
and ultimately the horizon extending out to diminishing returns if you get that far.
That is, the pseudo-myopic version of your strategy is to bet fractions of your bankroll to win fractions of your bankroll. You don’t take a bet with a 51% probability of doubling your bankroll and a 49% probability of bankruptcy if you expect more opportunities to bet later, if there aren’t later opportunities that just give you lump-sum gains, and if there’s a point beyond which money starts to saturate for you.
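The bankroll analogy here is the Kelly criterion. A minimal sketch, assuming an even-odds repeated bet; the 51%/49% numbers come from the example above, and the specific fractions tried are illustrative:

```python
import math

def expected_log_growth(f, p=0.51):
    """Expected log growth of the bankroll per round when betting a
    fraction f of it on an even-odds bet that wins with probability p."""
    if f >= 1.0:
        return float("-inf")  # staking everything risks log(0): total ruin
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)

for f in (0.0, 0.02, 0.2, 0.5, 1.0):
    print(f"fraction {f:4.2f}: expected log growth {expected_log_growth(f):+.4f}")

# The optimum is the Kelly fraction f* = 2p - 1 = 0.02: small fractional bets.
# Staking the whole bankroll on the 51/49 bet has expected log growth of
# -infinity, i.e. certain eventual ruin over repeated play.
```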
Hmm. It seems like if you really expected to be able to gain log-odds in expectation in repeated bets, you’d immediately update towards a high probability, due to conservation of expected evidence. But maybe a more causal/materialist model wouldn’t do this because it’s a fairly abstract consideration that doesn’t have obvious material support.
I see why “improve log-odds” is a nice heuristic for iteratively optimizing a policy towards greater chance of success, similar to the WalkSAT algorithm, which solves a constraint problem by changing variables around to reduce the number of violated constraints (even though the actual desideratum is to have no violated constraints); this is a way of “relaxing” the problem in a way that makes iterative hill-climbing-like approaches work more effectively.
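For readers who haven’t seen it, here is a minimal WalkSAT-style sketch; the clause encoding, parameters, and example formula are illustrative, but the core move is the one described above: flip variables to reduce the count of violated constraints, even though the real desideratum is zero violations.

```python
import random

def walksat(clauses, n_vars, p_random=0.5, max_flips=10_000, seed=0):
    """Minimal WalkSAT: hill-climb on the number of violated clauses.

    clauses: list of clauses, each a list of DIMACS-style literals
    (+i means variable i must be true, -i means it must be false).
    Returns a satisfying assignment {var: bool} or None.
    """
    rng = random.Random(seed)
    assign = {v: rng.choice([True, False]) for v in range(1, n_vars + 1)}

    def satisfied(clause):
        return any((lit > 0) == assign[abs(lit)] for lit in clause)

    def num_violated():
        return sum(not satisfied(c) for c in clauses)

    for _ in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c)]
        if not unsat:
            return assign  # the real goal: no violated constraints at all
        clause = rng.choice(unsat)
        if rng.random() < p_random:
            var = abs(rng.choice(clause))  # random-walk step to escape local minima
        else:
            # greedy step: flip whichever variable in the clause leaves
            # the fewest violated clauses afterwards
            def flip_cost(lit):
                v = abs(lit)
                assign[v] = not assign[v]
                cost = num_violated()
                assign[v] = not assign[v]
                return cost
            var = abs(min(clause, key=flip_cost))
        assign[var] = not assign[var]
    return None

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
print(walksat([[1, 2], [-1, 3], [-2, -3]], n_vars=3))
```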
Relatedly, some RL approaches give rewards for hitting non-victory targets in a game (e.g. number of levels cleared or key items gained), even if the eventual goal is to achieve a policy that beats the entire game.
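A toy illustration of that kind of reward shaping; the `GameState` fields and the weights are hypothetical, not any particular RL library’s API:

```python
from dataclasses import dataclass

@dataclass
class GameState:
    levels_cleared: int = 0
    key_items: int = 0
    won: bool = False

def shaped_reward(s: GameState) -> float:
    # Hypothetical shaping: partial credit for intermediate milestones, even
    # though beating the whole game (`won`) is the only outcome that matters.
    return 100.0 * s.won + 1.0 * s.levels_cleared + 0.5 * s.key_items

print(shaped_reward(GameState(levels_cleared=3, key_items=1)))  # 3.5
```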
I think possibly the key conceptual distinction you want to make is between short-term play and long-term play. If I deliberately assume an emotional stance, often a lot of the benefit to be gained therefrom is how it translates long-term correct play into myopic play for the emotional reward, assuming of course that the translation is correct. Long-term, you play for absolute probabilities. Short-term, you chase after “dignity”, aka stackable log-odds improvements, at least until you’re out of the curve’s basement.
I feel like this comment in particular is very clarifying with regard to the motivation for this stance. The benefit is that it imports the recommendations of the ideal long-run policy into the short-run frame from which you’re actually acting.
I think that should maybe be in the post somewhere.
I had a similar thought. Also, in an expected-value context it makes sense to pursue actions that succeed when your model is wrong and you’re actually closer to the middle of the success curve, because in that case you can increase the chances of survival more easily. In the logarithmic context that doesn’t make much sense, since your impact on the log-odds is the same no matter where on the success curve you are.
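A small sketch of that difference: the same fixed log-odds gain (one “bit of dignity”) translates into very different absolute probability gains depending on where you sit on the logistic curve. The probabilities chosen are illustrative.

```python
def add_one_bit(p):
    """Apply a one-bit log-odds improvement (doubling the odds) to probability p."""
    odds = 2 * p / (1 - p)
    return odds / (1 + odds)

for p in (0.001, 0.01, 0.5):
    q = add_one_bit(p)
    print(f"p = {p:.3f} -> {q:.3f} (absolute gain {q - p:.3f})")

# Deep in the curve's basement, one bit of log-odds is worth ~0.001 of absolute
# probability; near the middle of the logistic curve, the same bit is worth ~0.167.
```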
Maybe this objective function (and the whole ethos of Death with Dignity) is a way to justify working on alignment even if you think our chances of success are close to zero. Personally, I’m not compelled by it.
Do you think the decision heuristic Eliezer is (ambiguously jokingly) suggesting gives different policy recommendations from the more naive “maxipok” or not? If so, where might they differ? If not, what’s your guess as to why Eliezer worded the objective differently from Bostrom? Why involve log-probabilities at all?
I read this as being “maxipok”, with a few key extensions:
The ‘default’ probability of success is very low.
There are lots of plans that look like they give some small-but-relatively-attractive probability of success, which are basically all fake / picked by motivated reasoning of “there has to be a plan.” (“If we cause WWIII, then there will be a 2% chance of aligning AI, right?”)
While there aren’t accessible plans that cause success all on their own, there probably are lots of accessible sub-plans which make it more likely that a surprising real plan could succeed. (“Electing a rationalist president won’t solve the problem on its own, but it does mean ‘letters from Einstein’ are more likely to work.”)
Measuring units and utilons are different, right? I measure my wealth in dollars but that doesn’t mean my utility function is linear in dollars.