But one could also think that the disvalue of extinction is more continuous with disvalue in non-extinction scenarios, which makes things a bit trickier.
I’m happy to use continuous notions (and that’s what I was doing in my original comment) as long as “half the cost” means “you update such that the expected costs of misalignment according to your probability distribution over the future are halved”. One simple way to imagine this update is to take all the worlds where there was any misalignment, halve their probability, and distribute the extra probability mass to worlds with zero costs of misalignment. At which point I reason “well, 10% extinction changes to 5% extinction, I don’t need to know anything else to know that I’m still going to work on alignment, and given that, none of my actions are going to change (since the relative probabilities of different misalignment failure scenarios remain the same, which is what determines my actions within alignment)”.
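To make that update concrete, here is a rough sketch in Python. Only the 10% extinction figure comes from the paragraph above; the world names, the other probabilities, and the cost numbers are made-up placeholders, and treating "any misalignment" as "cost greater than zero" is my simplifying assumption.

```python
def expected_cost(worlds):
    """Expected cost of misalignment under a distribution over worlds."""
    return sum(p * cost for p, cost in worlds.values())


def halve_misalignment(worlds, safe="no_misalignment"):
    """Halve the probability of every world with a positive misalignment cost
    and move the freed probability mass to the zero-cost `safe` world."""
    updated = {}
    freed = 0.0
    for name, (p, cost) in worlds.items():
        if cost > 0:
            updated[name] = (p / 2, cost)
            freed += p / 2
        else:
            updated[name] = (p, cost)
    p_safe, cost_safe = updated[safe]
    updated[safe] = (p_safe + freed, cost_safe)
    return updated


# Hypothetical prior: name -> (probability, cost). Only the 10% is from the text.
prior = {
    "no_misalignment": (0.70, 0.0),
    "minor_failure": (0.20, 1.0),
    "extinction": (0.10, 100.0),
}
posterior = halve_misalignment(prior)

# Expected cost is halved, extinction goes from 10% to 5%, and the ratio
# between the misalignment scenarios is the same before and after.
print(expected_cost(prior), expected_cost(posterior))
print(posterior["extinction"][0])
```

Because every misalignment world is scaled by the same factor, the relative probabilities among the failure scenarios do not move, which is the property the argument above relies on.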
I got the sense from your previous comment that you wanted me to imagine some different form of update and I was trying to figure out what.
Cool, that all makes sense.