fwiw that strange philosophical bullet fits remarkably well with a set of thoughts I had while reading Anthropic Bias about ‘amount of existence’ being the fundamental currency of reality. A bunch of the anthropic paradoxes felt like they were showing that if you traded sufficiently large amounts of “patterns like me exist more”, you could get counterintuitive results like bending the probabilities of the world around you without any causal pathway. Seeing that infraBayes requires it actually updated me a little towards infraBayes being on the right track.
My model of why humans seem to prefer non-existence to existence in some cases is that our ancestors faced situations which could reduce their ability to self-propagate to almost zero, and needed to avoid these really hard. Evolution gave us training signals which can easily generate subagents which are single-mindedly obsessed with avoiding certain kinds of intense suffering. This motivates us to avoid a wide range of realistic things which cost us existence, but as a side-effect of being emphasized so much, it makes it possible to tip into suicidality in cases where, in our history, this was not too costly because things were bad enough anyway that the agent wouldn’t propagate much (suicide in cases where the cues pointed to self-propagation being relatively likely should have been weeded out for on-distribution humans). This strikes me as unintended, the result of a hack which works pretty well on-distribution and is likely not reflectively consistent in the limit. An evolution which could generate brains with unbounded compute would not make agents which ever preferred suicide or non-existence.
Another angle on this is to think of evolution as having set things up so that a sign-flipped subagent, one which just wants to Not Be, gets reinforced. This is not a natural shape for an agent to be, but it’s useful enough that the pattern to generate it is common.
This is all pretty handwave-y and I don’t claim high confidence that it’s correct or useful, but it might be interesting babble.