Bounded utilities, especially strongly bounded ones like your 1-bit probability-weighted utility function, give you outcomes that depend crucially on the probability of a world-state improving (relative to humans) versus the probability of it degenerating. Once a maximal state has been reached, the agent has an incentive to further improve it if and only if that makes the maintenance of the state more likely. That’s not really a bad outcome if we’ve chosen our utility terms well (i.e. not foolishly ignored the hedonic treadmill or something), but it’s substantially less awesome than it could be; I suspect that after a certain point, probably easily achievable by a superintelligence, the probability mass would shift from favoring development to favoring maintenance.
The first thing that comes to mind is a scenario like setting up Earth as a nature preserve and eating the rest of the galaxy for backup feedstock and as insurance against astronomical-level destructive events. That’s an unlikely outcome—I’m already thinking of ways it could be improved upon—but it ought to serve to illustrate the general point.
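To make the “maintenance mode” incentive concrete, here is a minimal Python sketch of a 1-bit probability-weighted agent. The world model, the action set, and all of the probabilities (toy_world_model and the 0.90/0.97/0.10 numbers) are invented purely for illustration; the only load-bearing point is that expected utility collapses to P(good), so the agent prefers whichever action most raises that probability, development or maintenance alike.

```python
import random

def one_bit_utility(world_state):
    """Hypothetical predicate: 1 if the world-state counts as 'good', else 0."""
    return 1 if world_state["good"] else 0

def toy_world_model(state, action):
    """Toy stochastic world model; the per-action probabilities of ending up
    in a 'good' state are made up purely for illustration."""
    p_good = {"develop": 0.90, "maintain": 0.97, "gamble": 0.10}[action]
    return {"good": random.random() < p_good}

def expected_utility(action, state, model, n_samples=100_000):
    """With a 1-bit utility, expected utility collapses to P(good outcome),
    estimated here by naive Monte Carlo sampling."""
    return sum(one_bit_utility(model(state, action)) for _ in range(n_samples)) / n_samples

# The agent ranks actions purely by the probability of a 'good' outcome; once
# further 'development' raises that probability less than 'maintenance' does,
# the ranking flips into maintenance mode.
state = {"good": True}
actions = ["develop", "maintain", "gamble"]
best = max(actions, key=lambda a: expected_utility(a, state, toy_world_model))
print(best)  # with these made-up numbers: 'maintain'
```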
Once a maximal state has been reached, the agent has an incentive to further improve it if and only if that makes the maintenance of the state more likely.
This is true, but much depends on what counts as a ‘maximal state’. If our 1-bit-utility superintelligence predicts future paths all the way out to the possible end states of the universe, then it isn’t necessarily susceptible to getting stuck in maintenance states along the way. It all depends on which subset of future paths we classify as ‘good’.
Also keep in mind that the 1-bit utility model still rates entire future paths, not just final end states. So let’s say, for example, that we are really picky and we only want Tipler Omega Point end-states. If that is all we specify, then the superintelligence may take us through a path that involves killing off most of humanity. However, we can avoid that by adding further constraints on the entire path: assigning 1 only to future paths that end in the Omega Point and also satisfy some arbitrary list of constraints along the way. Again, this is probably not the best type of utility model, but the weakness of 1-bit bounded utility is not that it tends to get stuck in maintenance mode for all utility models.
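As a rough sketch of what rating entire paths could look like, here is a hypothetical path_utility function: it assigns 1 only if the final state passes an end-state predicate and every state along the way passes a list of side constraints. The predicates (reaches_omega_point, humanity_survives) and the state dictionaries are made-up stand-ins, not anything specified above.

```python
def path_utility(path, reaches_target, constraints):
    """Hypothetical 1-bit utility over an entire future path (a list of
    world-states): 1 only if the final state satisfies the target end-state
    predicate AND every state along the way satisfies all side constraints."""
    if not reaches_target(path[-1]):
        return 0
    if not all(c(state) for state in path for c in constraints):
        return 0
    return 1

# Illustrative, made-up predicates standing in for 'Omega Point end-state'
# and 'humanity survives along the way':
reaches_omega_point = lambda s: s.get("omega_point", False)
humanity_survives = lambda s: s.get("humans_alive", False)

path = [{"humans_alive": True}, {"humans_alive": True},
        {"humans_alive": True, "omega_point": True}]
print(path_utility(path, reaches_omega_point, [humanity_survives]))  # 1
```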
The failure of 1-bit utility lies more in the specificity vs. feasibility tradeoff. If we make the utility model very narrow and it turns out that the paths we want are unattainable, then the superintelligence will gleefully gamble everything and risk losing the future. For example, the SI which only seeks specific Omega Point futures may eat the moon, think for a bit, and determine that even under the best action sequences it only has a 10^-99 chance of winning (according to its narrow OP criteria). In this case it won’t ‘fall back’ to some other more realistic but still quite awesome outcome; it will still proceed to transform the universe in an attempt to achieve the OP, no matter how improbable. Unless, of course, there is some feedback mechanism with humans and utility-model updating, but that amounts to circumventing the 1-bit utility idea.
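The no-fallback behavior can be shown in a few lines. In the toy table below the probabilities are invented; the point is only that a narrow 1-bit utility assigns exactly 0 to the ‘merely awesome’ outcomes, so the argmax still picks the 10^-99 gamble.

```python
# Each plan maps to (P(narrow Omega-Point win), P(merely awesome fallback));
# the numbers are invented to illustrate the tradeoff, not estimates.
plans = {
    "transform the universe chasing the OP": (1e-99, 0.00),
    "settle for a flourishing galaxy":       (0.00,  0.95),
    "do nothing":                            (0.00,  0.10),
}

def expected_narrow_utility(plan):
    p_narrow_win, p_awesome_fallback = plans[plan]
    # The narrow 1-bit utility only scores the Omega-Point criterion;
    # the 'merely awesome' probability contributes exactly nothing.
    return p_narrow_win

best = max(plans, key=expected_narrow_utility)
print(best)  # 'transform the universe chasing the OP', despite the 10^-99 odds
```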