What about The Lifespan Dilemma and Pascal’s Mugging?
These are really only problems for agents with unbounded utility functions. This is a great example of over-theorizing without considering practical computational limitations. If your AI design requires double (or even much higher) precision arithmetic just to evaluate its internal utility functions, you have probably already failed.
Consider the extreme example of bounded utility functions: 1-bit utilities. A 1-bit utility function can only categorize futures into two possible shades: good or bad. This by itself is not a crippling limitation if the AI considers a large number of potential futures and computes a final probability-weighted decision score with higher precision. For example, when considering two action paths A and B, a Monte Carlo design could roll out a couple hundred futures branching from each, assign each a 0 or 1, and then add them up into a tally whose bit-width grows with the log of the number of futures evaluated (in this case, around 8 bits).
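A minimal sketch of how such a Monte Carlo decision procedure could look (the is_good classifier and simulate_future rollout are hypothetical stand-ins for illustration, not a real design):

```python
import random

def is_good(future):
    # Hypothetical 1-bit utility classifier: score a sampled future as good (1) or bad (0).
    return 1 if future > 0.5 else 0  # placeholder criterion, purely for illustration

def simulate_future(action, rng):
    # Hypothetical world model: roll out one possible future following `action`.
    bias = {"A": 0.55, "B": 0.45}[action]  # stand-in dynamics, not a real simulator
    return rng.random() * 2 * bias

def decision_score(action, n_samples=256, seed=0):
    # Tally of 1-bit utilities over n_samples rollouts; the tally itself only
    # needs about log2(n_samples) bits of precision (~8 bits here).
    rng = random.Random(seed)
    return sum(is_good(simulate_future(action, rng)) for _ in range(n_samples))

# Choose whichever action path leads to more 'good' sampled futures.
scores = {a: decision_score(a) for a in ("A", "B")}
print(max(scores, key=scores.get), scores)
```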
This extremely bounded design would need to do far more future-simulation to compensate for its extremely low-granularity utility rankings: when playing chess, for example, it could only categorize board states as 'likely win' or 'likely loss'. Thus it would need a greater ply-depth than algorithms that use higher bit-depth evaluations. But even so, this amounts only to a performance-efficiency disadvantage, not a fundamental limitation.
If we extrapolate a 1-bit friendly AI to planning humanity's future, it would collapse all futures that humanity found largely 'desirable' into the same score of 1, with everything else being 0. If its utility classifier and future modelling are powerful enough, this design can still work.
And curiously, a 1-bit utility function gives more intuitively reasonable results in the Lifespan Dilemma and Pascal's Mugging. Assuming dying in an hour is a 0-utility outcome and living for at least a billion years is a 1, it would never take any wager that increases its probability of death. And it would be just as unsusceptible to Pascal's Mugging.
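A toy illustration of why the bounded score shrugs off both wagers (all of the probabilities and payoffs below are made up for the example):

```python
def expected_utility_1bit(p_good):
    # With a {0, 1} utility the expected utility is just the probability of a 'good'
    # outcome; the magnitude of any promised payoff cannot enter, since it is capped at 1.
    return p_good

# Lifespan-Dilemma-style wager: trade a little survival probability
# for an astronomically longer promised lifespan.
p_survive_if_refuse = 0.999
p_survive_if_accept = 0.998      # the wager raises death risk...
promised_multiplier = 10**100    # ...and the huge promised payoff is simply ignored

print(expected_utility_1bit(p_survive_if_refuse)
      > expected_utility_1bit(p_survive_if_accept))   # True: refuse the wager

# Pascal's Mugging: the mugger's promised utilons are clipped to 1, so a vanishingly
# small probability of the promise cannot dominate a small but far larger certain gain.
p_mugger_truthful = 1e-50
certain_gain_from_refusing = 1e-6   # hypothetical tiny increase in p_good from keeping your resources
print(p_mugger_truthful * 1 < certain_gain_from_refusing)   # True: ignore the mugger
```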
Just to be clear, I'm not really advocating simplistic 1-bit utilities. What is clear is that humans' internal utility evaluations are bounded. This probably comes from practical computational limitations, and future AIs will likely face practical limitations of their own.
Bounded utilities—especially strongly bounded ones like your 1-bit probability-weighted utility function—give you outcomes that depend crucially on the probability of a world-state's human-relative improvement versus the probability of degeneration. Once a maximal state has been reached, the agent has an incentive to further improve it if and only if that makes the maintenance of the state more likely. That's not really a bad outcome if we've chosen our utility terms well (i.e. not foolishly ignored the hedonic treadmill or something), but it's substantially less awesome than it could be; I suspect that after a certain point, probably easily achievable by a superintelligence, the probability mass would shift from favoring development to favoring maintenance.
The first thing that comes to mind is a scenario like setting up Earth as a nature preserve and eating the rest of the galaxy for backup feedstock and as insurance against astronomical-level destructive events. That’s an unlikely outcome—I’m already thinking of ways it could be improved upon—but it ought to serve to illustrate the general point.
Once a maximal state has been reached, the agent has an incentive to further improve it if and only if that makes the maintenance of the state more likely.
This is true, but much depends on what is considered a 'maximal state'. If our 1-bit utility superintelligence predicts future paths all the way to the possible end states of the universe, then it isn't necessarily susceptible to getting stuck in maintenance states along the way. It all depends on which subset of future paths we classify as 'good'.
Also keep in mind that the 1-bit utility model still rates entire future paths, not just end states. Say, for example, that we are really picky and only want Tipler Omega Point end-states. If that is all we specify, then the superintelligence may take us through a path that involves killing off most of humanity. However, we can avoid that by adding further constraints over the entire path: assigning 1 to future paths that end in the Omega Point but also satisfy some arbitrary list of constraints along the way. Again, this is probably not the best type of utility model, but the weakness of 1-bit bounded utility is not that it tends to get stuck in maintenance mode for all utility models.
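Roughly, that path-constrained scoring could look like the following sketch (the omega_point end-state test and the humans_alive constraint are placeholder names, not a proposal):

```python
def path_utility_1bit(path, end_state_ok, step_constraints):
    # 1-bit utility over an entire future path: score 1 only if the end state
    # qualifies AND every intermediate step satisfies every constraint, else 0.
    if not end_state_ok(path[-1]):
        return 0
    if any(not c(step) for step in path for c in step_constraints):
        return 0
    return 1

# Illustrative constraint: humanity survives at every step along the way.
step_constraints = [lambda step: step.get("humans_alive", False)]
end_state_ok = lambda state: state.get("omega_point", False)

good_path = [{"humans_alive": True}, {"humans_alive": True, "omega_point": True}]
bad_path = [{"humans_alive": False}, {"humans_alive": True, "omega_point": True}]
print(path_utility_1bit(good_path, end_state_ok, step_constraints))  # 1
print(path_utility_1bit(bad_path, end_state_ok, step_constraints))   # 0
```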
The failure of 1-bit utility lies more in the specificity vs. feasibility tradeoff. If we make the utility model very narrow and it turns out that the paths we want are unattainable, then the superintelligence will gleefully gamble everything and risk losing the future. For example, the SI which only seeks specific Omega Point futures may eat the moon, think for a bit, and determine that even under the best action sequences it has only a 10^-99 chance of winning (according to its narrow OP criteria). In this case it won't 'fall back' to some other more realistic but still quite awesome outcome; it will still proceed to transform the universe in an attempt to achieve the OP, no matter how impossible. Unless, of course, there is some feedback mechanism with humans and utility-model updating, but that amounts to circumventing the 1-bit utility idea.
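A toy illustration of that no-fallback failure mode, with made-up plan names and probabilities:

```python
# Hypothetical plans with the modelled probability that each satisfies the narrow
# 'Omega Point' criterion. With a 1-bit utility there is no partial credit, so the
# agent maximizes that probability, however tiny, and never falls back to a merely
# 'quite awesome' future, which scores 0 regardless of how good it actually is.
plans = {
    "transform_the_universe_chasing_OP": 1e-99,  # tiny chance of the only rewarded outcome
    "flourishing_galactic_civilization": 0.0,    # great future, but not OP, so utility 0
}
print(max(plans, key=plans.get))  # transform_the_universe_chasing_OP
```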