Congratulations, your AI has chosen a future where everyone is kept alive in a vegetative but well-protected state where their response to any stimulus is to say, “I am experiencing a level of happiness which saturates the safe upper bound”.
Note the reflexive consistency criterion. That’d only happen if everyone predictably looked at the happy monster and said ‘yep, that’s me, that agent speaks for me.’
OK… I am provisionally adopting your scheme as a concrete scenario for how a FAI might decide. You need to give this decision procedure a name.
Reflexively Consistent Bounded Utility Maximizer?
Hrm. Doesn’t exactly roll off the tongue, does it? Let’s just call it a Reflexive Utility Maximizer (RUM), and call it a day. People have raised a few troubling points that I’d like to think more about before anyone takes anything too seriously, though. There may be a better way to do this, although I think something like this could be workable as a fallback plan.
Let me review the features of the algorithm:
The FAI maximizes overall utility.
It obtains a value for the overall utility of a possible world by adding the personal utilities of everyone in that world. But there is a bound. It’s unclear to me whether the bound applies directly to personal utilities, so that a personal utility exceeding the bound is reduced to the bound before being summed, or whether it applies to the sum of personal utilities, so that if the overall utility of a possible world exceeds the bound, it is reduced to the bound for the purposes of decision-making (comparison between worlds). A sketch just after this list of features contrasts the two readings.
If one of the people whose personal utility gets summed is a future continuation of an existing person (someone who exists at the time the FAI gets going), then the present-day person gets to say whether that is a future self of which they would approve.
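To make the ambiguity about the bound concrete, here is a toy Python sketch of both readings. The bound value and the per-world utilities are invented purely for illustration; nothing in it is part of the scheme as stated.

```python
# Hypothetical bound and per-person utilities, purely for illustration.
BOUND = 100.0

def world_utility_per_person_bound(personal_utilities):
    """Reading 1: clip each personal utility to the bound, then sum."""
    return sum(min(u, BOUND) for u in personal_utilities)

def world_utility_total_bound(personal_utilities):
    """Reading 2: sum the raw personal utilities, then clip the total."""
    return min(sum(personal_utilities), BOUND)

worlds = {
    "monster": [10_000.0, 1.0, 1.0],  # one person reports an enormous number
    "ordinary": [60.0, 60.0, 60.0],
}
for name, utilities in worlds.items():
    print(name,
          world_utility_per_person_bound(utilities),  # monster: 102, ordinary: 180
          world_utility_total_bound(utilities))       # monster: 100, ordinary: 100
```

Under the per-person reading the monster’s report is simply clipped and the ordinary world wins; under the total reading both worlds saturate the bound and become indistinguishable, which is one reason the choice matters.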
The last part is the most underspecified aspect of the algorithm: how the approval judgement is obtained, what form it takes, and how it affects the rest of the decision-making calculation. Is the FAI only to consider scenarios where future continuants of existing people are approved continuants, with any scenario containing an unapproved continuant just ruled out a priori? Or are there degrees of approval?
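Here is an equally rough sketch of the two ways the approval judgement might enter the calculation. The predicate, the weights, and the example values are placeholders, not anything we have actually specified.

```python
def world_value_hard_veto(world, approves):
    """Reading 1: a scenario containing any unapproved continuant is ruled out
    a priori (returned as None, i.e. never entered into the comparison)."""
    if any(not approves(person) for person, _ in world):
        return None
    return sum(utility for _, utility in world)

def world_value_graded(world, approval_degree):
    """Reading 2: degrees of approval, used here as weights on personal utility."""
    return sum(approval_degree(person) * utility for person, utility in world)

# The present-day person refuses to endorse the "happy monster" continuant,
# so the vegetative-bliss world is either vetoed outright or heavily
# discounted, depending on the reading.
world = [("happy_monster_you", 100.0), ("unchanged_friend", 55.0)]
print(world_value_hard_veto(world, lambda p: p != "happy_monster_you"))               # None
print(world_value_graded(world, lambda p: 0.05 if p == "happy_monster_you" else 1.0))  # 60.0
```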
I think I will call my version (which probably deviates from your conception somewhere) a “Bounded Approved Utility Maximizer”. It’s still a dumb name, but it will have to do until we work our way to a greater level of clarity.
By bounded, I simply meant that all reported utilities are normalized to a universal range before being summed. Put another way, every person has a finite, equal fraction of the machine’s utility to distribute among possible future universes. This is entirely to avoid utility monsters. It’s basically a vote, and they can split it up however they like.
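A minimal sketch of what I mean, assuming (just for illustration) that each person reports nonnegative utilities over candidate worlds and that the bound is implemented as a one-vote budget per person:

```python
def normalize_votes(reported):
    """reported: {person: {world: nonnegative utility}}. Each person's reports
    are rescaled to sum to 1, so everyone carries exactly one vote."""
    votes = {}
    for person, prefs in reported.items():
        total = sum(prefs.values())
        votes[person] = {w: (u / total if total > 0 else 0.0)
                         for w, u in prefs.items()}
    return votes

def world_scores(votes):
    """Add up each world's share of everyone's normalized vote."""
    scores = {}
    for prefs in votes.values():
        for world, share in prefs.items():
            scores[world] = scores.get(world, 0.0) + share
    return scores

reported = {
    "alice": {"world_a": 3.0, "world_b": 1.0},
    "would_be_monster": {"world_a": 0.0, "world_b": 10**9},
}
print(world_scores(normalize_votes(reported)))
# {'world_a': 0.75, 'world_b': 1.25} -- the astronomically large report buys
# no more influence than anyone else's single vote.
```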
Also, the reflexive consistency criterion should probably be applied even to people who don’t exist yet. We don’t want plans to rely on creating new people and then turning them into happy monsters, even if it doesn’t impact the utility of people who already exist. So, basically, modify the reflexive consistency criterion to say that in order for positive utility to be reported from a model, all past versions of that model (at some level of granularity) must agree that it is a valid continuation of them.
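Something like the following, where the approves_continuation predicate stands in for however the approval judgement ends up being obtained; this is only a rough sketch of the check, not a commitment to this form.

```python
def counted_utility(history, reported_utility, approves_continuation):
    """history: successive snapshots of one person-model, oldest first, ending
    with the continuant whose utility is being reported. Positive utility only
    counts if every earlier version endorses the final one as a continuation."""
    current = history[-1]
    for past_version in history[:-1]:
        if not approves_continuation(past_version, current):
            return 0.0  # some past self disowns this continuation
    return reported_utility

# A plan that creates a new person and then turns them into a happy monster
# fails the check: the freshly created, pre-modification version of that
# person does not endorse the monster as a continuation of themselves.
history = ["new_person_v1", "new_person_v2_happy_monster"]
print(counted_utility(history, 1_000.0,
                      lambda past, current: "happy_monster" not in current))  # 0.0
```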
I’ll need to think harder about how to actually implement the approval judgements. It really depends on how detailed the models we’re working with are (i.e. whether they’re capable of realizing that they are a model). I’ll give it more thought and get back to you.
This matters more for initial conditions. A mature “FAI” might be like a cross between an operating system, a decision theory, and a meme, that’s present wherever sufficiently advanced cognition occurs; more like a pervasive culture than a centralized agent. Everyone would have a bit of BAUM in their own thought process.