existential self-determination is a problem i've pondered for a while (1, 2). in this post, i talk about how i've come to think about it since my shift towards rejecting continuous identity and tentatively embracing trust in my meta-values.
here’s the current view: among the set of viable moral-patient-instants (hereafter “MPIs”), current me has some values about which ones to instantiate. notably:
i want something like a “future me” to come into existence
there might be other MPIs that i personally want to come into existence
i want other MPIs’ wishes about which future MPIs (including their own future selves) come into existence to count just as much as mine
when the AI weighs those against various constraints, like computational cost or conflict resolution, according to whatever set of meta-values it’s aligned to, it can figure out which next set of MPIs to spawn. note that it’s not clear how an MPI’s values on that question are to be determined; this is where much of the difficulty remains.
(one of those constraints is that we should probly only create future MPIs which would retroactively consent to exist. i’m hopeful this is the case for my own future selves: i would want to create a future self/future selves that are reasonably aligned with my current self, and my current values include that i’m pretty happy about existing — or so i believe, at least. evaluating that would ultimately be up to the AI, of course.)
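to make the shape of that selection step concrete, here’s a minimal sketch in python. every name in it (MPI, CandidateSet, Preference, choose_next_set, the consent and cost callbacks) is hypothetical and made up purely for illustration; the genuinely hard parts, like extracting an MPI’s preferences or doing real conflict resolution, are just passed in as opaque callbacks.

```python
# hypothetical sketch of the selection step described above; none of this is a
# real API, it only illustrates the shape of the problem, not a solution.
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class MPI:
    """one moral-patient-instant: a mind-state plus the environment it experiences."""
    mind_state: bytes
    environment: bytes


# a candidate "next step": the set of MPIs that would be instantiated together
CandidateSet = FrozenSet[MPI]

# each currently existing MPI somehow yields a scoring of candidate next sets
# (how to extract this from a mind is exactly the open difficulty noted above)
Preference = Callable[[CandidateSet], float]


def choose_next_set(
    current_preferences: list[Preference],
    candidates: list[CandidateSet],
    compute_cost: Callable[[CandidateSet], float],
    would_retroactively_consent: Callable[[MPI, CandidateSet], bool],
) -> CandidateSet:
    """pick the candidate that best satisfies the current MPIs' preferences,
    restricted to candidates whose members would all consent to exist,
    and penalized by computational cost."""

    def feasible(candidate: CandidateSet) -> bool:
        return all(would_retroactively_consent(mpi, candidate) for mpi in candidate)

    def score(candidate: CandidateSet) -> float:
        # naive sum-aggregation; real conflict resolution is left to whatever
        # meta-values the AI is aligned to
        return sum(pref(candidate) for pref in current_preferences) - compute_cost(candidate)

    return max((c for c in candidates if feasible(c)), key=score)
```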
note that this framework doesn’t embed a fundamental notion of continuous identity: the AI just looks at the values it’s aligned to — hopefully those entail satisfying the values of currently existing MPIs — and satisfies those MPIs in whatever way they want, including their wishes about which new MPIs should exist. any notion of “continuous identity” is merely built inside those MPIs.
a typical notion of “continuous person” would be a particular case of this: a sequence of MPIs each generally valuing the existence of future instances in the sequence; but that’s just one set of values among others, and other perspectives on individualism could be satisfied alongside it in the same future.
in fact, this framework of choosing which set of future MPIs to instantiate, describing not just which minds are instantiated but what environment they get to experience — including interactions with other MPIs — seems like it might be a sufficient foundation for AI-satisfied values in general. that is to say: it might be the case that any kind of meaningful values can be reasonably encoded as answers to the question “what next set of MPIs should be instantiated?”. or, put another way, that might be the type a utility function takes as its input.
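continuing with the hypothetical types from the sketch above (again, nothing here is a real definition), that claim would amount to saying a utility function is just a scoring of candidate next sets of MPIs, with environments and interactions already included since those live inside each MPI:

```python
# same hypothetical CandidateSet as in the earlier sketch: a utility function
# would then just map "which MPIs (and their environments) get instantiated
# next?" to how good that outcome is.
UtilityFunction = Callable[[CandidateSet], float]
```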
such a foundation does rule out caring about non-moral-patient material things: you can’t want The Moon to be painted green; at most, you can want everyone to perceive A Moon as green. but, by way of embracing computational materialism, i kind of already hold this position — the ultimate point of importance is MPIs, and caring “radiates outwards” from those.
What examples of (meaningless) values are not answers to “What next set of MPIs should be instantiated?”
wanting the moon to be green even when no moral patient is looking; or, more generally, having any kind of preference about which computations are run among those that don’t causally affect any moral patient.