What would an AI that ‘cares’ in the sense you spoke of be able to do to address this problem that a non-‘caring’ one wouldn’t?
DefectiveAlgorithm
Kind of. I wouldn’t defect against my copy without his consent, but I would want the pool trimmed down to only a single version of myself (ideally whichever one had the highest expected future utility, all else equal). The copy, being a copy, should want the same thing. The only time I wouldn’t be opposed to the existence of multiple instances of myself would be if those instances could regularly synchronize their memories and experiences (and thus constitute more a single distributed entity with mere synchronization delays than multiple diverging entities).
Leaving aside other matters, what does it matter if an FAI ‘cares’ in the sense that humans do so long as its actions bring about high utility from a human perspective?
This post starts off on a rather spoiler-ish note.
My first thought (in response to the second question) is ‘immediately terminate myself, leaving the copy as the only valid continuation of my identity’.
Of course, it is questionable whether I would have the willpower to go through with it. I believe that my copy's mind would be just as 'real' a continuation of my consciousness as my own mind would be after a procedure that erased the memories of the past few days (or however long it has been since the split) whilst leaving everything else intact. Such a procedure is just a contrived, thought-experiment version of the forgetting we undergo all the time. Still, I have trouble alieving it.
Even leaving aside the matters of 'permission' (which lead into awkward questions of informed consent) as well as the difficulties of defining concepts like 'people' and 'property', define 'do things to X'. Every action affects others. If you so much as speak a word, you're causing others to undergo the experience of hearing that word spoken. For an AGI, even thinking draws a minuscule amount of electricity from the power grid, which has near-negligible but quantifiable effects on the power industry that will in turn affect humans in any number of ways. If you take chaos theory seriously, you could take this even further. It may seem obvious to a human that there's a vast difference between innocuous actions like those above and potentially harmful ones, but lots of things are intuitively obvious to humans and yet turn out to be extremely difficult to specify precisely, and this seems like just such a case.
I know what terminal values are, and I apologize if the intent behind my question was unclear. To clarify, my request was specifically for a definition in the context of human beings: entities whose cognitive architectures have no explicitly defined utility function and contain multiple interacting subsystems which may value different things (i.e., emotional vs. deliberative systems). I'm well aware of the huge impact my emotional subsystem has on my decision making. However, I don't consider it 'me'; rather, I consider it an external black box which interacts very closely with what I do identify as me (mostly my deliberative system). I can acknowledge its strong influence on my motivations whilst explicitly wishing that this were not so, a desire which would in certain contexts lead me to knowingly make decisions that irreversibly sacrifice a significant portion of my expected future pleasure.
To follow up on my initial question, it had been intended to lay the groundwork for this followup: What empirical claims do you consider yourself to be making about the jumble of interacting systems that is the human cognitive architecture when you say that the sole ‘actual’ terminal value of a human is pleasure?
Can you define ‘terminal values’, in the context of human beings?
If the universe is infinite, then there are infinitely many copies of me, following the same algorithm
Does this follow? The set of computable functions is infinite, but has no duplicate elements.
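To make the counterexample concrete: an infinite collection need not contain any repeats. The sketch below uses a hypothetical family of computable functions, f_n(x) = x + n (chosen purely for illustration, not taken from the thread). The family is infinite, yet f_n(0) = n, so no two members behave identically. The code can only check a finite prefix; the general claim follows from f_n(0) = n.

```python
from itertools import count, islice

def shift_by(n):
    # Hypothetical member of the family: f_n(x) = x + n.
    return lambda x: x + n

def family():
    # An infinite generator: each n yields a new function.
    for n in count():
        yield shift_by(n)

# Check the first 1000 members: they all differ on the single input 0.
first = list(islice(family(), 1000))
outputs = [f(0) for f in first]
print(len(outputs), len(set(outputs)))  # 1000 1000: no duplicates
```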
“Comments (1)”
“There doesn’t seem to be anything here.”
????
I think this should get better and better for P1 the closer P1 gets to (2/3)C (1/3)B (without actually reaching it).
How so?
I do think 'a disagreement on utility calculations' may indeed be a big part of it. Are you a total utilitarian? I'm not. A big part of that comes from the fact that I don't consider two copies of myself to be intrinsically more valuable than one. They might be instrumentally valuable, if those copies can interact, sync their experiences and cooperate, but that's another matter. With experience-syncing, I am mostly indifferent to the number of copies of myself that exist (leaving aside potential instrumental benefits), but without it my utility decreases as the number of copies increases, because I assign zero terminal value to multiplicity but positive terminal value to the uniqueness of my identity.
My brand of utilitarianism is informed substantially by these preferences. I adhere to neither average nor total utilitarianism, but I lean closer to average. Whilst I would be against the use of force to turn a population of 10 with X utility each into a population of 3 with (X + 1) utility each, I would in isolation consider the latter preferable to the former. There is no inconsistency here: my utility function simply admits information about the past.
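A worked comparison of the two populations shows where total and average utilitarianism part ways. Under total utilitarianism, 10 people at X beat 3 people at X + 1 exactly when 10X > 3(X + 1), i.e. when X > 3/7; under average utilitarianism the smaller population always wins, since X + 1 > X. This is a sketch assuming every member of a population has identical utility, and it deliberately does not model the history-dependence (the objection to getting there by force) described above.

```python
from fractions import Fraction

def prefers_larger_population(x):
    """Compare 10 people at utility x with 3 people at utility x + 1.

    Assumes every member of a population has the same utility.
    Returns (total_view, average_view); True means that view strictly
    ranks the 10-person population higher.
    """
    total_large, total_small = 10 * x, 3 * (x + 1)
    average_large, average_small = x, x + 1
    return total_large > total_small, average_large > average_small

# Total utilitarianism flips at x = 3/7; average utilitarianism never does.
print(prefers_larger_population(5))               # (True, False)
print(prefers_larger_population(Fraction(1, 4)))  # (False, False)
print(prefers_larger_population(Fraction(3, 7)))  # (False, False): a tie under the total view
```

Using `Fraction` keeps the comparison at the threshold exact, so the tie at X = 3/7 is not obscured by floating-point rounding.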
Well, ok, but if you agree with this then I don’t see how you can claim that such a system would be particularly useful for solving FAI problems.
Ok, but a system like you’ve described isn’t likely to think about what you want it to think about or produce output that’s actually useful to you either.
an Oracle AI you can trust
That’s a large portion of the FAI problem right there.
EDIT: To clarify, by this I don’t mean to imply that FAI is easy, but that (trustworthy) Oracle AI is hard.
No. Clippy cannot be persuaded away from paperclipping because maximizing paperclips is its only terminal goal.
If acquiring bacon was your ONLY terminal goal, then yes, it would be irrational not to do absolutely everything you could to maximize your expected bacon. However, most people have more than just one terminal goal. You seem to be using ‘terminal goal’ to mean ‘a goal more important than any other’. Trouble is, no one else is using it this way.
EDIT: Actually, it seems to me that you’re using ‘terminal goal’ to mean something analogous to a terminal node in a tree search (if you can reach that node, you’re done). No one else is using it that way either.
Consider an agent trying to maximize its Pacman score. ‘Getting a high Pacman score’ is a terminal goal for this agent—it doesn’t want a high score because that would make it easier for it to get something else, it simply wants a high score. On the other hand, ‘eating fruit’ is an instrumental goal for this agent—it only wants to eat fruit because that increases its expected score, and if eating fruit didn’t increase its expected score then it wouldn’t care about eating fruit.
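A toy version of the Pacman agent makes the distinction explicit. In this sketch (the numbers are invented for illustration), score is valued for its own sake, while fruit is valued only by how much it changes expected score. If fruit stopped affecting the score, its value to the agent would drop to zero.

```python
def terminal_utility(score):
    # The agent's only terminal goal: score, valued for its own sake.
    return score

def instrumental_value(expected_score_with, expected_score_without):
    # A subgoal is worth exactly what it adds to expected terminal utility.
    # (Utility here is linear in score, so expected scores can be plugged in directly.)
    return terminal_utility(expected_score_with) - terminal_utility(expected_score_without)

# Hypothetical numbers: eating fruit raises expected score from 1200 to 1300.
fruit_value = instrumental_value(1300, 1200)

# If fruit no longer affected the score, the agent would stop caring about it.
useless_fruit_value = instrumental_value(1200, 1200)

print(fruit_value, useless_fruit_value)  # 100 0
```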
That is the only difference between the two types of goals. Knowing that one of an agent’s goals is instrumental and another terminal doesn’t tell you which goal the agent values more.
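To illustrate the last point, here is a hypothetical agent with two terminal goals of unequal weight (the goal names, weights and probabilities are all made up for this sketch). An instrumental goal that substantially advances the heavily weighted terminal goal turns out to be worth more than fully securing the lightly weighted terminal goal.

```python
# Hypothetical agent: two terminal goals with unequal weights.
TERMINAL_WEIGHTS = {"keep_pet_fish": 0.1, "stay_healthy": 1.0}

def expected_utility(probabilities):
    # probabilities maps each terminal goal to the chance it is achieved.
    return sum(TERMINAL_WEIGHTS[goal] * p for goal, p in probabilities.items())

baseline = {"keep_pet_fish": 0.5, "stay_healthy": 0.5}

# Value of guaranteeing the minor terminal goal.
fish_secured = dict(baseline, keep_pet_fish=1.0)
value_of_terminal = expected_utility(fish_secured) - expected_utility(baseline)

# Value of an instrumental goal (say, exercising) that raises P(healthy) to 0.9.
exercising = dict(baseline, stay_healthy=0.9)
value_of_instrumental = expected_utility(exercising) - expected_utility(baseline)

print(round(value_of_terminal, 3), round(value_of_instrumental, 3))  # 0.05 0.4
```

Nothing about a goal being instrumental caps its value: it inherits whatever value flows from the terminal goals it serves.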
Because I terminally value the uniqueness of my identity.