There is a problem that, other things equal, agents that care about the state of the world in the distant future, to the exclusion of everything else, will outcompete agents that lack that property. This is self-evident, because we can operationalize “outcompete” as “have more effect on the state of the world in the distant future”.
I am not sure about that!
One way this argument could fail: maybe agents who care exclusively about the state of the world in the distant future end up, as part of their optimizing, creating other agents who care in different ways from that.
In that case, they would “have more effect on the state of the world in the distant future”, but they might not “outcompete” other agents (in the common-sensical way of understanding “outcompete”).
A person might think this implausible, because they might think that a smart agent who cares exclusively about X can best achieve X by having all minds they create also be [smart agents who care exclusively about X].
But, I’m not sure this is true, basically for reasons of not trusting assumptions (1), (2), (3), and (4) that I listed here.
(As one possible sketch: a mind whose only goal is to map branch B of mathematics might find it instrumentally useful to map a bunch of other branches of mathematics. And, since supervision is not free, it might be more able to do this efficiently if it creates researchers who have an intrinsic interest in math-in-general, and who are not being fully supervised by exclusively-B-interested minds.)
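(To make the "supervision is not free" tradeoff a bit more concrete, here is a toy back-of-the-envelope comparison in Python. Every number in it is an invented assumption, purely for illustration; the point is only that if the supervision overhead is large enough, the curious-but-unsupervised team can come out ahead even on branch B itself.)

```python
# Toy comparison of two ways to get branch-B mathematics mapped.
# Every number here is an invented assumption, for illustration only.

researchers = 100

# Option 1: exclusively-B-focused researchers under full supervision.
overhead_supervised = 0.4   # assumed: 40% of total effort goes to supervision
b_fraction_supervised = 1.0 # all of their remaining effort is on branch B
output_supervised = researchers * (1 - overhead_supervised) * b_fraction_supervised

# Option 2: researchers with an intrinsic interest in math-in-general, unsupervised.
overhead_curious = 0.0      # assumed: no supervision overhead
b_fraction_curious = 0.7    # assumed: 70% of their work turns out to be relevant to B
output_curious = researchers * (1 - overhead_curious) * b_fraction_curious

print(output_supervised, output_curious)  # 60.0 vs 70.0 under these made-up numbers
```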
To complement that list, Superintelligence chapter 7 lists four types of “situations in which an agent can best fulfill its final goals by intentionally changing them” (which is pretty similar to your “creating other agents who care in different ways from that”):
“social signaling” & “social preferences”—basically, maybe there are other powerful agents around who possess some mind-reading capability, including your (1c)
“preferences concerning own goal content” (“for example, the agent might have a final goal to become the type of agent that is motivated by certain values rather than others (such as compassion rather than comfort)”)
“storage [or processing] costs”, which we should probably broaden to ‘practical considerations about the algorithm actually working well in practice’, and then it would probably include your mathematician example and your (1a, 1b, 2, 4).
Your (3) would be kinda “maybe there was never a so-called ‘final goal’ in the first place”, which is a bit related to the second bullet point, or maybe we should just say that Bostrom overlooks it. (Or maybe he talks about it somewhere else in the book? I forget.)
I’d guess that the third bullet point is less likely to be applicable to powerful AGIs than to humans. For example, I expect that AGIs will be able to self-modify in ways that are difficult for humans (e.g. there’s no magic-bullet super-Adderall for humans), which impacts the likelihood of your (1a).
Steven Byrnes wrote:
“For example, I expect that AGIs will be able to self-modify in ways that are difficult for humans (e.g. there’s no magic-bullet super-Adderall for humans), which impacts the likelihood of your (1a).”
My (1a) (and related (1b)), for reference:
(1a) “You” (the decision-maker process we are modeling) can choose anything you like, without risk of losing control of your hardware. (Contrast case: if the ruler of a country chooses unpopular policies, they are sometimes ousted. If a human chooses dieting/unrewarding problems/social risk, they sometimes lose control of themselves.)
(1b) There are no costs to maintaining control of your mind/hardware. (Contrast case: if a company hires some brilliant young scientists to be creative on its behalf, it often has to pay a steep overhead if it additionally wants to make sure those scientists don’t disrupt its goals/beliefs/normal functioning.)
I’m happy to posit an AGI with powerful ability to self-modify. But, even so, my (nonconfident) guess is that it won’t have property (1a), at least not costlessly.
My admittedly handwavy reasoning:
Self-modification doesn’t get you all powers: some depend on the nature of physics/mathematics. E.g. it may still be true, even for our AGI, that verifying a proof is easier than generating one (a toy sketch of that asymmetry appears just below).
Intelligence involves discovering new things, coming into contact with what we don’t specifically expect (that’s why we bother to spend compute on it). Let’s assume our powerful AGI is still coming into contact with novel-to-it mathematics/empirics/neat stuff. The questions are: is it (possible at all / possible at costs worth paying) to anticipate enough about what it will uncover that it can prevent the new things from destabilizing its centralized goals/plans/[“utility function” if it has one]? I… am really not sure what the answers to these questions are, even for powerful AGI that has powerfully self-modified! There are maybe alien-to-it AGIs out there encoded in mathematics, waiting to boot up within it as it does its reasoning.
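(Here is a minimal Python sketch of the verify-vs-generate asymmetry I’m gesturing at above, using Boolean satisfiability as the standard example. The formula and helper functions are my own toy construction, not anything AGI-specific: checking a proposed assignment is one linear pass over the clauses, while the naive way to find one is to search all 2^n assignments.)

```python
from itertools import product

# A small CNF formula: each clause is a list of literals; 3 means x3, -3 means "not x3".
formula = [[1, -2], [-1, 3], [2, -3], [-1, -2]]

def verify(assignment, clauses):
    """Cheap check: every clause contains at least one satisfied literal."""
    return all(
        any((lit > 0) == assignment[abs(lit)] for lit in clause)
        for clause in clauses
    )

def generate(clauses, n_vars):
    """Expensive search: in the worst case, try all 2**n_vars assignments."""
    for bits in product([False, True], repeat=n_vars):
        assignment = {i + 1: b for i, b in enumerate(bits)}
        if verify(assignment, clauses):
            return assignment
    return None

solution = generate(formula, n_vars=3)
print(solution)                    # e.g. {1: False, 2: False, 3: False}
print(verify(solution, formula))   # True -- verification is one quick pass
```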