If we’re talking about Darwinian contexts, systems which optimize for long-run power are in fact often selected out. Long-run benefit is of no utility unless short-term survival is taken care of, and long-run and short-term needs are often at odds.
So from a behavioral perspective I expect that you get systems which are optimizing for short-term survival. Indeed, I think this point is trivial and one which you probably agree with. What I’m saying is that short-term survival and long-run power are not necessarily correlated, and I think that is the crux.
Let’s take an example that is not rigorously worked out and is probably wrong in some details, but can serve to illustrate. Long-run power in humans is derivative of social structure: the leaders of the tribe control the tribe’s collective resources. If you want power in human society, you need to rise to the top of our social structures, and the optimal ways of doing that are generally not nice.
But why do we have social structures at all? Why are we organized as tribes? Because we are social animals who prefer the company of others. Being with others beats striking out on your own because, generally speaking, other people in the tribe are nice. Niceness creates an environment in which sycophantic power seeking pays off, but only because there is severe evolutionary pressure towards niceness in the first place. [In the environment which gave rise to humans, not as a general statement.]
Then shouldn’t such systems (which can surely recognize this argument) just take care of short-term survival instrumentally? Maybe you’re making a claim about irrationality being likely, or a claim that systems that care about long-run benefit act in apparently myopic ways.
(Note that historically it was much harder to keep value stability/lock-in than it will be for AIs.)
I’m not going to engage in detail FYI.