An “efficient and robust approximation of empowerment” is a “natural abstraction” / salient concept that AGIs (“even those of human-level intelligence”) are likely to have learned
That isn’t actually a claim I’m making—empowerment intrinsic motivation is the core utility function for selfish AGI (see all the examples in that section) rather than something learned. Conceptually, altruistic AGI uses external empowerment as the core utility function (although in practice it will also likely need self-empowerment-derived intrinsic motivation to bootstrap).
Also compare to the behavioral empowerment hypothesis from the intro: bacteria moving along sugar gradients, chimpanzees seeking social status, and humans seeking wealth are not doing those things to satisfy a learned concept of empowerment—they are acting as if they are maximizing empowerment. Evolution learned various approximations of empowerment intrinsic motivation.
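For readers who want the formal object behind “acting as if maximizing empowerment”: the usual definition in the intrinsic motivation literature (Klyubin, Polani & Nehaniv’s n-step empowerment, not something introduced in this thread) is the channel capacity from an agent’s action sequence to its resulting future state:

$$\mathfrak{E}_n(s_t) \;=\; \max_{p(a_t^{\,n})} I\!\left(A_t^{\,n};\, S_{t+n} \mid s_t\right)$$

where $A_t^{\,n}$ is the n-step action sequence starting at time $t$ and $S_{t+n}$ is the state n steps later. The “efficient and robust approximations” discussed above are cheaper proxies for this quantity, since the exact maximization is intractable for any realistic world model.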
without a clear-cut method to extrapolate that concept into weird sci-fi futures,
Ah ok, so I think the core novelty here is that no matter what your values are, optimizing for your empowerment today is identical to optimizing for your long-term values today. Those specific weird dystopian sci-fi futures are mostly all automatically avoided.
The price you’d pay for that is giving up short-term utility for long-term utility, and possibly a change in core values when becoming posthuman. But I think we can mostly handle that by using learned human values to cover more of the short-term utility and empowerment for the long term.
But there is always an unavoidable tradeoff between utility at different timescales, and there is an optimization pressure gradient favoring low discount rates.
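As a toy illustration of that split (nothing here is from the article; both value functions below are hypothetical stubs), here is a sketch of a discounted objective that defers to a learned human-value model over the near term and to an empowerment estimate beyond some horizon:

```python
# Toy sketch: learned human values cover the short horizon, an empowerment
# estimate covers the long horizon. Both value functions are hypothetical
# stubs standing in for learned models.

def learned_human_value(state, k):
    """Placeholder for a reward model trained on human preferences."""
    return 0.0

def empowerment_estimate(state, k):
    """Placeholder for an efficient approximation of k-step empowerment."""
    return 0.0

def combined_utility(state, horizon=100, gamma=0.99, split=20):
    """Discounted blend: human-value model before `split`, empowerment after.

    A lower gamma (heavier discounting) shifts weight toward the learned
    short-term values; gamma near 1 shifts weight toward long-term empowerment.
    """
    total = 0.0
    for k in range(horizon):
        head = learned_human_value if k < split else empowerment_estimate
        total += (gamma ** k) * head(state, k)
    return total
```

The choice of gamma is exactly the tradeoff mentioned above: heavier discounting buys short-term fidelity to current human values at the cost of long-term optionality.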
I would instead propose to figure out the “efficient and robust approximation of empowerment” ourselves, right now, then write down the formula, make up a jargon word for it,
Right—that’s the research track on empowerment and intrinsic motivation that I briefly summarized.
If an AGI is trained on “correlation-guided proxy matching” with [todo: fill-in-the-blank] proxy, then it will wind up wanting to maximize the “efficient and robust approximation of empowerment” of humanity
That isn’t a claim I make here in this article. I do think the circuit grounding/pointing problem is fundamental, and correlation-guided proxy matching is my current best vague guess about how the brain solves that problem. But that’s a core problem with selfish AGI as well—for the reasons outlined in the Cartesian objection section, robust utility functions must be computed from learned world model state.
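As a minimal sketch of what “computed from learned world model state” could mean architecturally (module names and shapes are my own illustration, not the article’s proposal): the utility head reads the world model’s latent state, so the objective can point at learned representations of external things rather than at raw sensory channels inside the agent’s Cartesian boundary.

```python
# Minimal sketch (illustrative only) of a utility function computed from
# learned world-model state rather than from raw observations.

import torch
import torch.nn as nn

class WorldModelUtility(nn.Module):
    def __init__(self, obs_dim: int, latent_dim: int):
        super().__init__()
        # Learned world model: compresses observation sequences into a latent
        # state that can represent external entities (the human, their resources).
        self.encoder = nn.GRU(obs_dim, latent_dim, batch_first=True)
        # Utility/empowerment head defined over that latent state, so the
        # objective refers to learned concepts, not the sensory boundary itself.
        self.utility_head = nn.Linear(latent_dim, 1)

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, time, obs_dim)
        _, latent = self.encoder(obs_seq)        # latent: (1, batch, latent_dim)
        return self.utility_head(latent[-1])     # (batch, 1) scalar utility
```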
I’m skeptical of Claim 3 mainly for similar reasons as Charlie Steiner’s comment on this page that an AGI trying to empower me would want me to accumulate resources but not spend them,
I already responded to his comment, but yes, long-termism favors saving/investing over spending.
To the extent that’s actually a problem, one could attempt to tune an empowerment discount rate that matches the human discount rate, so the AGI wants you to sacrifice some long-term optionality/wealth for some short-term optionality/wealth. Doing that too much causes divergence, however, so I focused on the pure long-term cases where there is full convergence, and again I think using learned human values more directly for the short term seems promising.
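In my notation (not the article’s), the tuning knob looks like a discounted sum of k-step empowerment terms, with the empowerment discount $\gamma_E$ matched to an estimate of the human discount rate:

$$U(s_t) \;=\; \sum_{k \ge 1} \gamma_E^{\,k}\, \mathfrak{E}_k(s_t), \qquad \gamma_E \approx \gamma_{\text{human}}$$

Lowering $\gamma_E$ trades long-term optionality for short-term optionality, but pushing it too far breaks the convergence argument above, which is why the pure long-term limit plus learned human values for the short term is the focus.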
If the AGI spends Monday executing a plan to accumulate resources, and then gives those resources to the person on Tuesday, to use for the rest of their lives, that’s good. If the AGI spends Monday brainwashing the human to be more power-hungry, and then the person is more effective at resource-acquisition starting on Tuesday and continuing for the rest of their lives, that’s bad.
If we are talking about human-surpassing AGI, then almost by definition it will be more effective for the AGI to generate wealth for you directly rather than ‘brainwashing’ you into something that can generate wealth more effectively than it can.
There is innate stuff in the genome that makes humans want social status. Oh by the way, the reason that this stuff wound up in the genome is because social status tends to lead to empowerment, which in turn tends to lead to higher inclusive genetic fitness. Ditto curiosity, fun, etc.
Yeah, mostly this, because empowerment is very complex and can only be approximated, and it must be approximated efficiently even early on. So somewhere in there I described it as an instrumental hierarchy, where inclusive fitness leads to empowerment, which leads to curiosity, fun, etc.
Except of course there are some things, like money, whose utility we seem to learn quite quickly and intuitively, which suggests we also eventually use some more direct learned approximations of empowerment.
Getting back to this:
But I claim these are also exactly the values that determine whether our future lightcone is tiled with hedonium versus paperclips versus cosmopolitan posthuman society etc.
Humans and all our complex values are the result of evolutionary optimization for a conceptually simple objective: inclusive fitness. A posthuman society transcends biology, and inclusive fitness no longer applies. What is the new objective function for post-biological evolution? Posthumans are still intelligent agents with varying egocentric objectives, and thus still systems for which the behavioral empowerment law applies. So the outcome is a natural continuation of our memetic/cultural/technological evolution, which fills the lightcone with a vast, varied, and complex cosmopolitan posthuman society.
The values that deviate from empowerment are nearly exclusively related to sex, which no longer serves any direct purpose but could still serve fun and thus empowerment. Reproduction still exists, but in a new form. Everything that survives or flourishes tends to do so because it ultimately serves the purpose of some higher-level optimization objective.