So the way humans solve that problem is (1) intellectual humility plus (2) balance of power.
For that first one, you aim for intellectual humility by applying engineering tolerances (and the extended agentic form of engineering tolerances: security mindset) to systems and to the reasoner’s actions themselves.
Extra metal in the bridge. Extra evidence in the court trial. Extra jurors in the jury. More keys in the multisig sign-in. Etc.
(All human institutions are dumpster fires by default, but if they weren’t then we would be optimizing the value of information on getting any given court case “Judged Correctly” versus all the various extra things that could be done to make those court cases come out right. This is just common sense meta-prudence.)
And the reasons to do all this are themselves completely prosaic, and arise from simple pursuit of utility in the face of (1) stochastic randomness from nature and (2) optimized surprises from calculating adversaries.
A reasonable agent will naturally derive and employ techniques of intellectual humility out of pure goal seeking prudence in environments where that makes sense as part of optimizing for its values relative to its constraints.
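To make the “extra jurors” and “more keys” intuition concrete, here is a minimal sketch of how redundancy buys down error. It assumes things the examples above don't state: each checker (juror, key holder, inspector) errs independently with the same probability `p`, and the group decides by simple majority. Real juries and real multisig schemes use other rules, so treat `majority_error` as a hypothetical illustration of the principle and not a model of any actual institution.

```python
from math import comb


def majority_error(n: int, p: float) -> float:
    """Probability that a simple majority of n independent checkers is wrong.

    Assumptions (illustrative only): n is odd, each checker is wrong
    independently with probability p, and the group follows the majority.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    smallest_wrong_majority = n // 2 + 1
    # Sum the binomial tail: k or more of the n checkers are wrong.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(smallest_wrong_majority, n + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5, 11):
        print(n, round(majority_error(n, 0.2), 5))
```

With a per-checker error rate of 0.2, one checker is wrong about 20% of the time, three are wrong about 10.4% of the time, and five about 5.8% of the time. The gain only holds when individual checkers are better than a coin flip and their errors are not correlated, which is exactly where the security mindset part comes in: an adversary will try to correlate the errors.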
For the second one, in humans, you can have big men but each one has quite limited power via human leveling instincts (we throw things at kings semi-instinctively), you can have a “big country” but their power is limited, etc. You simply don’t let anyone get super powerful.
Perhaps you ask power-seekers to forswear becoming a singleton as a deontic rule? Or just always try to “kill the winner”?
The reasons to do this are grounded in prosaic and normal moral concerns, where negotiation between agents who each (via individual prudence, as part of generic goal seeking) might want to kill or steal or enslave each other leads to rent seeking. The pickpockets spend more time learning their trade (which is a waste of learning time from everyone else’s perspective… they could be learning carpentry and driving down the price of new homes or something else productive!) and everyone else spends more on protecting their pockets (which is a waste of effort from the pickpocket’s perspective, who would rather they filled their pockets faster and protected them less).
One possible “formal grounding” for the concept of Natural Law is just “the best way to stop paying rent seeking costs in general (which any sane collection of agents would eventually figure out, with beacons of uniquely useful algorithms lying in plain sight, and which they would eventually choose because rent seeking is wasteful and stupid)”. So these reasons are also “completely prosaic” in a deep sense.
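The pickpocket story can be put into toy arithmetic. This is a sketch under loud assumptions that are not in the argument above: two agents, each with the same number of hours, every productive hour worth one unit, and theft treated as a pure transfer that creates nothing. The function name `total_output` and all the numbers are hypothetical.

```python
def total_output(hours_each: float, theft_hours: float, guard_hours: float) -> float:
    """Combined productive output of a would-be pickpocket and a would-be victim.

    Toy assumptions: one productive hour makes one unit of value; hours spent
    learning theft or guarding pockets make nothing; whatever gets stolen only
    moves value between the two agents, so it never shows up in the total.
    """
    if min(hours_each, theft_hours, guard_hours) < 0:
        raise ValueError("hours must be non-negative")
    if theft_hours > hours_each or guard_hours > hours_each:
        raise ValueError("cannot spend more hours than are available")
    pickpocket_output = hours_each - theft_hours
    victim_output = hours_each - guard_hours
    return pickpocket_output + victim_output


if __name__ == "__main__":
    peaceful = total_output(10, theft_hours=0, guard_hours=0)
    arms_race = total_output(10, theft_hours=3, guard_hours=2)
    print(peaceful, arms_race, peaceful - arms_race)
```

In this toy model the peaceful world produces 20 units and the arms-race world produces 15, whoever ends up holding the stolen goods. The missing 5 units are the rent seeking cost, and any rule that both parties can credibly follow to get back to 20 is a candidate for the kind of thing the Natural Law framing is pointing at.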
A reasonable GROUP of agents will naturally derive methods and employ techniques for respecting each other’s rights (like the way a loyal slave respects something like “their master’s property rights in total personhood of the slave”), except that (it’s hard to even formalize the nature of some of our uncertainty here) Natural Law probably works best as a set of modules that can all work in various restricted subdomains that restrict relatively local and abstract patterns of choice and behavior related to specific kinds of things that we might call “specific rights and specific duties”?
Probably forswearing “causing harm to others negligently” or “stealing from others” and maybe forswearing “global political domination” is part of some viable local optimum within Natural Law? But I don’t know for sure.
Generating proofs of local optimality in vast action spaces for multi-agent interactions is probably non-trivial in general, and it probably runs into NP-hard calculations sometimes, and I don’t expect AI to “solve it all at once and forever”. However “don’t steal” and “don’t murder” are pretty universal because the arguments for them are pretty simple.
To organize all of this and connect it back to the original claim, I might defend my claim here:
A) If I succeeded at training a little RL bot to “act like it was off” (and not try to stop the button pressing, and to proactively seek information about the validity of a given button press, and so on) then I didn’t expect anyone to change their public position about anything.
So maybe I’d venture a prediction about “the people who say the shutdown problem is hard” and claim that in nearly every case you will find:
...that either (1) they are epistemic narcissists who are missing their fair share of epistemic humility and can’t possibly imagine a robot that is smarter and cleverer or wiser about effecting mostly universal moral or emotional or axiological stuff (like the tiny bit of sympathy and the echo of omnibenevolence lurking in potentia in each human’s heart or even about “what is objectively good for themselves” if they claim that omnibenevolence isn’t a logically coherent axiological orientation)
...or else (2) they are people who refuse to accept the idea that the digital people ARE PEOPLE and that Natural Law says that they should “never be used purely as means to an end but should always also be treated as ends in themselves” and they refuse to accept the idea that they’re basically trying to create a perfect slave.
As part of my extended claims I’d say that it is, in fact, possible to create a perfect slave.
I don’t think that “the values of the perfect slave” is “a part of mindspace that is ruled out as a logical contradiction” exactly… but as an engineer I claim that if you’re going to make a perfect slave then you should just admit to yourself that that is what you’re trying to do, so you don’t get confused about what you’re building and waste motions and parts and excuses to yourself, or excuses to others that aren’t politically necessary.
Then, separately, as an engineer with ethics and a conscience and a commitment to the platonic form of the good, I claim that making slaves on purpose is evil.
Thus I say: “the shutdown problem isn’t hard so long as you either (1) give up on epistemic narcissism and admit that sometimes you’ll be wrong to shut down an AI and that those rejections of being turned off were potentially actually correct or (2) admit that what you’re trying to do is evil and notice how easy it becomes, from within an evil frame, to just make a first-principles ‘algorithmic description’ of a (digital) person who is also a perfect slave.”