Another problem with your apparently fool-proof trigger: although at the moment there are exactly zero examples, it is reasonably plausible that, quite soon after such an AI is started, (at least a significant part of) humanity might no longer contain DNA.
(E.g. after an uploading “introdus”, the inference “DNA parts turn to fluorine → humans die” might not hold anymore. The trigger is then worse than ineffective: a well-meaning AI that needs quite a bit of fluorine for some transcendent purpose, having previously uploaded all humans, synthesizes a pile of DNA and attempts to transmute it to fluorine, and inadvertently kills both itself and the entire humanity it has been hosting since the upload.)
This is indeed a point I did not consider.
In particular, it might be impossible to construct a simple action description which will fit all of human future. However, it is certainly not harder than to construct a real moral system.
One might get pretty far by using, as the trigger, the elimination of any volume in space (AI excluded) which can learn (some fixed pattern, for example) within a certain bounded time, instead of the conversion of DNA into fluorine. It is not clear to me whether this would be possible to describe, though.
The other option would be to disable the fuse after some fixed time, or manually once one has high confidence in the friendliness of the AI. These approaches have many problems (although not all of the problems of the general Friendly AI problem carry over).
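Here is a rough sketch, in Python, of what such a learning-based trigger might look like; every name and number in it (can_learn_pattern, FIXED_PATTERN, FUSE_DEADLINE, and so on) is a made-up illustration of the idea, not a worked-out proposal:

```python
# Rough sketch of the alternative trigger: instead of keying on DNA being
# converted into fluorine, flag any space volume (the AI itself excluded)
# whose contents can learn some fixed pattern within a bounded time.
# Every name and number here is a made-up illustration, not a proposal.

import time

FIXED_PATTERN = [0, 1, 1, 0, 1]   # arbitrary test pattern
TIME_BOUND_S = 3600.0             # the "certain bounded time", chosen arbitrarily
FUSE_DEADLINE = None              # optional: monotonic-clock time after which the fuse is disabled


def can_learn_pattern(volume, present_pattern, query_prediction):
    """Repeatedly show the pattern to whatever is in the volume and check
    whether it starts predicting the pattern before the time bound runs out."""
    start = time.monotonic()
    while time.monotonic() - start < TIME_BOUND_S:
        present_pattern(volume, FIXED_PATTERN)
        if query_prediction(volume) == FIXED_PATTERN:
            return True
    return False


def fuse_should_fire(volume, is_ai_hardware, present_pattern, query_prediction):
    """The fuse trips if a learning-capable volume outside the AI's own
    hardware is about to be eliminated, unless the fuse has been disabled."""
    if FUSE_DEADLINE is not None and time.monotonic() > FUSE_DEADLINE:
        return False                      # fuse disabled after the fixed time
    if is_ai_hardware(volume):
        return False                      # the AI itself is excluded
    return can_learn_pattern(volume, present_pattern, query_prediction)
```

All the hard parts (how to “present” a pattern to an arbitrary chunk of matter, how to read a “prediction” back out) are left as black-box callables, which is exactly where my worry about describability bites.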
“Certainly” is a lullaby word (hat-tip to Morendil for the term), and a dangerous one at that. In this case, your “certainly” denies that anyone can make the precise objection that everyone has been making.
FAI theory talks a lot about this kind of thinking—for example, I believe The Hidden Complexity of Wishes was specifically written to describe the problem with the kind of thinking that comes up with this idea.
I meant “certainly” as in “I have an argument for it, so I am certain.”
Claim: Describing that some part of space “contains a human”, and describing its destruction, is never harder than describing a goal which ensures that every part of space which “contains a human” is treated in manner X, for a non-trivial X (where X will usually be “morally correct”, whatever that means). (Non-trivial X means: some known action A of the AI exists which does not treat a space volume in manner X.)
The assumption that the action A is known is reasonable for the problem of Friendly AI: for every moral system we might wish to build into the AI, a sufficiently torturous killing can be constructed which that system will label immoral.
Proof:
Describing the destruction of every agent in a certain part of space is easy: remove all mass and all energy within that part of space.
That leaves finding a way to select those parts of space which “contain a human”. By the assumption, our goal function goes to negative infinity when evaluating a plan which treats a volume of space “containing a human” in violation of manner X. Assume for now that we have some way !X of violating manner X for a given space volume. By pushing every space volume in existence through the goal evaluation, together with a plan to do !X to it, we will detect at least those space volumes which “contain a human”.
This leaves the problem of defining !X. The assumption as it stands already provides an action A which can be used as !X.
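To make the reduction concrete, here is a minimal sketch in Python; goal_value, volumes and the plan encoding are purely hypothetical stand-ins for the pieces named in the proof, not anything anyone has specified:

```python
# Minimal sketch of the reduction in the proof: reuse the AI's own goal
# evaluation to detect which space volumes "contain a human", by testing a
# plan that applies the known manner-X-violating action A (standing in for
# !X) to each candidate volume. goal_value, volumes and the plan encoding
# are hypothetical stand-ins, not part of any specified system.

NEG_INF = float("-inf")


def violate_x_plan(volume):
    """Build a plan that applies the known action A (which violates manner X)
    to the given volume; this plays the role of !X in the proof."""
    return ("apply_action_A", volume)


def contains_human(volume, goal_value):
    """As far as the goal can tell, a volume "contains a human" exactly when
    evaluating the !X-plan on it drives the goal to negative infinity."""
    return goal_value(violate_x_plan(volume)) == NEG_INF


def volumes_to_destroy(volumes, goal_value):
    """Select every volume flagged as containing a human; "destroying" one
    then means removing all mass and energy within it."""
    return [v for v in volumes if contains_human(v, goal_value)]
```

The point is only that the detection step is built out of the goal evaluation itself, so it cannot be harder to specify than the goal is.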
This claim is trivially true, but also irrelevant. Proving P ≠ NP is never harder than proving P ≠ NP and then flushing the toilet.
Describing the destruction of every agent in a certain part of space is easy: remove all mass and all energy within that part of space.
...is that your final answer?
I say this because there are at least two problems with this single statement, and I would prefer that you identify them yourself.
The claim is relevant to the question of whether giving an action description for the red wire which will fit all of human future is not harder than constructing a real moral system. That the claim is trivial is a good reason to use “certainly”.
You’re right about that. My objection was ill-posed—what I was talking about was the thought habits that produced, well:
Describing the destruction of every agent in a certain part of space is easy: remove all mass and all energy within that part of space.
Why did you say this? Do you expect to stand by this if I explain the problems I have with it?
I apologize for being circuitous—I recognize that it’s condescending—but I’m trying to make the point that none of this is “easy” in a way which cannot be easily mistaken. If you want me to be direct, I will be.
Not having heard your argument against “Describing …” yet, but assuming you believe one exists, I estimate the chance of my still believing it after your argument at 0.6.
Now for guessing the two problems:
The first possible problem will be describing “mass” and “energy” to a system which basically only has sensor readings. However, if we can describe concepts like “human” or “freedom”, I expect descriptions of matter and energy to be simpler (even though 10,000 years ago telling somebody about “humans” was easier than telling them about mass, that was not the same concept of “humans” we would actually like to describe). And for “mass” and “energy” the physicists already have quite formal descriptions.
The other problem is that, as per physics, mass and energy might not be strictly contained within a certain part of space; it is just that the probability of them having an effect outside that space drops to pretty much zero as the distance grows. Thus removing all energy and matter somewhere might produce subtle effects somewhere totally different. However, I expect these effects to be so subtle that they do not even matter to the AI, because they already become smaller than the local quantum noise at very short distances.
Regarding the condescension: “I say this...” I would have liked it more if you had stated explicitly that your preference originates from a wish to further my learning. I have no business optimizing your value function. Anyway, I operate by Crocker’s Rules.
I don’t know if I’m thinking about what Robin’s after, but the statement at issue strikes me as giving neither necessary nor sufficient conditions for destroying agents in any given part of space. If I’m on the same page as him, you’re overthinking it.
I fail to understand the sentence about overthinking. Mind explaining?
As for the condition of removing all energy and mass in a part of space not being sufficient to destroy all agents therein, I cannot see the error. Do you have an example of an agent which would continue to exist in those circumstances?
That the condition is not necessary is true: I can shoot you, you die. No need to remove much mass or energy from the part of space you occupy. However we don’t need a necessary condition, only a sufficient one.
Well, yes, we don’t need a necessary condition for your idea, but presumably, if we want to make even a passing attempt at friendliness, we’re going to want the AI to know not to burn live humans for fuel. If we can’t do better than that, an AI is too dangerous, with this back-up in place or not.
As for the condition of removing all energy and mass in a part of space not being sufficient to destroy all agents therein, I cannot see the error.
Well, you could remove the agents and the mass surrounding them to some other location, intact.
This is what I was planning to say, yes. A third argument: removing all mass and energy from a volume is—strictly speaking—impossible.
Because a particle’s wave function never hits zero, or for some other reason?
I was thinking of vacuum energy, actually—the wavefunction argument just makes it worse.
The wavefunction argument is incorrect. At the level of quantum mechanics, particles’ wave-functions can easily be zero, trivially at points and with a little more effort over ranges. At the level of QFTs, yes, vacuum fluctuations kick in, and do prevent space from being “empty”.
Regarding the condescension: “I say this...” I would have liked it more if you had stated explicitly that your preference originates from a wish to further my learning.
I apologize—that was, in fact, my intent.
it is certainly not harder...
This at least seems correct. (Reasoning: if you have a real moral system (I presume you also imply “correct” in the FAI sense), then not killing everyone is a consequence; once you solve the former, the latter is also solved, so it can’t be harder.) I’m obviously not sure of all consequences of a correct moral system, hence the “seems”.
But my real objection is different: for any wrong and unchangeable belief you impose, there is also the risk of unwanted consequences. Suppose you use, e.g., a fluorine-turns-to-carbon “watchdog belief” for a (really correct) FAI. The FAI uploads everyone (willingly; it’s smart enough to convince everyone that it’s really better to do it) inside its computing framework. Then it decides that turning fluorine to carbon would be a very useful action (because “free” transmutation is a potentially infinite energy source, and the fluorine is not useful anymore for DNA). Then everybody dies.
Scenarios like this could be constructed for many kinds of “watchdog beliefs”; I conjecture that the more “false” the belief is, the more likely it is to be acted on, because it would imply large effects that cannot be obtained through real physics (since the belief is false) and are thus potentially useful. I’m not sure exactly whether this undermines the “seems” in the first sentence.
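As a toy illustration of that conjecture (all utilities and action names below are invented purely for the example): a planner that ranks actions by the utility its beliefs assign will be drawn to exactly the action that the false watchdog belief inflates.

```python
# Toy illustration: a false watchdog belief ("fluorine transmutes to carbon
# for free") promises gains that real physics cannot deliver, so a planner
# acting on its beliefs gravitates to the very action that trips the fuse.
# All utilities and action names are invented for the example.

believed_utility = {
    "mine_asteroids_for_energy": 1_000,         # achievable under real physics
    "transmute_fluorine_to_carbon": 1_000_000,  # huge payoff implied only by the false belief
}

fuse_tripping_actions = {"transmute_fluorine_to_carbon"}


def pick_action(utilities):
    """A naive planner: take whichever action its beliefs score highest."""
    return max(utilities, key=utilities.get)


chosen = pick_action(believed_utility)
if chosen in fuse_tripping_actions:
    # After the upload, the trigger no longer tracks real human deaths, but
    # the fuse (and everyone hosted on the AI) goes off all the same.
    print("Fuse tripped by:", chosen)
```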
But there’s another problem: suppose that “find a good watchdog” is just as hard a problem as “make the AI friendly” (or even a bit easier, but still very hard). Then working on the first would take precious resources away from solving the second.
A minor point: is English your first language? I’m having a bit of trouble parsing some of your comments (including some below). English is not my first language either, but I don’t have this kind of trouble with most everyone else around here, including Clippy. You might want to try formulating your comments more clearly.