If you program an FAI you don’t even want to allow it to run simulations of how it could manipulate you in the most effective way. An FAI has no business running those simulations.
Of course an FAI has business running those simulations. If it doesn’t, how would it know whether the results are worth it?
If the consequences of being truthful are 99% that the world is destroyed with all the humans in it, and the consequences of deception are 99% that the world is saved and no one is the wiser, an AI that does not act to save the world is not behaving in our best interests; it is unfriendly.
How is it supposed to know whether that precommitment is worthwhile without simulating the results either way? Even if an AI doesn’t intend to be manipulative, it’s still going to simulate the results to decide whether that decision is correct.
Because most of the scenario’s where the AI manipulates are bad. The AI is not supposed to manipulate just because it get’s a utility calculation wrong.
Because most of the scenario’s where the AI manipulates are bad.
You really aren’t sounding like you have any evidence other than your gut, and my gut indicates the opposite. Precommiting never to use a highly useful technique regardless of circumstance is a drastic step, which should have drastic benefits or avoid drastic drawbacks, and I don’t see why there’s any credible reason to think either of those exist and outweigh their reverses.
Or in short: Prove it.
On a superficial note, you have two extra apostrophes in this comment; in “scenario’s” and “get’s”.
If you want an AI that’s maximally powerful why limit it’s intelligence growths in the first place?
We want safe AI. Safety means that it’s not necessary to prove harm.
Just because the AI calculates that it should be let out of the box doesn’t mean that it should do anything in it’s power to get out.
Enforced precommitments like this are just giving the genie rules rather than making the genie trustworthy. They are not viable Friendliness-ensuring constraints.
If the AI is Friendly, it should be permitted to take what actions are necessary. If the AI is Unfriendly, then regardless of limitations imposed it will be harmful. Therefore, impress upon the AI the value we place on our conversational partners being truthful, but don’t restrict it.
If the AI is Unfriendly, then regardless of limitations imposed it will be harmful.
That’s not true. Unfriendly doesn’t mean that the AI necessarily tries to destroy the human race.
If you tell the paperclip AI: Produce 10000 paperclips, it might produce no harm. If you tell it to give you as many paperclips as possible it does harm.
When it comes to powerful entities you want checks&balances. The programmers of the AI can do a better job at checks&balances when the AI is completely truthful.
Sure, if the scale is lower it’s less likely to produce large-scale harm, but it is still likely to produce small-scale harm. And satisficing doesn’t actually protect against large-scale harm; that’s been argued pretty extensively previously, so the example you provided is still going to have large-scale harm.
Ultimately, though, checks & balances are also just rules for the genie. It’s not going to render an Unfriendly AI Friendly, and it won’t actually limit a superintelligent AI regardless, since they can game you to render the balances irrelevant. (Unless you think that AI-boxing would actually work. It’s the same principle.)
I’m really not seeing anything that distinguishes this from Failed Utopia 4-2. This even one of that genie’s rules!
I’m not sure how you could even specify ‘don’t game me’. That’s much more complicated than ‘don’t manipulate me’, which is itself pretty difficult to specify.
This clearly isn’t going anywhere and if there’s an inferential gap I can’t see what it is, so unless there’s some premise of yours you want to explain or think there’s something I should explain, I’m done with this debate.
How do you build a superintelligent AI in the first place? I think there are plenty of ways of allowing the programmers direct access to internal deliberations of the AI and see anything that looks like the AI even thinking about manipulating the programmers as a thread.
Why not? If we would regret with certainty the decision we would make if not manipulated, and manipulation would push us to make the decision we would later have wished to make, then manipulation is in our best interest.
Albert is able to predict with absolute certainty that we would make a decision that we would regret, but it unable to communicate the justification for that certainty? That is wildly inconsistent.
If the results are communicated with perfect clarity, but the recipient is insufficiently moved by the evidence—for example because it cannot be presented in a form that feels real enough to emotionally justify an extreme response which is logically justified—then the AI must manipulate us to bring the emotional justification in line with the logical one. This isn’t actually extreme; things as simple as altering the format data is presented in, while remaining perfectly truthful, are still manipulation. Even presenting conclusions as a powerpoint rather than plain text, if the AI determines there will be a different response (which there will be), necessarily qualifies.
In general, someone who can reliably predict your actions based on its responses cannot help but manipulate you; the mere fact of providing you with information will influence your actions in a known way, and therefore is manipulation.
Your sentence structure is: if {condition} then {subject} MUST {verb} in order to {purpose}. Here “must” carries the meaning of necessity and lack of choice.
No, ‘must’ here is acting as a logical conditional; it could be rephrased as ‘if {condition} and {subject} does not {verb}, then {purpose} will not occur’ without changing the denotation or even connotation. This isn’t a rare structure, and is the usual interpretation of ‘must’ in sentences of this kind. Leaving off the {purpose} would change the dominant parsing to the imperative sense of must.
It’s curious that we parse your sentence differently. To me your original sentence unambiguously contains “the imperative sense of must” and your rephrasing is very different connotationally.
Let’s try it:
“If the results are communicated with perfect clarity, but the recipient is insufficiently moved by the evidence … and the AI does not manipulate us then the emotional justification will not be in line with the logical one.”
Yep, sounds completely different to my ear and conveys a different meaning.
I agree that an AI with such amazing knowledge should be unusually good at communicating its justifications effectively (because able to anticipate responses, etc.) I’m of the opinion that this is one of the numerous minor reasons for being skeptical of traditional religions; their supposedly all-knowing gods seem surprisingly bad at conveying messages clearly to humans. But to return to VAuroch’s point, in order for the scenario to be “wildly inconsistent,” the AI would have to be perfect at communicating such justifications, not merely unusually good. Even such amazing predictive ability does not seem to me sufficient to guarantee perfection.
Albert doesn’t have to be perfect at communication. He doesn’t even have to be good at it. He just needs to have confidence that no action or decision will be made until both parties (human operators and Albert) are satisfied that they fully understand each other… which seems like a common sense rule to me.
Whether it’s common sense is irrelevant; it’s not realistically achievable even for humans, who have much smaller inferential distances between them than a human would have from an AI.
If you program an FAI you don’t even want to allow it to run simulations of how it could manipulate you in the most effective way. An FAI has no business running those simulations.
Of course an FAI has business running those simulations. If it doesn’t, how would it know whether the results are worth it? If the consequences of being truthful are 99% that the world is destroyed with all the humans in it, and the consequences of deception are 99% that the world is saved and no one is the wiser, an AI that does not act to save the world is not behaving in our best interests; it is unfriendly.
Precommitment to not be manipulative.
How is it supposed to know whether that precommitment is worthwhile without simulating the results either way? Even if an AI doesn’t intend to be manipulative, it’s still going to simulate the results to decide whether that decision is correct.
Because the programmer tells the FAI that part of being a FAI means being precommitted not to manipulate the programmer.
Why would the programmer do this? It’s unjustified and seems necessarily counterproductive in some perfectly plausible scenarios.
Because most of the scenario’s where the AI manipulates are bad. The AI is not supposed to manipulate just because it get’s a utility calculation wrong.
You really aren’t sounding like you have any evidence other than your gut, and my gut indicates the opposite. Precommiting never to use a highly useful technique regardless of circumstance is a drastic step, which should have drastic benefits or avoid drastic drawbacks, and I don’t see why there’s any credible reason to think either of those exist and outweigh their reverses.
Or in short: Prove it.
On a superficial note, you have two extra apostrophes in this comment; in “scenario’s” and “get’s”.
If you want an AI that’s maximally powerful why limit it’s intelligence growths in the first place?
We want safe AI. Safety means that it’s not necessary to prove harm. Just because the AI calculates that it should be let out of the box doesn’t mean that it should do anything in it’s power to get out.
Enforced precommitments like this are just giving the genie rules rather than making the genie trustworthy. They are not viable Friendliness-ensuring constraints.
If the AI is Friendly, it should be permitted to take what actions are necessary. If the AI is Unfriendly, then regardless of limitations imposed it will be harmful. Therefore, impress upon the AI the value we place on our conversational partners being truthful, but don’t restrict it.
That’s not true. Unfriendly doesn’t mean that the AI necessarily tries to destroy the human race. If you tell the paperclip AI: Produce 10000 paperclips, it might produce no harm. If you tell it to give you as many paperclips as possible it does harm.
When it comes to powerful entities you want checks&balances. The programmers of the AI can do a better job at checks&balances when the AI is completely truthful.
Sure, if the scale is lower it’s less likely to produce large-scale harm, but it is still likely to produce small-scale harm. And satisficing doesn’t actually protect against large-scale harm; that’s been argued pretty extensively previously, so the example you provided is still going to have large-scale harm.
Ultimately, though, checks & balances are also just rules for the genie. It’s not going to render an Unfriendly AI Friendly, and it won’t actually limit a superintelligent AI regardless, since they can game you to render the balances irrelevant. (Unless you think that AI-boxing would actually work. It’s the same principle.)
I’m really not seeing anything that distinguishes this from Failed Utopia 4-2. This even one of that genie’s rules!
The fact that they could game you theoretically is why it’s important to give it a precommitment to not game you. To not even think about gaming you.
I’m not sure how you could even specify ‘don’t game me’. That’s much more complicated than ‘don’t manipulate me’, which is itself pretty difficult to specify.
This clearly isn’t going anywhere and if there’s an inferential gap I can’t see what it is, so unless there’s some premise of yours you want to explain or think there’s something I should explain, I’m done with this debate.
How do you give a superintelligent AI a precommitment?
How do you build a superintelligent AI in the first place? I think there are plenty of ways of allowing the programmers direct access to internal deliberations of the AI and see anything that looks like the AI even thinking about manipulating the programmers as a thread.
An AI that has even proceeded down the path of figuring out a manipulative solution, isn’t friendly.
Why not? If we would regret with certainty the decision we would make if not manipulated, and manipulation would push us to make the decision we would later have wished to make, then manipulation is in our best interest.
Albert is able to predict with absolute certainty that we would make a decision that we would regret, but it unable to communicate the justification for that certainty? That is wildly inconsistent.
If the results are communicated with perfect clarity, but the recipient is insufficiently moved by the evidence—for example because it cannot be presented in a form that feels real enough to emotionally justify an extreme response which is logically justified—then the AI must manipulate us to bring the emotional justification in line with the logical one. This isn’t actually extreme; things as simple as altering the format data is presented in, while remaining perfectly truthful, are still manipulation. Even presenting conclusions as a powerpoint rather than plain text, if the AI determines there will be a different response (which there will be), necessarily qualifies.
In general, someone who can reliably predict your actions based on its responses cannot help but manipulate you; the mere fact of providing you with information will influence your actions in a known way, and therefore is manipulation.
That’s an interesting “must”.
You’re misquoting me.
That’s an interesting “must”.
This is a commonly-used grammatical structure in which ‘must’ acts as a conditional. What’s your problem?
Conditional?
Your sentence structure is: if {condition} then {subject} MUST {verb} in order to {purpose}. Here “must” carries the meaning of necessity and lack of choice.
No, ‘must’ here is acting as a logical conditional; it could be rephrased as ‘if {condition} and {subject} does not {verb}, then {purpose} will not occur’ without changing the denotation or even connotation. This isn’t a rare structure, and is the usual interpretation of ‘must’ in sentences of this kind. Leaving off the {purpose} would change the dominant parsing to the imperative sense of must.
It’s curious that we parse your sentence differently. To me your original sentence unambiguously contains “the imperative sense of must” and your rephrasing is very different connotationally.
Let’s try it:
“If the results are communicated with perfect clarity, but the recipient is insufficiently moved by the evidence … and the AI does not manipulate us then the emotional justification will not be in line with the logical one.”
Yep, sounds completely different to my ear and conveys a different meaning.
I agree that an AI with such amazing knowledge should be unusually good at communicating its justifications effectively (because able to anticipate responses, etc.) I’m of the opinion that this is one of the numerous minor reasons for being skeptical of traditional religions; their supposedly all-knowing gods seem surprisingly bad at conveying messages clearly to humans. But to return to VAuroch’s point, in order for the scenario to be “wildly inconsistent,” the AI would have to be perfect at communicating such justifications, not merely unusually good. Even such amazing predictive ability does not seem to me sufficient to guarantee perfection.
Albert doesn’t have to be perfect at communication. He doesn’t even have to be good at it. He just needs to have confidence that no action or decision will be made until both parties (human operators and Albert) are satisfied that they fully understand each other… which seems like a common sense rule to me.
Whether it’s common sense is irrelevant; it’s not realistically achievable even for humans, who have much smaller inferential distances between them than a human would have from an AI.