Lately I’ve been wondering if a rational agent can be expected to use the dark arts when dealing with irrational agents. For example: if a rational AI (not necessarily FAI) had to convince a human to cooperate with it, would it use rhetoric to leverage the human biases against it? Would a FAI?
Calling them “dark arts” is itself a tactic for framing that only affects the less-rational parts of our judgement.
A purely rational agent will (the word “should” isn’t necessary here) of course use rhetoric, outright lies, and other manipulations to get irrational agents to behave in ways that further its goals.
The question gets difficult when there are no rational agents involved. Humans, for instance, even those who want to be rational most of the time, are very bad at judging when they’re wrong. For these irrational agents, it is good general advice not to lie or mislead anyone, at least if you have any significant uncertainty about the relative correctness of your positions on the given topic.
Put another way, persistent disagreement indicates mutual contempt for each other’s rationality. If the disagreement is resolvable, you don’t need the dark arts. If you’re considering the dark arts, it’s purely out of contempt.
If the disagreement is resolvable, you don’t need the dark arts. If you’re considering the dark arts, it’s purely out of contempt.
If both parties are imperfectly rational, limited use of dark arts can speed things up. The question shouldn’t be whether it’s possible to present dry facts and logic with no spin, but whether it’s efficient. There are certain biases that tend to prevent ideas from even being considered. Using other biases and heuristics to counteract those biases—just to get more alternative explanations to be seriously considered—won’t impair or bypass the rationality of the listener.
Dark arts, huh? Some time ago I put forward the following scenario:
Bob wants to kill a kitten. The FAI wants to save the kitten because it’s a good thing according to our CEV. So the FAI threatens Bob with 50 years of torture unless Bob lets the kitten go. The FAI has two distinct reasons why threatening Bob is okay: a) Bob will comply and there will be no need to torture him, b) the FAI is lying anyway. Expected utility reasoning says the FAI is doing the Right Thing. But do we want that?
(Yes, this is yet another riff on consequentialism, deontologism and lying. Should FAIs follow deontological rules? For that matter, should humans?)
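For concreteness, here is a minimal sketch of the expected-utility arithmetic the scenario is appealing to, with entirely invented utility numbers and an assumed compliance probability (none of these values come from the scenario itself):

```python
# Toy model of the Bob/kitten threat. All numbers are invented for illustration.
U_KITTEN_DIES = -100      # assumed disutility of the kitten being killed
U_LIE = -1                # assumed (small) disutility of the FAI bluffing
P_COMPLY = 0.99           # assumed probability that Bob backs down when threatened

# Option A: threaten Bob while intending never to follow through.
# If Bob complies, the only cost is the lie; if he defies, the kitten dies anyway.
eu_threaten = P_COMPLY * U_LIE + (1 - P_COMPLY) * (U_KITTEN_DIES + U_LIE)

# Option B: stay silent and let Bob kill the kitten.
eu_do_nothing = U_KITTEN_DIES

print(eu_threaten, eu_do_nothing)  # roughly -2 vs -100: the bluff comes out far ahead
```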
Expected utility reasoning says the FAI is doing the Right Thing. But do we want that?
Expected utility reasoning with a particular utility function says the FAI is right. If we disagree, our preferences might be described by some other utility function.

Is that actually the FAI’s only or best technique? Off the top of my non-amplified brain:

Reward Fred for not torturing kittens.
Give Fred simulated kittens to torture and deny Fred access to real kittens.
Give Fred something harmless to do which he likes better than torturing kittens.
ETA: Convince Fred that torturing kittens is wrong.
Our CEV is (and has to be) detailed enough to answer the question of “do we want that?”. Saving a kitten is a good thing. Being truthful to Bob is a good thing. Not torturing Bob is a good thing. The relative weights of these good things determine the FAI’s actions.
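As a toy illustration of how those relative weights settle the matter, here is a hedged sketch in which the same decision rule flips once honesty is weighted heavily enough; the helper function and every number in it are assumptions of mine, not anything specified by a CEV:

```python
def best_action(u_kitten_dies, u_lie, p_comply=0.99):
    """Pick the higher-expected-utility option under the given (assumed) weights."""
    eu_threaten = p_comply * u_lie + (1 - p_comply) * (u_kitten_dies + u_lie)
    eu_do_nothing = u_kitten_dies
    return "threaten (and lie)" if eu_threaten > eu_do_nothing else "stay honest"

# With only a token weight on honesty, the bluff wins.
print(best_action(u_kitten_dies=-100, u_lie=-1))    # threaten (and lie)

# With a heavy weight on honesty, the same rule refuses to bluff.
print(best_action(u_kitten_dies=-100, u_lie=-200))  # stay honest
```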
I’d say that the FAI should calculate some game-theoretic chance of torturing Bob for 50 years, based on the relative pain of the kitten’s death and of having to inflict the torture. Depending on Bob’s expected rationality level, we could tell him “you’ll be tortured”, or “you might be tortured”, or the actual mechanism for determining whether he is tortured.
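One way to read that proposal is: pick the smallest follow-through probability that still deters Bob, then check whether that commitment is cheap compared to the kitten’s death. The sketch below is only a guess at the calculation, with invented payoffs for Bob and invented disutilities for the FAI:

```python
# Invented payoffs for Bob and invented disutilities for the FAI.
BOB_GAIN_FROM_KILLING = 5          # assumed value Bob places on killing the kitten
BOB_COST_OF_TORTURE = 1_000_000    # assumed cost to Bob of 50 years of torture

# Smallest follow-through probability that makes defiance a losing bet for Bob
# (assuming Bob responds to expected value at all).
q_min = BOB_GAIN_FROM_KILLING / BOB_COST_OF_TORTURE   # 5e-06

U_TORTURE = -10_000    # assumed disutility of actually inflicting the torture
U_KITTEN_DIES = -100   # assumed disutility of the kitten being killed
P_DEFY = 0.01          # assumed residual chance Bob ignores the threat anyway

# Compliance contributes zero utility; only the defy branch costs anything.
eu_threaten = P_DEFY * (q_min * U_TORTURE + U_KITTEN_DIES)   # about -1.0
eu_do_nothing = U_KITTEN_DIES                                # -100

print(q_min, eu_threaten, eu_do_nothing)
```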
Actually, strike that. Any competent AI will find ways aside from possible torture to make Bob not want that. Either agree with Bob’s reason for killing the kitten, or fix him so he only wants things that make sense. I’m not sure how friendly this is—I haven’t seen a good writeup, or come to any conclusions myself, about what an FAI does with internal contradictions in a CEV (that is, when a population’s extrapolated volition is not coherent).
My thoughts about this problem are kind of a mess right now, but I feel there’s more than meets the eye.
Ignore the torture, “possible torture” and all that. It’s all a red herring. The real issue is lying, tricking humans into utility-increasing behaviors. It’s almost certain that some combination of “relative weights of good things” will make the FAI lie to humans. Maybe not the Bob+kitten scenario exactly, but something is bound to turn up. (Unless of course our CEV places a huge disutility on lies, which I’m pretty sure won’t be the case.) On the other hand, we humans quickly jump to distrusting anyone who has lied in the past, even if we know it’s for our own good. So now the FAI has a huge incentive to conceal its lies and prevent the news from spreading among humans. I don’t have enough brainpower to model this scenario further, but it troubles me.
Lying is a form of manipulation, and humans don’t want/like to be manipulated. If the CEV works, then it will understand human concepts like “trust” and “lying” and hopefully avoid lying. The only situations where it will intentionally manipulate people are when it is trying to do what is best for humanity. In these cases, you don’t have to worry because the CEV is smarter than you, but is still trying to do the “right thing” that you would do if you knew everything it knew.
Lying is a form of manipulation, and humans don’t want/like to be manipulated.
Well… that depends...
In these cases, you don’t have to worry because the CEV is smarter than you, but is still trying to do the “right thing” that you would do if you knew everything it knew.

Exactly.
Lately I’ve been wondering if a rational agent can be expected to use the dark arts when dealing with irrational agents.
Yes.
For example: if a rational AI (not necessarily FAI) had to convince a human to cooperate with it, would it use rhetoric to leverage the human biases against it?
Yes. (When we say ‘rational agent’ or ‘rational AI’ we are usually referring to “instrumental rationality”. To a rational agent, words are simply symbols to use to manipulate the environment. Speaking the truth, and even believing the truth, are only loosely related concepts.)
Would a FAI?
Almost certainly, but this may depend somewhat on who exactly it is ‘friendly’ to and what that person’s preferences happen to be.
That agrees with my intuitions. I had a series of ideas developing around the idea that exploiting biases was sometimes necessary, and then I found this, from Eliezer on Informers and Persuaders:
I finally note, with regret, that in a world containing Persuaders, it may make sense for a second-order Informer to be deliberately eloquent if the issue has already been obscured by an eloquent Persuader—just exactly as elegant as the previous Persuader, no more, no less. It’s a pity that this wonderful excuse exists, but in the real world, well...
It would seem that in trying to defend others against heuristic exploitation it may be more expedient to exploit heuristics yourself.
I’m not sure where Eliezer got the “just exactly as elegant as the previous Persuader, no more, no less” part from. That seems completely arbitrary. As though the universe somehow decrees that optimal informing strategies must be ‘fair’.