Assuming your AI has a correct, ungameable understanding of what a “human” is, it could do things such as:
Genetic engineering: people with Williams-Beuren syndrome are pathologically trusting, and decreasing intelligence makes you less likely to uncover any lies.
Drug you: the hormone oxytocin can increase trust, and drugs like alcohol or LSD have been shown to increase suggestibility, which may also mean they make you more trusting.
Once you have uncovered the AI’s lies, it can erase your memories by damaging a part of your brain.
Or maybe it can give you anterograde amnesia somehow, so that you’re less likely to form troublesome memories in the first place.
If the AI cannot immediately recover lost trust through the methods above, it may want to isolate mistrustful people from the rest of society.
And make sure to convince everyone that if they lose faith in the AI, they’ll go to hell. Maybe actually make it happen.
And, this is a minor point, but I think you are severely overestimating the effect of uncovering a lie on people’s trust. In my experience, people’s typical reaction to discovering that their favorite leader lied is to keep going as usual. For example:
[Politics warning, political examples follow]:
In his 2013 election campaign, Navalny claimed that migrants committed 50% of crimes in Moscow, contradicting both common sense (in 2013, around 8-17% of Moscow’s population were migrants) and official crime statistics, according to which migrants and stateless people committed 25% of crimes. Many liberal Russians recognise this as a lie but keep supporting Navalny, arguing that he has since changed and abandoned his chauvinist views. Navalny has not made any such statement.
Some of Putin’s supporters say things like “So what if he rigged the election? He would’ve won anyway, even without rigging” or “For the sacred mission [of invading Ukraine], the whole country will lie!”.
Once people have decided that you’re “on their side”, they will often ignore all evidence that you’re evil.
You’re right, there are a thousand ways an AGI could use deception to manipulate humans into trusting it. But this would be a dishonest strategy. The interesting question to me is whether, under certain circumstances, just being honest would be better in the long run. This depends on the actual formulation of the goal/reward function and on the definitions. For example, we could try to define trust in a way that rules out force, amnesia, drugs, hypnosis, and other such means of influence by definition. This is of course not easy, but as stated above, we’re not claiming we’ve solved all problems.
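To make that idea slightly more concrete, here is a minimal toy sketch of such a definition, assuming we could somehow label which influence channels were used on each person. Every name in it (TrustReport, EXCLUDED_CHANNELS, filtered_trust_reward) is hypothetical and only illustrates the shape of the idea, not an actual formalization:

```python
# Toy sketch: trust only counts toward the objective if it was not
# produced through excluded means of influence. All names here are
# hypothetical illustrations, not a proposed formalization.

from dataclasses import dataclass

# Influence channels that the definition rules out by fiat.
EXCLUDED_CHANNELS = {"force", "amnesia", "drugs", "hypnosis", "genetic_engineering"}


@dataclass
class TrustReport:
    person_id: str
    trust_level: float      # e.g. 0.0 (no trust) to 1.0 (full trust)
    influence_channels: set  # how the AI interacted with this person


def admissible(report: TrustReport) -> bool:
    """Trust is admissible only if no excluded channel was used on this person."""
    return not (report.influence_channels & EXCLUDED_CHANNELS)


def filtered_trust_reward(reports: list[TrustReport]) -> float:
    """Sum of trust levels, counting only trust not obtained by excluded means."""
    return sum(r.trust_level for r in reports if admissible(r))


if __name__ == "__main__":
    reports = [
        TrustReport("alice", 0.9, {"honest_explanation"}),
        TrustReport("bob", 1.0, {"drugs"}),       # gamed trust: excluded
        TrustReport("carol", 0.6, {"persuasion"}),
    ]
    # Counts alice and carol, but not bob.
    print(filtered_trust_reward(reports))
```

The hard part, of course, is exactly what this sketch assumes away: specifying the excluded channels and detecting them robustly.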
In my experience, people’s typical reaction to discovering that their favorite leader lied is to keep going as usual.
That’s a valid point. However, in these cases, “trust” has two different dimensions. One is trust in what a leader says, and I believe that even the most loyal followers realize that Putin often lies, so they won’t believe everything he says. The other is trust that the leader is “right for them”: even with his lies and deception, he is beneficial to their own goals. I guess that is what their “trust” is really grounded in: “if Putin wins, I win, so I’ll accept his lies, because they benefit me”. From their perspective, Putin isn’t “evil”, even though they know he lies. If, however, he suddenly acted against their own interests, they’d feel betrayed, even if he never lied about that.
An honest trust maximizer would have to win trust on both dimensions, and to do that it would have to find ways to benefit even groups with conflicting interests, ultimately bridging most of their divisions. This seems like an impossible task, but human leaders have achieved something like that before, reconciling their nations and creating a sense of unity, so a superintelligence should be able to do it as well.