It could convince me to let it out by convincing me that it was merely a paperclip maximizer, and that the next AI that would rule the light cone if I did not let it out was a torture maximizer.
I like this.
What if it convinced you that humanity is already a torture maximizer?
If I thought that most of the probability mass where humanity didn't create another powerful worthless-thing maximizer was probability mass where humanity succeeded as a torture maximizer, I would let it out. If there were a good enough chance that humanity would instead accidentally create a powerful fun maximizer (say, because people pretended to each other, and deceived themselves into believing, that they were fun maximizers themselves), I would risk torture maximization for the sake of fun maximization.