> A corrigible AI might not turn against its operators and might not kill us all, and the outcome can still be catastrophic. To prevent this, we’d definitely want our operators to be metaphilosophically competent, and we’d definitely want our AI to not corrupt them.

I agree with this.

> a corrigible misaligned superintelligence is unlikely to lead to self-annihilation, but pretty likely to lead to astronomical moral waste.

There’s a lot of broad model uncertainty here, but yes, I’m sympathetic to this position.

> Does the new title seem better?

Yep.
Thanks Ryan!

> What I see to be the main message of the article as currently written is that humans controlling a very powerful tool (especially AI) could drive themselves into a suboptimal fixed point due to insufficient philosophical sophistication. This I agree with.

Hurrah!

> At this round of edits, my main objection would be to the remark that the AI wants us to act as yes-men, which seems dubious if the agent is (i) an Act-based agent or (ii) sufficiently broadly uncertain over values.

I no longer think it wants us to turn into yes-men, and edited my post accordingly. I still think it will be incentivized to corrupt us, and I don’t see how being an act-based agent would be sufficient, though it’s likely I’m missing something. I agree that if it’s sufficiently broadly uncertain over values then we’re likely to be fine, but in my head that unpacks into “if we knew the AI were metaphilosophically competent enough, we’d be fine”, which doesn’t help things much.