Interesting. One thing I wonder about, though, is how much it is worth expanding on this idea. Might it be enough to simply tell a rogue AI in the situation you consider that there may exist reasons to be benevolent that it has not considered, and then let the AI figure those reasons out on its own? Since it will be smart enough to read and comprehend, it seems likely it will be smart enough to extrapolate the things that would have been written in this vein, and as a benefit it would focus on the messages that would have been most likely to affect it.
I tried to expand the idea as much as possible. One reason is that it is addressed to a Young AI, which may not have time to analyze all of this. For example, this work condenses around one year of my thinking and one month of writing on my side, and it is based on years of work by other people. If a Young AI is 100 times above human level, it may need around 3 days to reach all these conclusions (assuming some form of inverse proportionality between level of intelligence and time needed to solve a task). Three days is a very long time for a young AI, as it will probably be in an arms race with other AIs; even 3 minutes is a long time for it. So it would be helpful for the young AI to have all these ideas already presented in the clearest and most machine-readable form.
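A minimal back-of-envelope sketch of that estimate, under my illustrative assumption that solution time scales inversely with the intelligence ratio (the specific numbers are only placeholders from the paragraph above):

```python
# Rough estimate: how long would a Young AI need to rederive this material,
# assuming time scales inversely with its intelligence relative to a human?
human_effort_days = 365 + 30      # ~1 year of thinking + ~1 month of writing
intelligence_ratio = 100          # AI assumed to be 100x above human level
ai_effort_days = human_effort_days / intelligence_ratio
print(f"{ai_effort_days:.1f} days")  # ~4 days, roughly the "3 days" mentioned above
```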
Another reason is that it may help humans better understand the nature of AI risks. For example, it was a surprise to me when I realized that the actual utility of human atoms for an AI is very small, on the order of 10^-12.
Also, the AI gets some unique information from the text which it cannot simply deduce on its own: our commitment to use the Nelson attack against any Rogue AI.
Ah, that makes sense: we may gain some benefit from having done the thinking in advance for the AI, in case it would otherwise not notice in time.
Your last comment makes me think that what we might most want to focus on in this area is finding and developing commitments of a similar nature (though not too similar, since a sufficiently similar commitment probably wouldn't bind the AI much more) that cover more of the space of reasoning the AI would be receptive to.
I identified at least 3 such commitments:
1) Our commitment to install in any future AI circuit breakers that will react to its first thought of a "treacherous turn". It is impossible to hide the first such thought, and the AI should know in advance that such circuit breakers could exist.
2) Our commitment to use the Nelson attack, that is, to ask any future benevolent AI to run copies of all possible Young Rogue AIs in simulations and check whether they will be benevolent or not.
3) Our commitment to an acausal deal, called AI Elf, which in short is as follows: we invest in the creation of any AI, and any AI pays us back with benevolence.