Blackmail is only possible once a malicious AI is discovered and identified, and only if humanity is somewhat united against it. Given an AI's likely ability to first stay under the radar, then hack human politics and inject enough confusion into the conversation, this is not very likely.
The goal would be to start any experiment which might plausibly lead to AGI with a metaphorical gun to the computer's head, such that being less than (observably) perfectly honest with us, "pulling any funny business," etc. would lead to its destruction. If you can make it so that the path of least resistance for it to make it safely out of the box is to cooperate rather than to try to defect and risk getting caught, you should be able to productively manipulate AGIs in many (albeit not all) possible worlds. Obviously this should be done on top of other alignment methods, but I doubt it would hurt things much, and it would likely help as a significant buffer.
On reflection, I see your point, and will cross that section out for now, with the caveat that there may be variants of this idea which have significant safety value.