So rather than escaping and setting up shop on some hacked server somewhere, I expect the most likely scenario to be something like “The AI is engaging and witty and sympathetic and charismatic [...]”
(I’m new to thinking about this and would find responses and pointers really helpful.) In my head this scenario felt unrealistic because I expect transformative-ish AI applications to arrive before highly sophisticated AIs start socially manipulating their designers. Just to illustrate, I was thinking of things like stock investment AIs, product design AIs, military strategy AIs, companionship AIs, and question-answering AIs, all of which seem to have the potential to throw major curveballs. The associated incidents would update safety culture enough to make the classic “AGI arguing itself out of a box” scenario unlikely. So I would worry more about scenarios where companies or governments feel their hands are tied in allowing the use of, or reliance on, potentially transformative AI systems.
I think this is a very important and neglected area of research. My take differs from yours, but I’m very unconfident in it; you might be right. I’m glad you are thinking about this and would love to chat more about it with you.
Stock investment AIs seem like they would make lots of money, which would accelerate timelines by causing loads more money to be spent on AI. But other than that, they don’t seem that relevant? Like, how could they cause a point of no return?
Product design AIs and question-answering AIs seem similar. Maybe they’ll accelerate timelines, but other than that, they won’t cause a point of no return (unless they have gotten so generally intelligent that they can start strategically manipulating us via their products and answers, which I think will happen eventually, but by the time it does there will probably be agenty AIs running around too).
Companionship AIs seem like the sort of thing that would be engaging and witty and charismatic; at the very least, insofar as companionship AIs become a big deal, AIs that can argue themselves out of the box aren’t far behind.
Military strategy AIs seem similar to me if they can talk/understand language (convincing people of things is something you can strategize about too). Maybe we can imagine a kind of military strategy AI that doesn’t really do language; instead it just has really good battle simulators and generalized tactical skill that lets it issue commands to troops that are likely to win battles. But (a) I think this is unlikely, and (b) I think it isn’t super relevant anyway, since tactical skill isn’t very important here. It’s not like we are currently fighting a conventional war where better front-line tactics would let us break through the line or something.
This seems very relevant: https://www.gwern.net/Tool-AI