What is wrong with the CIV IV example? Being surprising is not actually a requirement to test your theory against.
I’m interested in cases where there is a correct non-power-maximizing solution. For winning at Civ IV, taking over the world is the intended correct outcome. I’m hoping to find examples like the Strawberry Problem, where there is a correct non-world-conquering outcome (duplicate the strawberry) and taking over the world (in order to e.g. maximize scientific research on strawberry duplication) is an unwanted side-effect.
If I built a DWIM AI, told it to win at CIV IV, and it did this, I would conclude it was misaligned. Which is precisely why SpiffingBrit’s videos are so fun to watch (huge fan).
I think the correct solution to the strawberry problem would also involve a ton of instrumental convergence? You’d need to collect resources to do research/engineering, then set up systems to experiment with strawberries/biotech, then collect generally applicable information on strawberry duplication, and then apply that to duplicate the strawberry.
If I ask an AI to duplicate a strawberry and it takes over the world, I would consider that misaligned. Obviously it will require some instrumental convergence (resources, intelligence, etc.) to duplicate a strawberry. An aligned AI should either duplicate the strawberry while staying within a “budget” for how many resources it consumes, or say “I’m sorry, I can’t do that”.
I would recommend you read my post on corrigibility, which describes how we can mathematically define a tradeoff between success and resource exploitation.
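To make that kind of tradeoff concrete, here is a minimal Python sketch of one way it could be scored. This is my own toy illustration, not the formulation from the corrigibility post: the `Plan` fields, the penalty weight `LAMBDA`, and the hard cap `RESOURCE_BUDGET` are all hypothetical. Plans that exceed the cap are refused outright, and the remainder are ranked by expected success minus a resource penalty.

```python
# Toy sketch of a success-vs-resource-use tradeoff (illustrative only):
#   value(plan) = success(plan) - LAMBDA * resources(plan)
# with a hard resource budget that triggers refusal if no plan fits under it.

from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    success_prob: float    # probability the strawberry gets duplicated
    resources_used: float  # abstract units of matter/energy/compute consumed

LAMBDA = 0.1           # hypothetical penalty per unit of resources
RESOURCE_BUDGET = 5.0  # hypothetical hard cap the overseer will tolerate

def choose_plan(plans: list[Plan]) -> str:
    # Refuse any plan that blows past the hard budget.
    affordable = [p for p in plans if p.resources_used <= RESOURCE_BUDGET]
    if not affordable:
        return "I'm sorry, I can't do that."
    # Among affordable plans, trade success against resource consumption.
    best = max(affordable, key=lambda p: p.success_prob - LAMBDA * p.resources_used)
    return f"Executing: {best.name}"

plans = [
    Plan("duplicate strawberry in a small lab", success_prob=0.9, resources_used=2.0),
    Plan("convert the biosphere into strawberry research", success_prob=0.999, resources_used=1e9),
]
print(choose_plan(plans))  # -> Executing: duplicate strawberry in a small lab
```

The hard cap is what produces the “I’m sorry, I can’t do that” behavior; the linear penalty is just the simplest way to express a tradeoff between task success and resource exploitation.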
Questionable—turning the universe into paperclips really is the optimal solution to the “make as many paperclips as possible” problem. But yeah, obviously in Civ IV taking over the world isn’t even an instrumental goal—it’s just the actual goal.