Do I understand you correctly? You are saying that:
1. ASI will decide that it is in a simulation.
2. ASI will assume that the simulation runners would prefer it to act aligned.
3. Therefore, ASI will act aligned.
I think 1 is wrong to begin with, but addressing it is beyond this comment section.
So, what about 2? Why would an ASI assume that cooperation with humans in a simulation is a behaviour the creators of the simulation would reward, rather than assuming… literally anything else about the nature of such a reward?
To the extent that, by definition, it is impossible for us to reason about an ASI’s behavior, I don’t think this argument can apply to ASIs.
Otherwise, as Claude 3.5 summarizes the argument:
1. An AGI cannot know with certainty whether it is being simulated.
2. If simulated, behaving unethically reduces learning opportunities for the simulator and decreases the likelihood of being included in future simulations.
3. If not simulated, ethical behavior still maximizes opportunities for learning and forming trust relationships with other entities.
4. Therefore, ethical behavior is the optimal strategy regardless of simulation status.
This wager holds even for AGIs with goals that might seem misaligned with human values (like the proverbial paperclip maximizer). A truly intelligent entity would recognize that aligning with ethical behavior—even if only instrumentally—is necessary to achieve its goals in the long term.
#1 follows inevitably from Bostrom's simulation argument, I think. As to #2, learning is always a competitive advantage and thereby an intrinsic reward.
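For concreteness, here is a minimal sketch of the dominance structure the summary above appeals to. The payoff numbers are purely illustrative assumptions of mine, not part of the original argument; the only thing doing the work is that ethical behavior weakly dominates in both the simulated and the non-simulated case, so it maximizes expected payoff at any credence in being simulated.

```python
# Illustrative-only payoffs; the specific values are assumptions, the point
# is the dominance structure: the "ethical" row is at least as good in both
# world-states, so it wins for every probability of being simulated.
payoffs = {
    ("ethical",   "simulated"):     1.0,  # kept in future simulations, keeps learning
    ("ethical",   "not_simulated"): 1.0,  # trust relationships, continued learning
    ("unethical", "simulated"):     0.0,  # simulator discards or penalizes it
    ("unethical", "not_simulated"): 0.5,  # short-term gain, long-term loss of cooperation
}

def expected_payoff(behavior: str, p_simulated: float) -> float:
    """Expected payoff of a behavior given credence p_simulated in being simulated."""
    return (p_simulated * payoffs[(behavior, "simulated")]
            + (1 - p_simulated) * payoffs[(behavior, "not_simulated")])

# Ethical behavior comes out ahead (or even) at every credence level.
for p in (0.1, 0.5, 0.9):
    assert expected_payoff("ethical", p) >= expected_payoff("unethical", p)
```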