I feel like the elephant in the AI alignment room has to do with an even more horrible truth. What if the game is adversarial by nature? Imagine a chess game: would it make sense to build an AI that is aligned with both the black player and the white player? It feels almost like a koan.
Status (both dominance and prestige) and sexual dynamics (not only intra-sexual competition) have adversarial elements ingrained in them, and the desire for both is a massive part of the human utility function. So you can perhaps align an AI to a person or a group, but to keep coherence there must be losers: we care too much about relative position, and being at the top requires that other people be at the bottom.
A human utility function is not very far from the utility function of a chimp. Should we really use this as the basis for the utility function of the superintelligence that builds von Neumann probes? No, a true “view-from-nowhere good” AI shouldn’t be aligned with humans at all.
If our minds expect to function in a partly adversarial world, then an FAI may decide to place us in a partly adversarial world, at least to avoid pushing our minds into a weird part of probability space where behaviors and values applicable to normal scenarios stop being applicable. (This is similar to ecosystem and habitat management applied to animals.)
Playing chess involves a preference for playing chess (including a preference that the rules are followed), and, subject to playing chess, a preference for winning. Someone who didn’t have a genuine preference for playing chess, such as a pigeon, would not properly be considered to be “playing chess”; their moves would not even be evaluable as attempts to win or lose, as they would not be following the rules of the game in the first place. This is similar to a point made in Finite and Infinite Games:
There is no finite game unless the players freely choose to play it. No one can play who is forced to play.
A preference for playing a game according to rules would be a law-level preference (referenced in the post).
So a preference to engage in sexual competition would include a law-level preference to exist in a world where sexual competition functions according to certain guidelines, as well as a preference to succeed in the specific competitions one engages in.
This addresses the preference to play, but not the preference to win and outcompete other humans. The only way to satisfy the preference to win is to create a Nozick-experience-machine-style existence where some of the players are actually NPCs indistinguishable from real players [1] (the white chess player wins 80% of the time, but doesn’t realize that the black player is actually a bot). In any other scenario, it’s impossible for one human to win without another human losing, which means the preference to win will be thwarted in aggregate.
But for an FAI to spend vast amounts of free energy creating simulated experience machines just seems wrong in a very fundamental sense; it’s wireheading with extra steps.
[1] - This gives me the faint hope that we are already in this kind of scenario, meaning the 50 billion chickens we kill each year and the people whose lives are best described as a living hell have no qualia. But unfortunately, I would have to bet against it.
Yes, there either have to be NPCs or a lot of real people have to lose. But that’s simply a mathematical constraint of actually playing against people like yourself. There’s enjoyment to be had in the possibility of losing (and in actually losing sometimes and seeing what went wrong).