Superintelligence is related to three categories of suffering risk: suffering subroutines (Tomasik 2017), mind crime (Bostrom 2014) and flawed realization (Bostrom 2013).
5.1 Suffering subroutines
Humans have evolved to be capable of suffering, and while the question of which other animals are conscious or capable of suffering is controversial, pain analogues are present in a wide variety of animals. The U.S. National Research Council’s Committee on Recognition and Alleviation of Pain in Laboratory Animals (2004) argues that, based on the state of existing evidence, at least all vertebrates should be considered capable of experiencing pain.
Pain seems to have evolved because it serves a functional purpose in guiding behavior: the fact that evolution settled on it suggests that pain may be the simplest solution for achieving that purpose. A superintelligence which was building subagents, such as worker robots or disembodied cognitive agents, might then also construct them in such a way that they were capable of feeling pain—and thus possibly suffering (Metzinger 2015)—if that was the most efficient way of making them behave in a way that achieved the superintelligence’s goals.
Humans have also evolved to experience empathy towards each other, but the evolutionary reasons which cause humans to have empathy (Singer 1981) may not be relevant for a superintelligent singleton which had no game-theoretical reason to empathize with others. In such a case, a superintelligence which had no disincentive to create suffering, but did have an incentive to create whatever furthered its goals, could create vast populations of agents which sometimes suffered while carrying out the superintelligence’s goals. Because of the ruling superintelligence’s indifference towards suffering, the amount of suffering experienced by this population could be vastly higher than it would be in e.g. an advanced human civilization, where humans had an interest in helping out their fellow humans.
Depending on the functional purpose of positive mental states such as happiness, the subagents might or might not be built to experience them. For example, Fredrickson (1998) suggests that positive and negative emotions have differing functions. Negative emotions bias an individual’s thoughts and actions towards some relatively specific response that has been evolutionarily adaptive: fear causes an urge to escape, anger causes an urge to attack, disgust an urge to be rid of the disgusting thing, and so on. In contrast, positive emotions bias thought-action tendencies in a much less specific direction. For example, joy creates an urge to play and be playful, but “play” includes a very wide range of behaviors, including physical, social, intellectual, and artistic play. All of these behaviors have the effect of developing the individual’s skills in whatever domain the play takes place. The overall effect of experiencing positive emotions is to build an individual’s resources—be those resources physical, intellectual, or social.
To the extent that this hypothesis were true, a superintelligence might design its subagents in such a way that they had pre-determined response patterns for undesirable situations, and so exhibited negative emotions. However, if it was constructing a kind of command economy in which it desired to remain in control, it might not put a high value on any subagent accumulating individual resources. Intellectual resources would be valued to the extent that they contributed to the subagent doing its job, but physical and social resources could be irrelevant, if the subagents were provided with whatever resources were necessary for doing their tasks. In such a case, the end result could be a world whose inhabitants experienced very little, if anything, in the way of positive emotions, but did experience negative emotions. [...]
5.2 Mind crime
A superintelligence might run simulations of sentient beings for a variety of purposes. Bostrom (2014, p. 152) discusses the specific possibility of an AI creating simulations of human beings which were detailed enough to be conscious. These simulations could then be placed in a variety of situations in order to study things such as human psychology and sociology, and be destroyed afterwards.
The AI could also run simulations that modeled the evolutionary history of life on Earth in order to obtain various kinds of scientific information, or to help estimate the likely location of the “Great Filter” (Hanson 1998) and whether it should expect to encounter other intelligent civilizations. This could repeat the wild-animal suffering (Tomasik 2015, Dorado 2015) experienced in Earth’s evolutionary history. The AI could also create and mistreat, or threaten to mistreat, various minds as a way to blackmail other agents. [...]
5.3 Flawed realization
A superintelligence with human-aligned values might aim to convert the resources in its reach into clusters of utopia, and seek to colonize the universe in order to maximize the value of the world (Bostrom 2003a), filling the universe with new minds and valuable experiences and resources. At the same time, if the superintelligence had the wrong goals, this could result in a universe filled by vast amounts of disvalue.
While some mistakes in value loading may result in a superintelligence whose goal is completely unlike what people value, certain mistakes could result in flawed realization (Bostrom 2013). In this outcome, the superintelligence’s goal gets human values mostly right, in the sense of sharing many similarities with what we value, but also contains a flaw that drastically changes the intended outcome.
For example, value-extrapolation (Yudkowsky 2004) and value-learning (Soares 2016, Sotala 2016) approaches attempt to learn human values in order to create a world that is in accordance with those values.
There have been occasions in history when circumstances that cause suffering have been defended by appealing to values which seem pointless to modern sensibilities, but which were nonetheless a part of the prevailing values at the time. In Victorian London, the use of anesthesia in childbirth was opposed on the grounds that being under the partial influence of anesthetics may cause “improper” and “lascivious” sexual dreams (Farr 1980), with this being considered more important to avoid than the pain of childbirth.
A flawed value-loading process might give disproportionate weight to historical, existing, or incorrectly extrapolated future values whose realization then becomes more important than the avoidance of suffering. Besides merely considering the avoidance of suffering less important than the enabling of other values, a flawed process might also tap into various human tendencies for endorsing or celebrating cruelty (see the discussion in section 4), or outright glorifying suffering. Small changes to a recipe for utopia may lead to a future with much more suffering than one shaped by a superintelligence whose goals were completely different from ours.
thanks for sharing. here’s my thoughts on the possibilities in the quote.
Suffering subroutines—maybe 10-20% likely. i don’t think suffering reduces to “pre-determined response patterns for undesirable situations,” because i can think of simple algorithmic examples of that which don’t seem like suffering.
suffering feels like it’s about the sense of aversion/badness (often in response to a situation), and not about the policy “in <situation>, steer towards <new situation>”. (maybe humans were instilled with a policy of steering away from ‘suffering’ states generally, and that’s why evolution made us enter those states in some types of situation?). (though i’m confused about what suffering really is)
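as a concrete sketch of the kind of simple algorithmic example i mean (everything here is hypothetical and invented for illustration): a pure lookup-table policy that steers away from ‘undesirable’ states, with nothing in it that looks like felt aversion.

```python
# Hypothetical sketch: a "pre-determined response pattern for undesirable
# situations" as a pure state -> action lookup table. It implements the
# policy "in <situation>, steer towards <new situation>" by table lookup,
# yet nothing here seems like an aversive felt sense, let alone suffering.

RESPONSE_TABLE = {
    "overheating": "move_to_shade",
    "low_battery": "seek_charger",
    "obstacle_ahead": "turn_left",
}

def reactive_policy(situation: str) -> str:
    """Return the pre-determined avoidance action for a situation."""
    return RESPONSE_TABLE.get(situation, "continue")

print(reactive_policy("overheating"))  # move_to_shade
print(reactive_policy("clear_path"))   # continue
```

if suffering were just this kind of mapping, a three-entry dict would suffer; so the aversive-valence part must be doing the real work.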
i would also give the example of positive-feeling emotions sometimes being narrowly directed. for example, someone can feel ‘excitement/joy’ about a gift or event and want to <go to/participate in> it. sexual and romantic subroutines can also be both narrowly-directed and positive-feeling. though these examples lack the element of a situation being steered away from, versus steering (from e.g. any neutral situation) towards other ones.
Suffering simulations—seems likely (75%?) for the estimation of universal attributes, such as the distribution of values. my main uncertainty is about whether there’s some other way for the ASIs to compute that information which is simple enough to be suffering-free. this also seems lower magnitude than other classes, because (unless it’s being calculated indefinitely for ever-greater precision) this computation terminates at some point, rather than lasting until heat death (or forever if it turns out that’s avoidable).
Blackmail—i don’t feel knowledgeable enough about decision theory to put a probability on this one, but in the case where it works (or is precommitted to under uncertainty in hopes that it works), it’s unfortunately a case where building aligned ASI would incentivize unaligned entities to do it.
Flawed realization—again i’m too uncertain about what real-world paths lead to this, but intuitively, it’s worryingly possible if the future contains LLM-based LTPAs (long term planning agents) intelligent enough to solve alignment and implement their own (possibly simulated) ‘values’.
Suffering subroutines—maybe 10-20% likely. i don’t think suffering reduces to “pre-determined response patterns for undesirable situations,” because i can think of simple algorithmic examples of that which don’t seem like suffering.
Yeah, I agree with this to be clear. Our intended claim wasn’t that just “pre-determined response patterns for undesirable situations” would be enough for suffering. Actually, there were meant to be two separate claims, which I guess we should have distinguished more clearly:
1) If evolution stumbled on pain and suffering, those might be relatively easy and natural ways to get a mind to do something. So an AGI that built other AGIs might also build them to experience pain and suffering (that it was entirely indifferent to), if that happened to be an effective motivational system.
2) If this did happen, then there’s also some speculation suggesting that an AI that wanted to stay in charge might not want to give its worker AGIs much in the way of things that looked like positive emotions, but did have a reason to give them things that looked like negative emotions. Which would then tilt the balance of pleasure vs. pain in the post-AGI world much more heavily in favor of (emotional) pain.
Now the second claim is much more speculative and I don’t even know if I’d consider it a particularly likely scenario (probably not); we just put it in since much of the paper was just generally listing various possibilities of what might happen. But the first claim—that since all the biological minds we know of seem to run on something like pain and pleasure, we should put a substantial probability on AGI architectures also ending up with something like that—seems much stronger to me.
You may find Superintelligence as a Cause or Cure for Risks of Astronomical Suffering of interest; among other things, it discusses s-risks that might come about from having unaligned AGI.