The training and behavior of these two systems would be identical, in spite of the shift in the value of the rewards. Does simply shifting the numerical value of the reward to “positive” correspond to a deeper shift towards positive valence? It seems strange that merely switching the sign of a scalar value could affect valence in this way. Imagine shifting the reward signal for agents with more complex avoidance behavior and verbal reports. Lenhart Schubert (quoted in Tomasik (2014), from whom I take this point) remarks: “If the shift…causes no behavioural change, then the robot (analogously, a person) would still behave as if suffering, yelling for help, etc., when injured or otherwise in trouble, so it seems that the pain would not have been banished after all!”
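To make the invariance concrete, here's a minimal sketch (a made-up three-state MDP, nothing from the literature) of why adding a constant to every reward can't change this kind of agent's behaviour: all the action values shift by c / (1 − γ), so the greedy policy, and hence what the agent does, stays exactly the same.

```python
import numpy as np

gamma = 0.9
n_states, n_actions = 3, 2

# Deterministic toy MDP: P[s, a] is the next state, R[s, a] the reward.
# The numbers are arbitrary; all rewards are negative, like the
# "suffering-framed" agent above.
P = np.array([[1, 2],
              [0, 2],
              [1, 0]])
R = np.array([[-1.0, -5.0],
              [-2.0, -0.5],
              [-4.0, -1.0]])

def optimal_q(R, n_iters=2000):
    """Q-value iteration for the deterministic toy MDP above."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        Q = R + gamma * Q.max(axis=1)[P]   # Q(s,a) = R(s,a) + gamma * V(next state)
    return Q

c = 10.0                       # shift every reward up into the "positive" range
Q_neg = optimal_q(R)
Q_pos = optimal_q(R + c)

print(Q_pos - Q_neg)           # ~ c / (1 - gamma) = 100 everywhere
print(Q_neg.argmax(axis=1))    # greedy policy under the "suffering" rewards
print(Q_pos.argmax(axis=1))    # identical greedy policy under the "pleasant" rewards
```

The relative ordering of actions is all that drives behaviour here, and a uniform shift leaves that ordering untouched.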
So valence seems to depend on something more complex than the mere numerical value of the reward signal. For example, perhaps it depends on prediction error in certain ways. Or perhaps the balance of pain and pleasure depends on efficient coding schemes which minimize the cost of reward signals / pain and pleasure themselves: this is the thought behind Yew‑Kwang Ng’s work on wild animal welfare, and Shlegeris’s brief remarks inspired by this work.
I think attention probably plays an important role in valence. States with high-intensity valence (of either sign, or at least negative; I’m less sure how pleasure works) tend to take immediate priority in terms of attention and, at least partly as a consequence, behaviour. The Welfare Footprint Project (funded by Open Phil for animal welfare) defines pain intensity in terms of attention and priority: more intense pains are harder to ignore, and pain intensities cluster into annoying, hurtful, disabling and excruciating. If you were to shift rewards and change nothing else, how much attention and priority various things get would have to change, and so behaviour would, too. One example I like to give: if we shifted valence into the positive range without adjusting attention or anything else, animals would continue to eat while being attacked, because the high positive valence from eating would get greater priority than the easily ignorable low positive or neutral valence from being attacked.
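A deliberately crude sketch of that example, with invented valence numbers, just to show how a uniform shift changes what wins the attention contest when priority tracks intensity:

```python
# Priority tracks valence intensity (here, just the absolute value), and
# behaviour follows whatever currently has attention. Numbers are invented.

def attended(valences):
    """Return the stimulus with the most intense (hardest-to-ignore) valence."""
    return max(valences, key=lambda k: abs(valences[k]))

before = {"eating": +2.0, "being_attacked": -10.0}
after = {k: v + 11.0 for k, v in before.items()}   # shift everything positive

print(attended(before))   # 'being_attacked': the animal stops eating and flees
print(attended(after))    # 'eating': the animal keeps eating while being attacked
```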
The costs of reward signals, or of pain and pleasure themselves, help explain why it’s evolutionarily adaptive to have positive and negative values with valence roughly balanced around neutral states of low neural activity, but I don’t see why it would follow that being balanced around neutral is a necessary feature of valence (and where the balance sits would surely be context-specific).
I’m less sure either way about prediction error. I guess it would have to be somewhat low-level or non-reflective (my understanding is that it usually is, although I’m barely familiar with these approaches), since surely we can accurately predict something and its valence in our reportable awareness, and still find it unpleasant or pleasant.
Also, there’s a natural neutral/zero point for rewards separating pleasure and suffering in animals: subnetwork inactivity. This isn’t just unconsciousness: some experiences feel like they have no or close to no valence, and this is probably reflected in lower activity in valence-generating subnetworks. If this doesn’t hold for some entity, we should be skeptical that they experience pleasure or suffering at all. (They could experience only one of the two, with the neutral point strictly to one side, on the low-intensity end.)
Plus, my guess is that pleasure and suffering are generated in structures of animal brains that don’t totally overlap, so shifting rewards more negative, say, would mean more activity in the suffering structures and less in the pleasure structures.
Still, one thing I wonder is whether preferences without natural neutral points can still matter. Someone can have preferences between two precise states of affairs (or features of them), but not believe either is good or bad in absolute terms. They could even have something like moods that are ranked, but no neutral mood. Such values could still potentially be aggregated for comparisons, but you’d probably need to use some kind of preference-affecting view if you don’t want to make arbitrary assumptions about where the neutral points should be.
Thanks for this great comment! Will reply to the substantive stuff later, but first: I hadn’t heard of the Welfare Footprint Project! Super interesting and relevant, thanks for bringing it to my attention.