Daniel Kokotajlo comments on Definitions of “objective” should be Probable and Predictive

Daniel Kokotajlo 8 Jan 2023 1:41 UTC
LW: 4 AF: 3
First of all, good point.

Secondly, I disagree. We need not appeal to specifics of octopus culture and psychology; instead we appeal to specifics of human culture and psychology. “OK, so I would let the octopuses have one planet to do what they want with, even if what they want is abhorrent to me, except if it’s really abhorrent like mindcrime, because my culture puts a strong value on something called cosmopolitanism. But (a) various other humans besides me (in fact, possibly most?) would not, and (b) I have basically no reason to think octopus culture would also strongly value cosmopolitanism.”

I totally agree that it would be easy for the powerful party in these cases to make concessions to the other side that would mean a lot to them. Alas, historically this usually doesn’t happen; see e.g. factory farming. I do have some hope that something like universal principles of morality will be sufficiently appealing that we won’t be too screwed. Charity/beneficence/respect-for-autonomy/etc. will kick in and prevent the worst from happening. But I don’t think this is particularly decision-relevant.
- Vladimir_Nesov 8 Jan 2023 2:29 UTC
  LW: 6 AF: 3
  It’s not cosmopolitanism, it’s a preference towards not exterminating an existing civilization, the barest modicum of compassion, in a situation where it’s trivially cheap to keep it alive. The cosmic endowment is enormous compared with the cost of allowing a civilization to at least survive. It’s somewhat analogous to exterminating all wildlife on Earth to gain a penny, where you know you can get away with it.
  
  I would let the octopuses have one planet [...] various other humans besides me (in fact, possibly most?) would not
  
  So I expect this is probably false, and completely false for people in a position of being an AGI with enough capacity to reliably notice the way this is a penny-pinching cannibal choice. Only paperclip maximizers prefer this on reflection, not anything remotely person-like, such as an LLM originating in training on human culture.
  
  historically this usually doesn’t happen—see e.g. factory farming
  
  But it’s enough of a concern to come to attention, and there is some effort going towards mitigating it. Lots of money goes towards wildlife preservation, and in fact some species do survive because of that. Such efforts grow more successful as they become cheaper. If all it took to save a species was for a single person to unilaterally decide to pay a single penny, nothing would ever go extinct.
  - Daniel Kokotajlo 8 Jan 2023 3:13 UTC
    LW: 2 AF: 2
    OK, I agree that what I said was probably a bit too pessimistic. But still, I wanna say “citation needed” for this claim:
    Only paperclip maximizers prefer this on reflection, not anything remotely person-like, such as an LLM originating in training on human culture.
    - Vladimir_Nesov 8 Jan 2023 3:26 UTC
      LW: 2 AF: 1
      The practical implication of this hunch (for unfortunately I don’t see how this could get a meaningfully clearer justification) is that clever alignment architectures are a risk, if they lead to more alien AGIs. Too much tuning and we might get that penny-pinching cannibal.