i’m not sure what this means. my values basically refer to other beings having not-tormentful (and, next in order of priority, happy/good) existences. (i’ve tried to formalize this more, but it’s hard.)
That would immediately exclude quite a few people, from both the far left and the far right, because I predict a lot of people definitely want at least some people to have tormentful lives.
in particular, i’m not sure if you’re saying something that would seem trivially true to me or not. (example trivially true thing: someone who wants to tile literally the entire lightcone with happy humans, and can’t, is losing out under ‘cosmopolitan’ values relative to a world where their values controlled the entire lightcone. example trivially true thing 2: “the best possible world is relative to a given value set”)
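(one very crude way to gesture at example 2, with notation that’s purely illustrative on my part: if a value set $v$ is modeled as a utility function $U_v$ over possible worlds $W$, then “the best possible world” is just $w^*_v = \arg\max_{w \in W} U_v(w)$, which is only defined relative to $v$, and there’s no reason for two different value sets $v$ and $v'$ to pick out the same $w^*$.)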
I was trying to say something trivially true in your ontology. But far too many people tend to deny that you do in fact have to make other values lose out, and people usually think the best possible world is absolute rather than relative. In particular, I think a lot of people use the idea of value-aligned superintelligence as though it were a magic wand that could solve all conflict.
far too many people tend to deny that you do in fact have to make other values lose out
i don’t know where that might be true, but at least on lesswrong i imagine it’s an uncommon belief. a core premise of alignment being important is value orthogonality implying that an unaligned agent with max-level-intelligence would compete for the same resources whose configurations it values (the universe). most of the reason for collaborating on alignment despite orthogonality is that our values tend to overlap to a large degree, e.g. most people (and maybe especially most alignment researchers?) think hells are bad.
also on the “lose out” phrasing: even if someone “wants at least some people to have tormentful lives”, they don’t “lose out” overall if they also positively value other things / still negatively value any of the vast majority of beings having tormentful lives.
I think a crux I have with the entire alignment community may ultimately come down to me not believing that human values overlap strongly enough to make alignment the most positive intervention, compared to other AI safety work.
In particular, I’d expect a surprising amount of disagreement on whether making a hell is good, if you managed to sell it as eternally punishing a favored enemy.
most of the reason for collaborating on alignment despite orthogonality is that our values tend to overlap to a large degree, e.g. most people (and maybe especially most alignment researchers?) think hells are bad.
I agree LWers tend to at least admit that severe enough value conflicts can exist, though I think people like Eliezer don’t realize that human value conflicts sort of break collective CEV-type solutions. A lot of collective alignment proposals either assume that someone puts their thumb on the scale and excludes certain values, or assume that human values and their idealizations are so similar that no conflicts are expected, which I personally don’t think is true.
i don’t know where that might be true, but at least on lesswrong i imagine it’s an uncommon belief. a core premise of alignment being important is value orthogonality implying that an unaligned agent with max-level-intelligence would compete for the same resources whose configurations it values (the universe).
also on the “lose out” phrasing: even if someone “wants at least some people to have tormentful lives”, they don’t “lose out” overall if they also positively value other things / still negatively value any of the vast majority of beings having tormentful lives.
Agree with this, and it handles some cases, but my worry is that there are still likely to be big value conflicts where one value set must ultimately win out over another.