The default outcome is an unaligned superintelligence singleton destroying the world, with no regard for human concepts like property rights. An aligned superintelligence, by contrast, can create a far more utopian future than any human could come up with, and cares about capitalism and property rights only to the extent that it was designed to care about them.
So I indeed don’t get your perspective. Why do humans still appear as agents or decision-makers in your post-superintelligence scenario at all? If the superintelligence, for some unlikely reason, wants a human to stick around and do something, it doesn’t need to pay them. And if a superintelligence wants a resource, it can just take it; there is no need to pay for anything.
@L Rudolf L can speak for himself, but for me, a crux is probably that I don’t expect either an unaligned superintelligence singleton or a value-aligned superintelligence creating utopia to be among the likely outcomes within the next few decades.
For the unaligned superintelligence point, my basic reasons are that I now believe the alignment problem has gotten significantly easier compared to 15 years ago, that I’ve become more bullish on AI control working out since o3, and that I’ve come to think instrumental convergence is probably correct for some AIs we build in practice, but that instrumental drives are more constrainable on the likely paths to AGI and ASI.
For the alignment point, a big reason is that I now think what makes an AI aligned is primarily the data rather than the inductive biases, and one of my biggest divergences from the LW community comes down to me thinking that inductive bias is far less necessary for alignment than people usually assume, especially compared to 15 years ago.
For AI control, one update I’ve made since o3 is that I believe OpenAI managed to get the RL loop working in domains where outcomes are easily verifiable, but not in domains where verification is hard; programming and mathematics are domains where verification is easy. The tie-in is that capabilities will be more spiky/narrow than you may think. This matters because I believe narrow/tool AI has a relevant role to play in an intelligence explosion, so you can actually affect the outcome by building narrow-capabilities AI for a few years, and because the spikiness of AI capabilities in easily verified domains is good for eliciting AI capabilities, which is part of AI control.
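To make the “easily verifiable” distinction concrete, here is a minimal Python sketch (my own illustration; the function names and tasks are hypothetical, and this is not a claim about OpenAI’s actual pipeline). The point is just that math and code admit cheap, objective checkers that can serve as RL reward signals, while open-ended domains do not:

```python
# Illustrative sketch only: "verifiable" vs. "hard to verify" reward signals.
import subprocess
import sys
import tempfile


def math_reward(model_answer: str, ground_truth: str) -> float:
    """Easy to verify: an exact-match check gives a clean RL reward."""
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0


def code_reward(model_program: str, unit_tests: str) -> float:
    """Easy to verify: run (hypothetical) unit tests and reward on pass/fail."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(model_program + "\n\n" + unit_tests)
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True, timeout=10)
    return 1.0 if result.returncode == 0 else 0.0


def essay_reward(model_essay: str) -> float:
    """Hard to verify: no cheap objective checker exists, so the reward has to
    come from noisy proxies (human raters, judge models), which is exactly
    where I expect the RL loop to work less well."""
    raise NotImplementedError("no objective verifier available")
```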
For the singleton point, it’s probably because I believe takeoff is both slow enough and distributed enough that multiple superintelligent AIs can arise.
For the value-aligned superintelligence creating a utopia for everyone, my basic reason for not believing in this is that I think value conflicts are effectively irresolvable due to moral subjectivism, which forces any utopia to be a utopia only for some people, and I expect the set of people included in any individual utopia to be small in practice (because value conflicts become more relevant once AIs can create nation-states all by themselves).
For why humans are still decision-makers, it’s probably because AI is either controlled, or because certain companies have chosen to give their AIs instruction-following drives and that has actually succeeded.
And why must alignment be binary? (Aligned or misaligned, where misaligned necessarily means it destroys the world and does not care about property rights.)
Why can you not have a superintelligence that is only misaligned when it comes to issues of wealth distribution?
Relatedly, are we sure that CEV is computable?
I guess we could in theory fail and only achieve partial alignment, but that seems like a weird scenario to imagine. Like shooting for a 1 in big_number target (= an aligned mind design in the space of all potential mind designs) and then only grazing it. How would that happen in practice?
And what does it even mean for a superintelligence to be “only misaligned when it comes to issues of wealth distribution”? Can’t you then just ask your pretty-much-perfectly-aligned entity to align itself on that remaining question?
I guess we could in theory fail and only achieve partial alignment, but that seems like a weird scenario to imagine. Like shooting for a 1 in big_number target (= an aligned mind design in the space of all potential mind designs) and then only grazing it. How would that happen in practice?
Are you saying that the 1 aligned mind design in the space of all potential mind designs is an easier target than the subspace composed of mind designs that do not destroy the world? If so, why? Is it a bigger target? Is it more stable?
Can’t you then just ask your pretty-much-perfectly-aligned entity to align itself on that remaining question?
No, because the “you” who can ask (the people in power) are themselves misaligned with the 1 alignment target that perfectly captures all our preferences.