I will try to write down my thoughts on these problems below:
1) The Coordination Problem
For any organization developing AI, failing to align it is at least as dangerous as losing the AI race altogether, if not more so. If an organization has already secured the resources needed to win the capabilities race and has a working alignment solution (two of the most challenging hurdles), I’d be confident that it can successfully implement that solution (which, by comparison, seems like the easiest part). The risks of failing to implement an alignment solution are essentially the same as the risks of not having one in the first place:
If you don’t have a working alignment solution, you die.
If you fail to implement a working alignment solution, you die.
Companies spending considerable resources on creating a working solution to the alignment problem will have all the same reasons for actually implementing it.
2) The Power Distribution Problem
I wouldn’t necessarily frame this as a problem. Consider a world where multiple entities control AI: that scenario appears considerably more problematic. As it stands, the US is seemingly at the forefront of the AI race. Do we really want China and Russia to develop their own AIs? Even more troubling is the idea of multiple individuals owning superhuman AI. Just one person bent on global vengeance could lead to catastrophic outcomes. I’d be much more inclined to trust the AI race’s winner to act in humanity’s best interest than to rely on the goodness of every individual AI owner (including the winner of the AI race).
If the winner of the AI race won’t act in humanity’s best interests, then we won’t have the means to make them share AI with others.
If the winner of the AI race will act in humanity’s best interests, then we won’t want them to share AI with other agents who might not act in humanity’s best interests.
3) The Economic Transition Problem
If AI is aligned with human values, there is no need for humans to retain economic control. AI would simply leverage our economic resources for the benefit of humanity.
Re: Your comments on the power distribution problem
Agreed that multiple powerful adversaries controlling AI seems like a bad plan. And I agree that if the decisive winner of the AI race won’t act in humanity’s best interests, we are screwed.
But I think this is a problem to address before that happens: we can shape the world today so that it’s more likely the winner of the AI race will act in humanity’s best interests.
I agree with everything.
We can and should be trying to improve our odds by making sure that the leading AI labs don’t have any revenge-seeking psychopaths in their leadership.
Re: Your points about alignment solving this.
I agree that if you define alignment as ‘get your AI system to act in the best interests of humans’, then the coordination problem becomes harder, and solving it is likely sufficient to solve problems 2 and 3. But I think it then bundles more problems together in a way that might be less conducive to solving them.
For loss of control, I was primarily thinking about making systems intent-aligned, by which I mean getting the AI system to try to do what its creators intend. I think this makes it easier to divide these challenges up into subproblems (and seems to be what many people are gunning for).
If you do define alignment as human-values alignment, I think “If you fail to implement a working alignment solution, you [the creating organization] die” doesn’t hold: I can imagine successfully aligning a system to ‘act in the best interests of its creators’ working fine for its creators but not being great for the world.
Ah, I see. You are absolutely right. I unintentionally used two different meanings of the word “alignment” in problems 1 and 3.
If we define alignment as intent alignment (from my comment on problem 1), then humans don’t necessarily lose control over the economy in The Economic Transition Problem. The group of people who win the AI race will basically control the entire economy by controlling the AI that controls the world (and is intent-aligned to them).
If we are lucky, they can create a democratic online council where each human gets a say in how the economy is run. The group will then tell the AI what to do based on how humanity voted.
Alternatively, with the help of their intent-aligned AI, the group can try to build a value-aligned AI. When they are confident that this AI is indeed value-aligned, they can release it and let it be the steward of humanity.
In this scenario, The Economic Transition Problem just becomes The Power Distribution Problem of ensuring that whoever wins the AI race will act in humanity’s best interests (or close enough).