So, we need to make it so a single misaligned AI could be defeated by other AIs quickly, ideally before it can do any damage. Also, misalignment with human values ideally should not cause an AI to go on a rampage; instead it should stay harmless to avoid being stomped by other AIs. Of course, this should be combined with other means of alignment, so that misalignment can be noticed and fixed.
I’m currently thinking about whether it is possible to implement that using a subagents approach, i.e. splitting control over each decision between several models, each of which has a right of veto.
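A minimal sketch of that veto idea, assuming each subagent can be wrapped as a callable that returns approve/veto for a proposed action (the names VetoGate, Approver, and the toy approvers below are hypothetical, not an existing library):

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical type: an "approver" wraps one model and returns
# True (approve) or False (veto) for a proposed action.
Approver = Callable[[str], bool]

@dataclass
class VetoGate:
    """Approves a proposed action only if every subagent approves it.

    A single veto from any subagent blocks the action, so one
    misaligned model cannot push a decision through on its own.
    """
    approvers: Sequence[Approver]

    def decide(self, proposed_action: str) -> bool:
        return all(approver(proposed_action) for approver in self.approvers)


# Usage sketch with stand-in approvers (real ones would wrap separate models).
cautious = lambda action: "delete" not in action
permissive = lambda action: True

gate = VetoGate(approvers=[cautious, permissive])
print(gate.decide("summarize logs"))   # True: no subagent vetoes
print(gate.decide("delete backups"))   # False: 'cautious' vetoes
```

The trade-off is availability versus safety: requiring unanimity makes a single compromised subagent unable to act, but also lets a single faulty subagent block everything, so a real system would need some way to notice and replace stuck vetoers.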
“make it so a single misaligned AI could be defeated by other AIs quickly”
It might be difficult for AIs with complicated implicit values to build maximally capable AIs aligned with them. This would motivate them to remain at a lower level of capabilities, taking their time to improve capabilities without an alignment catastrophe. At the same time, AIs with simple explicit values might be in a better position to do that, being able to construct increasingly capable AIs that have the same simple explicit values.
Since AIs aligned with humanity probably need complicated values, the initial shape of a secure aligned equilibrium probably looks more like strong coordination and containment than pervasive maximal capability.
Of course, having fewer limitations gives an advantage. Still, respecting limitations aimed at the well-being of the entire community makes it easier to coordinate and cooperate. And that works not just for AIs.