(The following is an adapted version of my comment on the post Interpretability/Tool-ness/Alignment/Corrigibility are not Composable that wasn’t published.)

Conditions B1-B6 seem to correspond exactly to the description of a catallaxy. A catallaxy (such as the current global economic system) has its own dynamics, which none of the individual agents participating in it can see or predict, as scholars of complex systems (e.g., Sidney Dekker, David Woods, Richard Cook) have postulated. However, I don’t know what the modern, physics-based, information-theoretic version of the science of complex systems tells us about this.
Clearly, in an open system (that is, a system open to unpredictable influences from the external world, rather than a closed, spherical-cow-in-a-vacuum game-theoretic setup), computational irreducibility prevents any individual agent within the system from reliably modelling and predicting the whole system’s behaviour. And that is a core idea of complexity theory: a system can be composed of components that interact in ways the master designer of the system can predict from their vantage point, but the agents within the system cannot. If they were intelligent enough to do that, the complexity of the system would rise yet again (since more intelligent agents behave in more complex ways) to a level incomprehensible to them.
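To make the computational-irreducibility point concrete, here is a minimal, purely illustrative sketch (my own, not from the original post): Wolfram’s Rule 30 cellular automaton, where, as far as anyone knows, the only way to learn the state after t steps is to actually run all t steps — there is no shortcut an agent inside the system could exploit.

```python
# Illustrative sketch of computational irreducibility: Rule 30 cellular automaton.
# No known closed form predicts the row at step t without simulating every step.

def rule30_step(cells: list[int]) -> list[int]:
    """Apply one Rule 30 update to a row of 0/1 cells (with wrap-around)."""
    n = len(cells)
    return [
        cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n])  # left XOR (center OR right)
        for i in range(n)
    ]

def evolve(cells: list[int], steps: int) -> list[int]:
    """Evolve the automaton; the cost is proportional to `steps`."""
    for _ in range(steps):
        cells = rule30_step(cells)
    return cells

if __name__ == "__main__":
    width = 64
    initial = [0] * width
    initial[width // 2] = 1  # single seed cell
    final = evolve(initial, 200)
    print("".join("#" if c else "." for c in final))
```

Of course, a toy automaton is not an economy, but it illustrates why an agent embedded in an open, irreducible system cannot simply “think harder” and skip ahead to the system’s future state.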
So, I strongly suspect that aligning a network (community, society, competition) of independent superintelligent agents is impossible: such a community will steer in some direction that neither humans nor these superintelligent agents can predict or control.
However, it’s worth noting here that aligning a singleton superintelligence with “humans”, “the civilization”, or “the ecosphere” is also impossible, in a sense, because all of these are open systems and that superintelligence would be just a part of the open system. I don’t agree with Jan Kulveit that companies, communities, and “molochs” are agents. I’d call them “mindless” emergent non-equilibrium dynamical patterns.
So, the strategy proposed in this post (robustness and safety through diversity) looks more promising to me than singleton-superintelligence strategies, even if we had bulletproof reasons to think that the singleton is corrigible and trustworthy.[1]
This also reminds me of the research program of systems engineering researcher John Doyle, whose main idea is that robust systems have heterogeneous, diverse controls and feedback mechanisms. The central concept there is the “diversity-enabled sweet spot” (DESS). For a recent article on this topic, see “Internal Feedback in Biological Control: Diversity, Delays, and Standard Theory”.
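As a purely illustrative toy (my own sketch, not taken from Doyle’s papers), here is a tiny simulation of the intuition behind a diversity-enabled sweet spot: a fast-but-coarse control layer and a slow-but-precise one reject a disturbance better together than either layer can alone.

```python
# Toy illustration (my own sketch, not from Doyle's work) of layered, diverse control:
# a fast/coarse layer plus a slow/precise layer beats either layer on its own.

def simulate(disturbance: float, steps: int, use_fast: bool, use_slow: bool) -> list[float]:
    """Track the residual error after a step disturbance hits at t = 0."""
    error = disturbance
    history = []
    for t in range(steps):
        if use_fast and t >= 1:
            # Fast layer: reacts almost immediately, but only in coarse unit increments.
            error -= round(error)
        if use_slow and t >= 5:
            # Slow layer: precise, but only kicks in after a long delay.
            error -= 0.5 * error
        history.append(abs(error))
    return history

if __name__ == "__main__":
    for label, fast, slow in [("fast only", True, False),
                              ("slow only", False, True),
                              ("both layers", True, True)]:
        trace = simulate(disturbance=3.7, steps=12, use_fast=fast, use_slow=slow)
        print(f"{label:11s}", " ".join(f"{e:4.2f}" for e in trace))
```

Running it shows the fast layer alone leaves a persistent residual, the slow layer alone leaves the full error uncorrected for several steps, and the diverse combination drives the error down both quickly and precisely.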
Also, the “singleton superintelligence” design might turn out to be technically impossible: a singleton AI would have to be present everywhere in the world at once, so it would have to be a distributed system. Even if it is a hierarchical system with a core “supermind”, that core must not be deployed as a single instance, for reliability reasons. Then there is a philosophical question of whether we can still call this setup “a singleton AI”, or whether we have ended up with a distributed network of AIs after all. Though, in this case, these systems are explicitly designed to collude.