I would like to see a much deeper technical comparison of open agencies vs. unitary agents from the perspectives of control theory, game theory, mechanism design, resilience theory, political science, theories of evolution and development, and many more.
So far, the proposal doesn’t seem technically grounded, and the claims that are made, for example,
The structure of tasks and accountability relationships speaks against merging exploratory planning, strategic decision-making, project management, task performance, reporting, and auditing into a single opaque process. The costs would be high, and any gains in simplicity would be illusory.
could appear to be the reverse on deeper inspection.
Also, the proposal seems to unnecessarily strawman “unitary agents”. It’s likely that general AI architectures will be multi-component in one way or another (e.g., see LeCun’s proposal; see here for some arguments that this might be a general trend even under recursive self-improvement). OTOH, “open agencies”, on a longer timescale (when the AI and the complexity of the problems it solves grow beyond the capacity of human supervision), will converge to essentially the same point: a multi-component AI architecture.
Every multi-component cognitive architecture is still a cognitive architecture, and its properties could differ from the properties of its components (which is the textbook definition of emergence). Even if all the components are perfectly “aligned” and ethical, the system as a whole could drift in directions (or “move towards goals”) that we would not like. Cf. capitalism, misaligned corporations made of good people, the prisoner’s dilemma, the tragedy of the commons, political polarisation and/or gridlock, etc.
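To make the emergence point concrete, here is a toy sketch (my own illustration, not anything from the post): two components that each follow a locally sensible, “aligned” policy still push the joint system into an equilibrium that neither of them wants. The payoff numbers and the best-response dynamics are just the textbook prisoner’s dilemma.

```python
# Toy illustration (my own, not from the post): two individually "reasonable"
# components still land the joint system in a bad equilibrium.
# Payoffs are the standard prisoner's dilemma: (my_payoff, other_payoff).
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

def best_response(other_action: str) -> str:
    """Pick the action that maximises my own payoff given the other's action."""
    return max(("cooperate", "defect"),
               key=lambda a: PAYOFFS[(a, other_action)][0])

# Each component best-responds to the other; iterate to a fixed point.
a, b = "cooperate", "cooperate"
for _ in range(10):
    a, b = best_response(b), best_response(a)

print("equilibrium:", (a, b), "payoffs:", PAYOFFS[(a, b)])
# -> ('defect', 'defect') with payoffs (1, 1), even though mutual cooperation
#    would give (3, 3): a system-level outcome that no component "wanted".
```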
Agents: Goal-driven agents may defend goals against change. Agencies: Generative planning models respond to goals as prompts.
Cognitive science doesn’t predict a priori that multi-component systems will not defend their goals. In fact, the goals of multi-component systems ranging from corporations to totalitarian regimes to ant colonies can be notoriously hard to change. It doesn’t suffice to say “open agency → easily changeable goals”; we should analyse concrete architectures in detail for this property (after deciding whether this property is in fact desirable in the first place; maybe we actually want the highest levels of civilisational intelligence to be conservative in their goals).
Agents: Plan descriptions may be internal and opaque. Agencies: Plan descriptions are externalized and interpretable. (Plan descriptions are of necessity interpretable as actions by downstream components and (with caveats) may also be interpretable by humans. Plans are not Transformer activations, and poorly explained plans are good candidates for rejection.)
First, the “unitary” general AI designs (or rather quasi-unitary, because, as I gestured at above, you probably can’t get to AGI with a single Transformer; you need at least some version of debate or something multi-step/multi-component/reflexive in the architecture) that will actually be deployed will have their plans externalised (I hope, though I’m less sure after Bing Chat...). Assuming otherwise would be crazy. Second, just having a plan output by either a quasi-unitary agent or an open agency is not enough: we should also apply interpretability to the activations from when the plan was generated, to get evidence (engineering assurance) that the plan is not deceptive. There is no way around this in any AI architecture.
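To be concrete about what “apply interpretability to the activations from when the plan was generated” could mean mechanically, here is a minimal sketch, assuming a small HuggingFace causal LM (gpt2 as a stand-in) playing the role of the planner. The choice of model, the layers hooked, and the idea of simply recording hidden states via forward hooks are all my assumptions; actually probing the recorded activations for deception is the open research problem, not something this snippet solves.

```python
# Minimal sketch (my assumptions, not the post's proposal): record the
# activations that were live while a plan was generated, so a separate
# interpretability/probing pass can inspect them alongside the plan text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in; any causal LM would do for the sketch
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

captured = {}  # layer name -> hidden states from the planning pass

def make_hook(name):
    def hook(module, inputs, output):
        # Keep a detached copy of the hidden states; in this minimal version
        # the last decoding step overwrites earlier ones.
        hidden = output[0] if isinstance(output, tuple) else output
        captured[name] = hidden.detach()
    return hook

# Attach hooks to every transformer block (which layers matter is itself an
# interpretability question; "all of them" is the conservative default).
handles = [block.register_forward_hook(make_hook(f"block_{i}"))
           for i, block in enumerate(model.transformer.h)]

prompt = "Plan: deliver the package to the warehouse by Friday.\nSteps:"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    plan_ids = model.generate(**inputs, max_new_tokens=40)
plan_text = tokenizer.decode(plan_ids[0], skip_special_tokens=True)

for h in handles:
    h.remove()

# `plan_text` is the externalised plan; `captured` is the evidence base that a
# deception probe would have to run over -- having the plan alone is not enough.
```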
Agents: Open-ended goals may motivate secret plans to gain power. Agencies: Bounded tasks include time and budget constraints.
Again, this straw-mans unitary agents. I’m not sure what precludes implementing time and budget constraints in them.
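For illustration, a minimal sketch of what I mean: time and budget constraints are a property of the harness around the system, not of whether the thing inside it is “unitary” or an “open agency”. The `Agent` interface here (start/step/done/result, with a per-step cost report) is a hypothetical placeholder of mine, not anything from the post.

```python
# Sketch of the claim above: time and budget limits live in the harness, so they
# can wrap a quasi-unitary agent just as well as an open agency.
# The agent interface (start/step/done/result) is hypothetical.
import time

class BudgetExceeded(Exception):
    pass

def run_bounded(agent, task, *, max_seconds: float, max_cost_usd: float):
    """Run any agent step by step, halting on a deadline or a spend cap."""
    deadline = time.monotonic() + max_seconds
    spent = 0.0
    state = agent.start(task)
    while not agent.done(state):
        if time.monotonic() > deadline:
            raise BudgetExceeded("time limit reached")
        state, step_cost = agent.step(state)  # the agent reports cost per step
        spent += step_cost
        if spent > max_cost_usd:
            raise BudgetExceeded("budget limit reached")
    return agent.result(state)
```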
Agents: Humans may have only one chance to set the goals of a dominant agent. Agencies: Humans engage in ongoing development and direction of diverse systems.
This points to your previous post about collusion. I agree that collusion is not guaranteed (we could find a mechanism design in which it is precluded), but collusion is likewise not ruled out. So, on the rhetorical level, it’s just as valid to say that “humans may have only one chance to deploy an open agency” if that open agency truly has general intelligence. If the answer is “safety (or, at least, buying another chance) by human gatekeeping”, then, first, we should analyse from the perspectives of safety science and reliability engineering how likely such gatekeeping is to help, and, second, we can in principle add the same sort of gatekeeping to quasi-unitary agents as well. If the objection to the latter is that quasi-unitary agents will be uninterpretable in some ways, then the proposal boils down to the claim that “multi-component AI is more interpretable”, which depends a lot on the details of the exact “open agency” design (as I pointed out above, in a multi-component AI there is the extra task of interpreting the system as a whole even if all components are interpretable). So, again, the comparison that dunks on quasi-unitary agents is not founded a priori.
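And to make the “we can add this sort of gatekeeping to quasi-unitary agents as well” point concrete, here is a hedged sketch: a plan-approval gate that sits between any planner (quasi-unitary or open agency, as long as it externalises plans) and an executor. All the interfaces here are hypothetical placeholders of mine, not the post’s design.

```python
# Hedged sketch: the human gate is architecture-agnostic. It only requires that
# plans are externalised before execution; it does not care whether the planner
# behind `propose_plan` is quasi-unitary or an open agency.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Plan:
    steps: List[str]
    rationale: str

def gated_execution(propose_plan: Callable[[str], Plan],
                    human_approves: Callable[[Plan], bool],
                    execute: Callable[[Plan], str],
                    task: str,
                    max_rejections: int = 3) -> str:
    """Only execute plans that a human reviewer has explicitly approved."""
    for _ in range(max_rejections):
        plan = propose_plan(task)     # any planner architecture fits here
        if human_approves(plan):      # the gate, not the planner, carries the safety claim
            return execute(plan)
    raise RuntimeError("no plan approved; task abandoned")
```

Whether such a gate buys real safety is exactly the safety-science/reliability question above; the sketch only shows that the gate itself does not discriminate between architectures.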