I understood the proposal as “let’s create relatively small, autonomous AIs with a multi-component cognitive architecture instead of unitary DNNs becoming global services (and thus ushering in cognitive globalisation)”. The key move seems to be that you want to increase the architecture’s interpretability by pushing some of the intelligence out of the “DNN depths” and into the multi-component interaction. In this setup, components may remain relatively small (e.g., below the 100B-parameter scale, i.e., at the level of current SoTA DNNs), while their interaction leads to the emergence of general intelligence, which may not be achievable for unitary DNNs at these model scales.
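To make the “intelligence in the interaction” reading concrete, here is a minimal sketch of what I have in mind (the names `Component`, `MultiComponentAgent`, and the message transcript are my own hypothetical illustration, not anything from the post): several small models exchange messages over an explicit, human-readable channel, so the “reasoning” that a unitary DNN would keep in its activations becomes an inspectable transcript.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical sketch: each "component" wraps a smallish model behind a text
# interface; the interesting cognition is meant to live in the exchange of
# messages, which is fully loggable and human-readable.

@dataclass
class Message:
    sender: str
    content: str

@dataclass
class Component:
    name: str
    respond: Callable[[List[Message]], str]  # stand-in for a <100B-parameter model

    def step(self, transcript: List[Message]) -> Message:
        return Message(self.name, self.respond(transcript))

@dataclass
class MultiComponentAgent:
    components: List[Component]
    transcript: List[Message] = field(default_factory=list)  # the interpretable part

    def deliberate(self, task: str, rounds: int = 3) -> List[Message]:
        self.transcript.append(Message("user", task))
        for _ in range(rounds):
            for component in self.components:
                self.transcript.append(component.step(self.transcript))
        return self.transcript  # audit this, rather than DNN internals
```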
The proposal seems in some ways very close to Eric Drexler’s recent Open Agency Model. Both your proposal and Drexler’s “open agencies” noticeably allude to classic approaches to cognitive architecture, à la OpenCog or LeCun’s H-JEPA.
Like Drexler, you seem to take for granted that multi-component cognitive architectures will have various “good” properties, ranging from interpretability to retargetability (which Michael Levin calls persuadability, btw). However, as I noted in this comment, none of these “good properties” comes for free in an arbitrary multi-component AI. It must be demonstrated for a specific multi-component architecture why it is more interpretable (see also, in the linked comment, my note that “outputting plans” != interpretability), persuadable, robust, and ethical than an alternative architecture.
Your approach to this seems to be “minimal”, that is, something like “at least we know the level of interpretability, persuadability/corrigibility, robustness, and ethics of humans, so let’s build AI modelled ‘after humans’, so that we get at least these properties rather than worse ones”.
As such, I think this approach might not really solve anything and might not “end the acute risk period”, because it amounts to “building more human-like minds” while leaving all the other economic, social, and political dynamics intact. I don’t see how building “just more humans” prevents corporations from pursuing cognitive globalisation in one form or another. The approach just adds more “brain-power” to the engine (either by building AI “like humans, but 2–3 sigmas more intelligent”, or by building many of them and keeping them running around the clock), without targeting any problems with the engine itself. In other words, the approach is not placed within the frame of a larger vision for civilisational intelligence architecture, which, I argued, is a requirement for any “AI safety paradigm”: “Both AI alignment paradigms (protocols) and AGI capability research (intelligence architectures) that don’t position themselves within a certain design for civilisational intelligence are methodologically misguided and could be dangerous.”
Humans have rather bad interpretability (see Chater’s “The Mind is Flat”), bad persuadability (see Scott Alexander’s “Trapped Priors”), bad robustness (see John Doyle on hijackable language and memetic viruses), poor capability for communication and alignment (as anyone who has tried to reliably communicate an idea to anyone else, or to align with anyone else on anything, can easily attest; cf. the discussion of communication protocols in “Designing Ecosystems of Intelligence from First Principles”), and, of course, poor ethics. It’s also important to note that all these characteristics seem to be largely uncorrelated with raw general intelligence (the GI factor) in humans, or at least not strictly pegged to it.
I agree with you, Friston, and others who worry about the cognitive-globalisation and hyperscaling approach, but I also think that, to improve humanity’s chances, we should aim for a better-than-human architecture from the beginning. Creating many AI minds with an architecture “just like humans” (unless you count on some lucky emergence, i.e., that aiming “at humans” will in fact yield an architecture “better than humans”) doesn’t seem to improve civilisational intelligence and robustness; it just accelerates the current trends.
I’m also optimistic because I see nothing impossibly hard or intractable about designing cognitive architectures that are better than humans from the beginning and obtaining good engineering assurances that the architecture will indeed yield these (better-than-human) characteristics. It just takes time and a lot of effort (but so does architecting AI “just like humans”). E.g., an (explicit) Active Inference architecture seems to help significantly at least with interpretability and with the capacity for reliable and precise communication and, hence, belief and goal alignment (see Friston et al., 2022).
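For what I mean by “explicit” here, a toy discrete-state sketch (my own illustration of the standard discrete formulation, not Friston et al.’s code): the agent’s beliefs Q(s), its generative model (A, B) and its preferences C are all explicit, named objects that can be read, compared, and communicated between agents, which is where the interpretability and communication benefits are supposed to come from.

```python
import numpy as np

# Toy discrete Active Inference agent (illustrative only).
# A: likelihood P(o|s); B[a]: transitions P(s'|s,a); C: log-preferences over observations.
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])                      # 2 observations x 2 hidden states
B = np.array([[[1.0, 0.0], [0.0, 1.0]],         # action 0: stay
              [[0.0, 1.0], [1.0, 0.0]]])        # action 1: switch state
C = np.log(np.array([0.8, 0.2]))                # the agent prefers observation 0
q_s = np.array([0.5, 0.5])                      # explicit, readable belief over states

def update_belief(q_s, obs):
    """Bayesian belief update after observing `obs` (just a readable posterior)."""
    posterior = A[obs] * q_s
    return posterior / posterior.sum()

def expected_free_energy(q_s, action):
    """Risk term of expected free energy: divergence from preferred outcomes."""
    q_s_next = B[action] @ q_s                  # predicted next-state distribution
    q_o = A @ q_s_next                          # predicted observation distribution
    return np.sum(q_o * (np.log(q_o + 1e-12) - C))  # ambiguity term omitted for brevity

def choose_action(q_s):
    return int(np.argmin([expected_free_energy(q_s, a) for a in range(len(B))]))

# Example: observe outcome 1, update the (readable) belief, pick the best action.
q_s = update_belief(q_s, obs=1)
print("belief over states:", q_s, "-> action:", choose_action(q_s))
```

The point is not the toy dynamics but that every quantity that drives behaviour (beliefs, model, preferences) is an inspectable object rather than an opaque activation pattern.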
Thank you. You phrased the concerns about “integrating with a bigger picture” better than I could. To temper the negatives, I see at least two workable approaches, plus a framing for identifying more workable approaches.
Enable other safety groups to use and reproduce Conjecture’s research on CogEms so those groups can address more parts of the “bigger picture” using Conjecture’s findings. Under this approach, Conjecture becomes a safety research group, and the integration work of turning that research into actionable safety efforts becomes someone else’s task.
Understand the societal motivations for taking short-term steps toward creating dangerous AI, and demonstrate that CogEms are better suited to addressing those motivations, not just the motivations of safety enthusiasts, and not just hypothetical motivations that people “should” have. To take an example, OpenAI has taken steps toward building dangerous AI, and Microsoft has taken another dangerous step by attaching a massive search database to it, exposing the product to millions of people, and kicking off an arms race with Google. There were individual decision-makers involved in that process; it wasn’t just “Big Company does Bad Thing because that’s what big companies do.” Why did they make those decisions? What was the decision process for those product managers? Who created the pitch that convinced the executives? Why didn’t Microsoft’s internal security processes mitigate more of the risks? What would it have taken for Microsoft to have released a CogEm instead of Sydney? The answer is not just research advances. Finding the answers would involve talking to people familiar with these processes, ideally people who were somehow involved. Once safety-oriented people understand these things, it will be much easier for them to replace more dangerous AI systems with CogEms.
As a general framework, there needs to be more liquidity between safety research and the high-end AI capabilities market, and products introduce liquidity between research and markets. Publishing research addresses one part of that by enabling other groups to productize that research. Understanding societal motivations addresses another part, and it would typically fall under “user research.” Clarity on how others can use your product is another part, one that typically falls under a “go-to-market strategy.” There’s also market awareness & education, which helps people understand where to use products; then the sales process, which helps people through the “last mile” of actually using the product; then the nebulous process of scaling everything up. As far as I can tell, this is a minimal set of steps required for getting the high-end AI capabilities market to adopt safety features, and it’s effectively the industry-standard approach.
As an aside, I think CogEms are a perfectly valid strategy for creating aligned AI. It doesn’t matter if most humans have bad interpretability, persuadability, robustness, ethics, or whatever else. As long as it’s possible for some human (or collection of humans) to be good at those things, we should expect that some subclass of CogEms (or collection of CogEms) can also be good at those things.
I also have concerns with this plan (mainly about timing; see my comment elsewhere on this thread). However, I disagree with your concerns. I think that a CogEm as described here has much better interpretability than a human brain (we can read the connections and weights completely). Based on my neuroscience background, I think that human brains are already more interpretable and controllable than black-box ML models. I think the other problems you mention are greatly mitigated by the fact that we’d have edit access to the weights and connections of the CogEm, and thus would be able to redirect it much more easily than a human. I think that having full edit access to the weights and connections of a human brain would make that human quite controllable! Especially in combination with being able to wipe its memory and restore it to a previous state, rerun it over test scenarios many thousands of times with different parameters, etc.
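To illustrate the restore-and-rerun point, a minimal sketch of the kind of harness I have in mind (the CogEm is stood in for by an ordinary PyTorch module, and `run_scenario` is a hypothetical placeholder): checkpoint the full state, run a test scenario, edit whatever you like, then restore and rerun under different parameters, something you obviously cannot do with a human.

```python
import copy
import torch
import torch.nn as nn

# Stand-in for a CogEm's readable/editable state: here just an ordinary module.
cogem = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

def run_scenario(model: nn.Module, seed: int) -> torch.Tensor:
    """Hypothetical test scenario: feed a seeded input and record the behaviour."""
    torch.manual_seed(seed)
    observation = torch.randn(1, 16)
    with torch.no_grad():
        return model(observation)

# 1. Snapshot the complete state (the "memory wipe" point).
checkpoint = copy.deepcopy(cogem.state_dict())

# 2. Rerun scenarios many times under different parameters/edits.
for seed in range(1000):
    with torch.no_grad():
        cogem[0].weight += 0.01 * torch.randn_like(cogem[0].weight)  # example "edit access"
    behaviour = run_scenario(cogem, seed)
    # ...log/evaluate `behaviour` against a safety spec here...
    cogem.load_state_dict(checkpoint)  # 3. Restore the previous state, leaving no residue.
```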