Thank you. You phrased the concerns about “integrating with a bigger picture” better than I could. To temper the negatives, I see at least two workable approaches, plus a framing for identifying more workable approaches.
Enable other safety groups to use and reproduce Conjecture’s research on CogEms so those groups can address more parts of the “bigger picture” using Conjecture’s findings. Under this approach, Conjecture becomes a safety research group, and the integration work of turning that research into actionable safety efforts becomes someone else’s task.
Understand the societal motivations for taking short-term steps toward creating dangerous AI, and demonstrate that CogEms are better suited for addressing those motivations, not just the motivations of safety enthusiasts, and not just hypothetical motivations that people “should” have. To take an example, OpenAI has taken steps toward building dangerous AI, and Microsoft has taken another dangerous step by attaching a massive search database to it, exposing the product to millions of people, and kicking off an arms race with Google. Individual decision-makers were involved in that process; it was not just “Big Company does Bad Thing because that’s what big companies do.” Why did they make those decisions? What was the decision process for those product managers? Who created the pitch that convinced the executives? Why didn’t Microsoft’s internal security processes mitigate more of the risks? What would it have taken for Microsoft to have released a CogEm instead of Sydney? The answer is not just research advances. Finding the answers would involve talking to people familiar with these processes, ideally people who were somehow involved. Once safety-oriented people understand these things, it will be much easier for them to replace more dangerous AI systems with CogEms.
As a general framework, there needs to be more liquidity between safety research and the high-end AI capabilities market, and products introduce liquidity between research and markets. Publishing research addresses one part of that by enabling other groups to productize that research. Understanding societal motivations addresses another part, and it would typically fall under “user research.” Clarity on how others can use your product is another part, one that typically falls under a “go-to-market strategy.” There’s also market awareness and education, which helps people understand where to use products, then the sales process, which helps people through the “last mile” efforts of actually using the product, then the nebulous process of scaling everything up. As far as I can tell, this is a minimal set of steps required for getting the high-end AI capabilities market to adopt safety features, and it’s effectively the industry standard approach.
As an aside, I think CogEms are a perfectly valid strategy for creating aligned AI. It doesn’t matter if most humans have bad interpretability, persuadability, robustness, ethics, or whatever else. As long as it’s possible for some human (or collection of humans) to be good at those things, we should expect that some subclass of CogEms (or collection of CogEms) can also be good at those things.
I think you’re missing an important edge case where all of your resolved subsystems are in agreement that their collective desires are simultaneously compatible and unattainable without enormous amounts of motivation, which is something that an arms race can provide. Adaptation isn’t just about spinning cycles and causing stress. It does have actual tangible outcomes, and not all of those outcomes are bad. Though I think for most people, your advice is probably close enough to the right advice.