Executive summary: This post proposes a strategy for safely accelerating alignment research. The plan is to set up human-in-the-loop systems that empower human agency rather than outsource it, and to use those systems to differentially accelerate progress on alignment.
Introduction: An explanation of the context and motivation for this agenda.
Automated Research Assistants: A discussion of why the paradigm of training AI systems to behave as autonomous agents is both counterproductive and dangerous.
Becoming a Cyborg: A proposal for an alternative approach/frame, which focuses on a particular type of human-in-the-loop system I am calling a “cyborg”.
Failure Modes: An analysis of how this agenda could either fail to help or actively cause harm by accelerating AI research more broadly.
Testimony of a Cyborg: A personal account of how Janus uses GPT as part of their workflow, and how it relates to the cyborgism approach to intelligence augmentation.