How can we move the needle on AI safety? In this sequence I think through some approaches that don't rely on precise specifications; instead, they involve "shaping" our agents to think in safer ways and to have safer motivations. This is particularly relevant to the prospect of training AGIs in multi-agent (or other open-ended) environments.
Note that all of the techniques I propose here are speculative brainstorming; I’m not confident in any of them as research directions, although I’d be excited to see further exploration along these lines.