The Big Picture Of Alignment (Talk Part 1)
I recently gave a two-part talk on the big picture of alignment, as I see it. The talk is not-at-all polished, but contains a lot of stuff for which I don’t currently know of any good writeup. Major pieces in part one:
- Some semitechnical intuition-building for high-dimensional problem-spaces.
  - Optimization compresses information “by default”.
  - Resources and “instrumental convergence”, without any explicit reference to agents.
- A frame for thinking about the alignment problem which only talks about high-dimensional problem-spaces, without reference to AI per se.
  - The central challenge is to get enough bits of information about human values to narrow down a search space to solutions compatible with human values (see the sketch after this list).
  - Details like whether an AI is a singleton, tool AI, multipolar, oracle, etc. are mostly irrelevant.
- Fermi estimate: just how complex are human values?
- Coherence arguments, presented the way I think they should be done.
  - Also subagents!
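To make the “narrowing down with bits” framing a bit more concrete (this is an illustrative sketch of the standard information-theoretic accounting, not a formula from the talk itself): if the search space contains N candidate solutions and only K of them are compatible with human values, then singling out a compatible solution requires on the order of

\[
\log_2\!\left(\frac{N}{K}\right) \ \text{bits of information about human values.}
\]

The Fermi-estimate question below can then be read, roughly, as asking how large that number is for human values.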
Note that I don’t talk about timelines or takeoff scenarios; this talk is just about the technical problem of alignment.
Here’s the video for part one:
Big thanks to Rob Miles for editing! Also, the video includes some good questions and discussion from Adam Shimi, Alex Flint, and Rob Miles.