A sequence of my alignment distillations, written up as I worked my way through understanding AI alignment theory.
My rough guiding research algorithm was to focus on the biggest hazard in my current model of alignment, try to understand and explain that hazard and the proposed solutions to it, and then recurse.
This now-finished sequence is representative of my developing alignment model before I became substantially informationally entangled, in person, with the Berkeley alignment community. It’s what I was able to glean from just reading a lot online.