Edit: Originally the sequence was going to contain a post about SLT for Alignment, but this can now be found here instead, where a new research agenda, Developmental Interpretability, is introduced. I have also now included references to the lectures from the recent SLT for Alignment Workshop in June 2023.
Edit: Originally the sequence was going to contain a post about SLT for Alignment, but this can now be found here instead, where a new research agenda, Developmental Interpretability, is introduced. I have also now included references to the lectures from the recent SLT for Alignment Workshop in June 2023.