“Why Not Just...”
johnswentworth · Aug 8, 2022

A compendium of rants about alignment proposals, of varying charitability.

- Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc · johnswentworth · Jun 4, 2022 · 159 points · 55 comments · 2 min read
- Godzilla Strategies · johnswentworth · Jun 11, 2022 · 159 points · 72 comments · 3 min read
- Rant on Problem Factorization for Alignment · johnswentworth · Aug 5, 2022 · 102 points · 53 comments · 6 min read
- Interpretability/Tool-ness/Alignment/Corrigibility are not Composable · johnswentworth · Aug 8, 2022 · 143 points · 13 comments · 3 min read
- How To Go From Interpretability To Alignment: Just Retarget The Search · johnswentworth · Aug 10, 2022 · 209 points · 34 comments · 3 min read
- Oversight Misses 100% of Thoughts The AI Does Not Think · johnswentworth · Aug 12, 2022 · 110 points · 49 comments · 1 min read
- Human Mimicry Mainly Works When We’re Already Close · johnswentworth · Aug 17, 2022 · 81 points · 16 comments · 5 min read
- Worlds Where Iterative Design Fails · johnswentworth · Aug 30, 2022 · 208 points · 30 comments · 10 min read
- Why Not Just… Build Weak AI Tools For AI Alignment Research? · johnswentworth · Mar 5, 2023 · 175 points · 18 comments · 6 min read
- Why Not Just Outsource Alignment Research To An AI? · johnswentworth · Mar 9, 2023 · 151 points · 50 comments · 9 min read
- OpenAI Launches Superalignment Taskforce · Zvi · Jul 11, 2023 · 149 points · 40 comments · 49 min read (thezvi.wordpress.com)