Alignment Hot Take Advent Calendar
Charlie Steiner · Dec 1, 2022, 10:42 PM

Take 1: We’re not going to reverse-engineer the AI. (Dec 1, 2022, 10:41 PM · 38 points · 4 comments · 4 min read)
Take 2: Building tools to help build FAI is a legitimate strategy, but it’s dual-use. (Dec 3, 2022, 12:54 AM · 17 points · 1 comment · 2 min read)
Take 3: No indescribable heavenworlds. (Dec 4, 2022, 2:48 AM · 23 points · 12 comments · 2 min read)
Take 4: One problem with natural abstractions is there’s too many of them. (Dec 5, 2022, 10:39 AM · 37 points · 4 comments · 1 min read)
Take 5: Another problem for natural abstractions is laziness. (Dec 6, 2022, 7:00 AM · 31 points · 4 comments · 3 min read)
Take 6: CAIS is actually Orwellian. (Dec 7, 2022, 1:50 PM · 14 points · 8 comments · 2 min read)
Take 7: You should talk about “the human’s utility function” less. (Dec 8, 2022, 8:14 AM · 50 points · 22 comments · 2 min read)
Take 8: Queer the inner/outer alignment dichotomy. (Dec 9, 2022, 5:46 PM · 31 points · 2 comments · 2 min read)
Take 9: No, RLHF/IDA/debate doesn’t solve outer alignment. (Dec 12, 2022, 11:51 AM · 33 points · 13 comments · 2 min read)
Take 10: Fine-tuning with RLHF is aesthetically unsatisfying. (Dec 13, 2022, 7:04 AM · 37 points · 3 comments · 2 min read)
Take 11: “Aligning language models” should be weirder. (Dec 18, 2022, 2:14 PM · 34 points · 0 comments · 2 min read)
Take 12: RLHF’s use is evidence that orgs will jam RL at real-world problems. (Dec 20, 2022, 5:01 AM · 25 points · 1 comment · 3 min read)
Take 13: RLHF bad, conditioning good. (Dec 22, 2022, 10:44 AM · 54 points · 4 comments · 2 min read)
Take 14: Corrigibility isn’t that great. (Dec 25, 2022, 1:04 PM · 15 points · 3 comments · 3 min read)