
Distillation & Pedagogy

Last edit: 21 Aug 2020 22:31 UTC by Raemon

Distillation is the process of taking a complex subject and making it easier to understand. Pedagogy is the method and practice of teaching. A good intellectual pipeline requires not just discovering new ideas, but also making it easier for newcomers to learn them, stand on the shoulders of giants, and discover even more ideas.

Chris Olah, founder of distill.pub, writes in his essay Research Debt:

Programmers talk about technical debt: there are ways to write software that are faster in the short run but problematic in the long run. Managers talk about institutional debt: institutions can grow quickly at the cost of bad practices creeping in. Both are easy to accumulate but hard to get rid of.

Research can also have debt. It comes in several forms:

  • Poor Exposition – Often, there is no good explanation of important ideas and one has to struggle to understand them. This problem is so pervasive that we take it for granted and don’t appreciate how much better things could be.

  • Undigested Ideas – Most ideas start off rough and hard to understand. They become radically easier as we polish them, developing the right analogies, language, and ways of thinking.

  • Bad abstractions and notation – Abstractions and notation are the user interface of research, shaping how we think and communicate. Unfortunately, we often get stuck with the first formalisms to develop even when they’re bad. For example, an object with extra electrons is negative, and pi is wrong.

  • Noise – Being a researcher is like standing in the middle of a construction site. Countless papers scream for your attention and there's no easy way to filter or summarize them. Because most work is explained poorly, it takes a lot of energy to understand each piece of work. For many papers, one wants a simple one sentence explanation of it, but needs to fight with it to get that sentence. Because the simplest way to get the attention of interested parties is to get everyone's attention, we get flooded with work. Because we incentivize people being "prolific," we get flooded with a lot of work… We think noise is the main way experts experience research debt.

The insidious thing about research debt is that it’s normal. Everyone takes it for granted, and doesn’t realize that things could be different. For example, it’s normal to give very mediocre explanations of research, and people perceive that to be the ceiling of explanation quality. On the rare occasions that truly excellent explanations come along, people see them as one-off miracles rather than a sign that we could systematically be doing better.

See also Scholarship & Learning and Good Explanations.

How to teach things well – Neel Nanda, 28 Aug 2020 (108 points, 17 comments, 15 min read, 1 review) (www.neelnanda.io)
Research Debt – Elizabeth, 15 Jul 2018 (24 points, 2 comments, 1 min read) (distill.pub)
Ironing Out the Squiggles – Zack_M_Davis, 29 Apr 2024 (153 points, 36 comments, 11 min read)
[Question] What are Examples of Great Distillers? – adamShimi, 12 Nov 2020 (35 points, 12 comments, 1 min read)
The Cave Allegory Revisited: Understanding GPT's Worldview – Jan_Kulveit, 14 Feb 2023 (85 points, 5 comments, 3 min read)
Explainers Shoot High. Aim Low! – Eliezer Yudkowsky, 24 Oct 2007 (101 points, 35 comments, 1 min read)
Davidad's Bold Plan for Alignment: An In-Depth Explanation – 19 Apr 2023 (161 points, 34 comments, 21 min read)
Infra-Bayesianism Unwrapped – adamShimi, 20 Jan 2021 (58 points, 0 comments, 24 min read)
Call For Distillers – johnswentworth, 4 Apr 2022 (206 points, 43 comments, 3 min read, 1 review)
Abstracts should be either Actually Short™, or broken into paragraphs – Raemon, 24 Mar 2023 (93 points, 27 comments, 5 min read)
Learning how to learn – Neel Nanda, 30 Sep 2020 (38 points, 0 comments, 15 min read) (www.neelnanda.io)
Features that make a report especially helpful to me – lukeprog, 14 Apr 2022 (40 points, 0 comments, 2 min read)
TAPs for Tutoring – Mark Xu, 24 Dec 2020 (27 points, 3 comments, 5 min read)
Stampy's AI Safety Info—New Distillations #3 [May 2023] – markov, 6 Jun 2023 (16 points, 0 comments, 2 min read) (aisafety.info)
(Summary) Sequence Highlights—Thinking Better on Purpose – qazzquimby, 2 Aug 2022 (33 points, 3 comments, 11 min read)
The 101 Space You Will Always Have With You – Screwtape, 29 Nov 2023 (253 points, 21 comments, 6 min read, 1 review)
Natural Abstractions: Key claims, Theorems, and Critiques – 16 Mar 2023 (237 points, 22 comments, 45 min read, 2 reviews)
Stampy's AI Safety Info—New Distillations #1 [March 2023] – markov, 7 Apr 2023 (42 points, 0 comments, 2 min read) (aisafety.info)
Stampy's AI Safety Info—New Distillations #2 [April 2023] – markov, 9 May 2023 (25 points, 1 comment, 1 min read) (aisafety.info)
Expansive translations: considerations and possibilities – ozziegooen, 18 Sep 2020 (43 points, 15 comments, 6 min read)

DARPA Digital Tutor: Four Months to Total Technical Expertise? – SebastianG, 6 Jul 2020 (213 points, 22 comments, 7 min read)
Rationality Dojo – lsusr, 24 Apr 2022 (14 points, 5 comments, 1 min read)
Calling for Student Submissions: AI Safety Distillation Contest – Aris, 24 Apr 2022 (48 points, 15 comments, 4 min read)
Infra-Bayesianism Distillation: Realizability and Decision Theory – Thomas Larsen, 26 May 2022 (40 points, 9 comments, 18 min read)
[Request for Distillation] Coherence of Distributed Decisions With Different Inputs Implies Conditioning – johnswentworth, 25 Apr 2022 (22 points, 14 comments, 2 min read)
The Solomonoff Prior is Malign – Mark Xu, 14 Oct 2020 (173 points, 52 comments, 16 min read, 3 reviews)
How to get people to produce more great exposition? Some strategies and their assumptions – riceissa, 25 May 2022 (26 points, 10 comments, 3 min read)
Beren's "Deconfusing Direct vs Amortised Optimisation" – DragonGod, 7 Apr 2023 (52 points, 10 comments, 3 min read)
Cheat sheet of AI X-risk – momom2, 29 Jun 2023 (19 points, 1 comment, 7 min read)
But why would the AI kill us? – So8res, 17 Apr 2023 (132 points, 95 comments, 2 min read)
Exposition as science: some ideas for how to make progress – riceissa, 8 Jul 2022 (21 points, 1 comment, 8 min read)
A distillation of Evan Hubinger's training stories (for SERI MATS) – Daphne_W, 18 Jul 2022 (15 points, 1 comment, 10 min read)
Rationality, Pedagogy, and "Vibes": Quick Thoughts – Nicholas / Heather Kross, 15 Jul 2023 (14 points, 1 comment, 4 min read)
Pitfalls with Proofs – scasper, 19 Jul 2022 (19 points, 21 comments, 8 min read)
AGI ruin mostly rests on strong claims about alignment and deployment, not about society – Rob Bensinger, 24 Apr 2023 (70 points, 8 comments, 6 min read)
Distillation Contest—Results and Recap – Aris, 29 Jul 2022 (34 points, 0 comments, 7 min read)
[Question] Which intro-to-AI-risk text would you recommend to... – Sherrinford, 1 Aug 2022 (12 points, 1 comment, 1 min read)
Seeking PCK (Pedagogical Content Knowledge) – CFAR!Duncan, 12 Aug 2022 (59 points, 11 comments, 5 min read)
A concise sum-up of the basic argument for AI doom – Mergimio H. Doefevmil, 24 Apr 2023 (11 points, 6 comments, 2 min read)
AI alignment as "navigating the space of intelligent behaviour" – Nora_Ammann, 23 Aug 2022 (18 points, 0 comments, 6 min read)

Alignment is hard. Communicating that, might be harder – Eleni Angelou, 1 Sep 2022 (7 points, 8 comments, 3 min read)
How To Know What the AI Knows—An ELK Distillation – Fabien Roger, 4 Sep 2022 (7 points, 0 comments, 5 min read)
Learning-theoretic agenda reading list – Vanessa Kosoy, 9 Nov 2023 (99 points, 1 comment, 2 min read, 1 review)
Summaries: Alignment Fundamentals Curriculum – Leon Lang, 18 Sep 2022 (44 points, 3 comments, 1 min read) (docs.google.com)
Superposition is not "just" neuron polysemanticity – LawrenceC, 26 Apr 2024 (64 points, 4 comments, 13 min read)
Power-Seeking AI and Existential Risk – Antonio Franca, 11 Oct 2022 (6 points, 0 comments, 9 min read)
Real-Time Research Recording: Can a Transformer Re-Derive Positional Info? – Neel Nanda, 1 Nov 2022 (69 points, 16 comments, 1 min read) (youtu.be)
Distillation Experiment: Chunk-Knitting – DirectedEvolution, 7 Nov 2022 (9 points, 1 comment, 6 min read)
The No Free Lunch theorem for dummies – Steven Byrnes, 5 Dec 2022 (37 points, 16 comments, 3 min read)
Shard Theory in Nine Theses: a Distillation and Critical Appraisal – LawrenceC, 19 Dec 2022 (143 points, 30 comments, 18 min read)
Summary of 80k's AI problem profile – JakubK, 1 Jan 2023 (7 points, 0 comments, 5 min read) (forum.effectivealtruism.org)
Results from the Turing Seminar hackathon – 7 Dec 2023 (29 points, 1 comment, 6 min read)
Induction heads—illustrated – CallumMcDougall, 2 Jan 2023 (113 points, 9 comments, 3 min read)
AI Safety Info Distillation Fellowship – 17 Feb 2023 (47 points, 3 comments, 3 min read)
How ARENA course material gets made – CallumMcDougall, 2 Jul 2024 (41 points, 2 comments, 7 min read)
[Question] Is "Strong Coherence" Anti-Natural? – DragonGod, 11 Apr 2023 (23 points, 25 comments, 2 min read)
Uncertainty in all its flavours – Cleo Nardo, 9 Jan 2024 (27 points, 6 comments, 35 min read)
An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2 – Neel Nanda, 7 Jul 2024 (134 points, 16 comments, 25 min read)
AI Safety - 7 months of discussion in 17 minutes – Zoe Williams, 15 Mar 2023 (25 points, 0 comments, 1 min read)
Getting rational now or later: navigating procrastination and time-inconsistent preferences for new rationalists – milo_thoughts, 26 Feb 2024 (1 point, 0 comments, 8 min read)

Dialogue introduction to Singular Learning Theory – Olli Järviniemi, 8 Jul 2024 (97 points, 14 comments, 8 min read)
Announcing AISafety.info's Write-a-thon (June 16-18) and Second Distillation Fellowship (July 3-October 2) – steven0461, 3 Jun 2023 (33 points, 1 comment, 2 min read)
Join AISafety.info's Distillation Hackathon (Oct 6-9th) – smallsilo, 1 Oct 2023 (21 points, 0 comments, 2 min read) (forum.effectivealtruism.org)
Poker is a bad game for teaching epistemics. Figgie is a better one. – rossry, 8 Jul 2024 (104 points, 47 comments, 11 min read) (blog.rossry.net)
Hiatus: EA and LW post summaries – Zoe Williams, 17 May 2023 (14 points, 0 comments, 1 min read)
Podcast: "How the Smart Money teaches trading with Ricki Heicklen" (Patrick McKenzie interviewing) – rossry, 11 Jul 2024 (20 points, 2 comments, 1 min read) (www.complexsystemspodcast.com)
AI Safety Strategies Landscape – Charbel-Raphaël, 9 May 2024 (34 points, 1 comment, 42 min read)
Explaining Impact Markets – Saul Munn, 31 Jan 2024 (95 points, 2 comments, 3 min read) (www.brasstacks.blog)
Expertise and advice – John_Maxwell, 27 May 2012 (25 points, 4 comments, 1 min read)
Graceful Degradation – Screwtape, 5 Nov 2024 (79 points, 8 comments, 4 min read)
Avoid Unnecessarily Political Examples – Raemon, 11 Jan 2021 (106 points, 42 comments, 3 min read)
Discovery fiction for the Pythagorean theorem – riceissa, 19 Jan 2021 (16 points, 1 comment, 4 min read)
Inversion of theorems into definitions when generalizing – riceissa, 4 Aug 2019 (25 points, 3 comments, 5 min read)
Think like an educator about code quality – Adam Zerner, 27 Mar 2021 (44 points, 8 comments, 9 min read)
99% shorter – philh, 27 May 2021 (16 points, 0 comments, 6 min read) (reasonableapproximation.net)
An Apprentice Experiment in Python Programming – 4 Jul 2021 (67 points, 4 comments, 9 min read)
Learning Math in Time for Alignment – Nicholas / Heather Kross, 9 Jan 2024 (32 points, 3 comments, 3 min read)
An Apprentice Experiment in Python Programming, Part 2 – 29 Jul 2021 (30 points, 18 comments, 10 min read)
Calibration proverbs – Malmesbury, 11 Jan 2022 (76 points, 19 comments, 1 min read)
"Deep Learning" Is Function Approximation – Zack_M_Davis, 21 Mar 2024 (98 points, 28 comments, 10 min read) (zackmdavis.net)

[Closed] Job Offering: Help Communicate Infrabayesianism – 23 Mar 2022 (129 points, 22 comments, 1 min read)
Summary: "How to Write Quickly..." by John Wentworth – Pablo Repetto, 11 Apr 2022 (4 points, 0 comments, 2 min read) (pabloernesto.github.io)
[Question] What to include in a guest lecture on existential risks from AI? – Aryeh Englander, 13 Apr 2022 (20 points, 9 comments, 1 min read)
AI Safety 101: Capabilities—Human Level AI, What? How? and When? – 7 Mar 2024 (46 points, 8 comments, 54 min read)
A Pedagogical Guide to Corrigibility – A.H., 17 Jan 2024 (6 points, 3 comments, 16 min read)
Dreams of "Mathopedia" – Nicholas / Heather Kross, 2 Jun 2023 (40 points, 16 comments, 2 min read) (www.thinkingmuchbetter.com)
Observations on Teaching for Four Weeks – ClareChiaraVincent, 6 May 2024 (50 points, 14 comments, 3 min read)
An Illustrated Summary of "Robust Agents Learn Causal World Model" – Dalcy, 14 Dec 2024 (57 points, 2 comments, 10 min read)
Failure Modes of Teaching AI Safety – Eleni Angelou, 25 Jun 2024 (20 points, 0 comments, 1 min read)
Distillation of 'Do language models plan for future tokens' – TheManxLoiner, 27 Jun 2024 (26 points, 2 comments, 6 min read)
DIY RLHF: A simple implementation for hands on experience – 10 Jul 2024 (28 points, 0 comments, 6 min read)
Video Intro to Guaranteed Safe AI – 11 Jul 2024 (27 points, 0 comments, 1 min read) (youtu.be)
Proof Explained for "Robust Agents Learn Causal World Model" – Dalcy, 22 Dec 2024 (18 points, 0 comments, 15 min read)
Distillation Of DeepSeek-Prover V1.5 – IvanLin, 15 Oct 2024 (4 points, 1 comment, 3 min read)
Concrete Methods for Heuristic Estimation on Neural Networks – Oliver Daniels, 14 Nov 2024 (28 points, 0 comments, 27 min read)
How I got so excited about HowTruthful – Bruce Lewis, 9 Nov 2023 (17 points, 3 comments, 5 min read)
Does anyone use advanced media projects? – ryan_b, 20 Jun 2018 (33 points, 5 comments, 1 min read)
Teaching the Unteachable – Eliezer Yudkowsky, 3 Mar 2009 (55 points, 18 comments, 6 min read)
The Fundamental Question—Rationality computer game design – Kaj_Sotala, 13 Feb 2013 (61 points, 68 comments, 9 min read)
Zetetic explanation – Benquo, 27 Aug 2018 (95 points, 138 comments, 6 min read) (benjaminrosshoffman.com)

Paternal Formats – abramdemski, 9 Jun 2019 (51 points, 35 comments, 2 min read)
The Stanley Parable: Making philosophy fun – Nathan1123, 22 May 2023 (6 points, 3 comments, 3 min read)
Teachable Rationality Skills – Eliezer Yudkowsky, 27 May 2011 (74 points, 263 comments, 1 min read)
Five-minute rationality techniques – sketerpot, 10 Aug 2010 (72 points, 237 comments, 2 min read)
Just One Sentence – Eliezer Yudkowsky, 5 Jan 2013 (96 points, 143 comments, 1 min read)
Media bias – PhilGoetz, 5 Jul 2009 (39 points, 47 comments, 1 min read)
The RAIN Framework for Informational Effectiveness – ozziegooen, 13 Feb 2019 (37 points, 16 comments, 6 min read)
The Up-Goer Five Game: Explaining hard ideas with simple words – Rob Bensinger, 5 Sep 2013 (44 points, 82 comments, 2 min read)
Tutor-GPT & Pedagogical Reasoning – courtlandleer, 5 Jun 2023 (26 points, 3 comments, 4 min read)
A comparison of causal scrubbing, causal abstractions, and related methods – 8 Jun 2023 (73 points, 3 comments, 22 min read)
Rationality Games & Apps Brainstorming – lukeprog, 9 Jul 2012 (42 points, 59 comments, 2 min read)
How not to be a Naïve Computationalist – diegocaleiro, 13 Apr 2011 (39 points, 36 comments, 2 min read)
Dense Math Notation – JK_Ravenclaw, 1 Apr 2011 (33 points, 23 comments, 1 min read)
AIS 101: Task decomposition for scalable oversight – Charbel-Raphaël, 25 Jul 2023 (27 points, 0 comments, 19 min read) (docs.google.com)
Numeracy neglect—A personal postmortem – vlad.proex, 27 Sep 2020 (81 points, 29 comments, 9 min read)
Paper digestion: "May We Have Your Attention Please? Human-Rights NGOs and the Problem of Global Communication" – Klara Helene Nielsen, 20 Jul 2023 (4 points, 1 comment, 2 min read) (journals.sagepub.com)
AI Safety 101: Introduction to Vision Interpretability – 28 Jul 2023 (41 points, 0 comments, 1 min read) (github.com)
Subdivisions for Useful Distillations? – Sharat Jacob Jacob, 24 Jul 2023 (8 points, 2 comments, 2 min read)
Stampy's AI Safety Info—New Distillations #4 [July 2023] – markov, 16 Aug 2023 (22 points, 10 comments, 1 min read) (aisafety.info)
[Question] What AI Posts Do You Want Distilled? – brook, 25 Aug 2023 (11 points, 2 comments, 1 min read) (forum.effectivealtruism.org)

Jan Kulveit's Corrigibility Thoughts Distilled – brook, 20 Aug 2023 (20 points, 1 comment, 5 min read)
Mesa-Optimization: Explain it like I'm 10 Edition – brook, 26 Aug 2023 (20 points, 1 comment, 6 min read)
Moved from Moloch's Toolbox: Discussion re style of latest Eliezer sequence – habryka, 5 Nov 2017 (7 points, 2 comments, 3 min read)
Short Primers on Crucial Topics – lukeprog, 31 May 2012 (35 points, 24 comments, 1 min read)
Graphical tensor notation for interpretability – Jordan Taylor, 4 Oct 2023 (140 points, 11 comments, 19 min read)
An Elementary Introduction to Infra-Bayesianism – CharlesRW, 20 Sep 2023 (16 points, 0 comments, 1 min read)
Great Explanations – lukeprog, 31 Oct 2011 (34 points, 115 comments, 2 min read)
A LessWrong "rationality workbook" idea – jwhendy, 9 Jan 2011 (26 points, 26 comments, 3 min read)
Debugging the student – Adam Zerner, 16 Dec 2020 (46 points, 7 comments, 4 min read)
AI Safety 101: Reward Misspecification – markov, 18 Oct 2023 (30 points, 4 comments, 31 min read)
Retrospective on Teaching Rationality Workshops – Neel Nanda, 3 Jan 2021 (66 points, 2 comments, 31 min read)
[Question] What currents of thought on LessWrong do you want to see distilled? – ryan_b, 8 Jan 2021 (48 points, 19 comments, 1 min read)
An Apprentice Experiment in Python Programming, Part 3 – 16 Aug 2021 (14 points, 10 comments, 22 min read)
Distilling and approaches to the determinant – AprilSR, 6 Apr 2022 (6 points, 0 comments, 6 min read)
Deriving Conditional Expected Utility from Pareto-Efficient Decisions – Thomas Kwa, 5 May 2022 (24 points, 1 comment, 6 min read)
How RL Agents Behave When Their Actions Are Modified? [Distillation post] – PabloAMC, 20 May 2022 (22 points, 0 comments, 8 min read)
Universality Unwrapped – adamShimi, 21 Aug 2020 (29 points, 2 comments, 18 min read)
Imitative Generalisation (AKA 'Learning the Prior') – Beth Barnes, 10 Jan 2021 (107 points, 15 comments, 11 min read, 1 review)
Does SGD Produce Deceptive Alignment? – Mark Xu, 6 Nov 2020 (96 points, 9 comments, 16 min read)
Explaining inner alignment to myself – Jeremy Gillen, 24 May 2022 (9 points, 2 comments, 10 min read)

Croesus, Cerberus, and the magpies: a gentle introduction to Eliciting Latent Knowledge – Alexandre Variengien, 27 May 2022 (17 points, 0 comments, 16 min read)
Deconfusing Landauer's Principle – EuanMcLean, 27 May 2022 (58 points, 13 comments, 15 min read)
Understanding Selection Theorems – adamk, 28 May 2022 (41 points, 3 comments, 7 min read)
Distilled—AGI Safety from First Principles – Harrison G, 29 May 2022 (11 points, 1 comment, 14 min read)
Abram Demski's ELK thoughts and proposal—distillation – Rubi J. Hudson, 19 Jul 2022 (19 points, 8 comments, 16 min read)
AI Safety Cheatsheet / Quick Reference – Zohar Jackson, 20 Jul 2022 (3 points, 0 comments, 1 min read) (github.com)
Announcing the Distillation for Alignment Practicum (DAP) – 18 Aug 2022 (23 points, 3 comments, 3 min read)
Epistemic Artefacts of (conceptual) AI alignment research – 19 Aug 2022 (31 points, 1 comment, 5 min read)
Deep Q-Networks Explained – Jay Bailey, 13 Sep 2022 (58 points, 8 comments, 20 min read)
Understanding Infra-Bayesianism: A Beginner-Friendly Video Series – 22 Sep 2022 (140 points, 6 comments, 2 min read)
Distillation of "How Likely Is Deceptive Alignment?" – NickGabs, 18 Nov 2022 (24 points, 4 comments, 10 min read)
MIRI's "Death with Dignity" in 60 seconds. – Cleo Nardo, 6 Dec 2022 (58 points, 4 comments, 1 min read)
Models Don't "Get Reward" – Sam Ringer, 30 Dec 2022 (313 points, 61 comments, 5 min read, 1 review)
The AI Control Problem in a wider intellectual context – philosophybear, 13 Jan 2023 (11 points, 3 comments, 12 min read)
Nothing Is Ever Taught Correctly – LVSN, 20 Feb 2023 (5 points, 3 comments, 1 min read)
The Benefits of Distillation in Research – Jonas Hallgren, 4 Mar 2023 (15 points, 2 comments, 5 min read)
[Appendix] Natural Abstractions: Key Claims, Theorems, and Critiques – 16 Mar 2023 (48 points, 0 comments, 13 min read)