scasper

Karma: 2,012

https://stephencasper.com/

EIS IX: Interpretability and Adversaries

scasper · Feb 20, 2023, 6:25 PM

30 points

8 comments · 8 min read

EIS VIII: An Engineer’s Understanding of Deceptive Alignment

scasper · Feb 19, 2023, 3:25 PM

30 points

5 comments · 4 min read

EIS VII: A Challenge for Mechanists

scasper · Feb 18, 2023, 6:27 PM

36 points

4 comments · 3 min read

EIS VI: Critiques of Mechanistic Interpretability Work in AI Safety

scasper · Feb 17, 2023, 8:48 PM

49 points

9 comments · 12 min read

EIS V: Blind Spots In AI Safety Interpretability Research

scasper · Feb 16, 2023, 7:09 PM

57 points

24 comments · 10 min read

EIS IV: A Spotlight on Feature Attribution/Saliency

scasper · Feb 15, 2023, 6:46 PM

19 points

1 comment · 4 min read

EIS III: Broad Critiques of Interpretability Research

scasper · Feb 14, 2023, 6:24 PM

20 points

2 comments · 11 min read

EIS II: What is “Interpretability”?

scasper · Feb 9, 2023, 4:48 PM

28 points

6 comments · 4 min read

The Engineer’s Interpretability Sequence (EIS) I: Intro

scasper · Feb 9, 2023, 4:28 PM

46 points

24 comments · 3 min read

Avoiding perpetual risk from TAI

scasper · Dec 26, 2022, 10:34 PM

15 points

6 comments · 5 min read

Existential AI Safety is NOT separate from near-term applications

scasper · Dec 13, 2022, 2:47 PM

37 points

17 comments · 3 min read

Where to be an AI Safety Professor

scasper · Dec 7, 2022, 7:09 AM

31 points

12 comments · 2 min read

The Slippery Slope from DALLE-2 to Deepfake Anarchy

scasper · Nov 5, 2022, 2:53 PM

17 points

9 comments · 11 min read

[Linkpost] A survey on over 300 works about interpretability in deep networks

scasper · Sep 12, 2022, 7:07 PM

97 points

7 comments · 2 min read

(arxiv.org)

Pitfalls with Proofs

scasper · Jul 19, 2022, 10:21 PM

19 points

21 comments · 8 min read

A daily routine I do for my AI safety research work

scasper · Jul 19, 2022, 9:58 PM

22 points

7 comments · 1 min read

Deep Dives: My Advice for Pursuing Work in Research

scasper · Mar 11, 2022, 5:56 PM

33 points

2 comments · 3 min read

The Achilles Heel Hypothesis for AI

scasper · Oct 13, 2020, 2:35 PM

20 points

6 comments · 1 min read

Procrastination Paradoxes: the Good, the Bad, and the Ugly

scasper · Aug 6, 2020, 7:47 PM

21 points

4 comments · 10 min read

Solipsism is Underrated

scasper · Mar 28, 2020, 6:09 PM

11 points

30 comments · 3 min read