E.T. Jaynes Probability Theory: The Logic of Science I

27 Dec 2023 23:47 UTC
62 points
20 comments · 21 min read · LW link

Free agents

Michele Campolo · 27 Dec 2023 20:20 UTC
6 points
19 comments · 13 min read · LW link

Merry Christmas Everyone!

johnlawrenceaspden · 27 Dec 2023 19:49 UTC
14 points
1 comment · 1 min read · LW link

Natural Latents: The Math

27 Dec 2023 19:03 UTC
120 points
37 comments · 12 min read · LW link

NYT is suing OpenAI & Microsoft for alleged copyright infringement; some quick thoughts

Mikhail Samin · 27 Dec 2023 18:44 UTC
42 points
17 comments · 1 min read · LW link

Extropy magazine review

Peter lawless · 27 Dec 2023 18:37 UTC
1 point
0 comments · 1 min read · LW link

The Progress Paradox

Ben Turtel · 27 Dec 2023 18:26 UTC
3 points
3 comments · 4 min read · LW link
(bturtel.substack.com)

The virtuous circle: twelve conjectures about female reproductive agency and cultural self-determination

Miles Saltiel · 27 Dec 2023 18:25 UTC
0 points
2 comments · 14 min read · LW link

MSP Article Discussion Meetup: The EMH, Long-Term Investing, and Leveraged ETFs

25Hour · 27 Dec 2023 16:50 UTC
3 points
1 comment · 1 min read · LW link

In Defense of Epistemic Empathy

Kevin Dorst · 27 Dec 2023 16:27 UTC
55 points
19 comments · 6 min read · LW link
(kevindorst.substack.com)

Critical review of Christiano’s disagreements with Yudkowsky

Vanessa Kosoy · 27 Dec 2023 16:02 UTC
172 points
40 comments · 15 min read · LW link

AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them

Roman Leventov · 27 Dec 2023 14:51 UTC
33 points
9 comments · 4 min read · LW link

5. Moral Value for Sentient Animals? Alas, Not Yet

RogerDearnaley · 27 Dec 2023 6:42 UTC
33 points
41 comments · 23 min read · LW link

Differential Optimization Reframes and Generalizes Utility-Maximization

J Bostock · 27 Dec 2023 1:54 UTC
30 points
2 comments · 3 min read · LW link

More Thoughts on the Human-AGI War

Seth Ahrenbach · 27 Dec 2023 1:03 UTC
−3 points
4 comments · 7 min read · LW link

METR is hiring!

Beth Barnes · 26 Dec 2023 21:00 UTC
65 points
1 comment · 1 min read · LW link

Environmental allergies are curable? (Sublingual immunotherapy)

Chipmonk · 26 Dec 2023 19:05 UTC
47 points
10 comments · 1 min read · LW link

Picasso in the Gallery of Babel

samhealy · 26 Dec 2023 16:25 UTC
12 points
12 comments · 4 min read · LW link

Flagging Potentially Unfair Parenting

jefftk · 26 Dec 2023 12:40 UTC
69 points
1 comment · 1 min read · LW link
(www.jefftk.com)

Link Collection: Impact Markets

Saul Munn · 26 Dec 2023 9:01 UTC
27 points
0 comments · 2 min read · LW link
(www.brasstacks.blog)

How Emergency Medicine Solves the Alignment Problem

StrivingForLegibility · 26 Dec 2023 5:24 UTC
41 points
4 comments · 6 min read · LW link

Rationality outreach vs. rationality teaching

Lenmar · 26 Dec 2023 0:37 UTC
7 points
2 comments · 1 min read · LW link

Exploring the Residual Stream of Transformers for Mechanistic Interpretability — Explained

Zeping Yu · 26 Dec 2023 0:36 UTC
7 points
1 comment · 11 min read · LW link

[Question] Anki setup best practices?

Sinclair Chen · 25 Dec 2023 22:34 UTC
11 points
4 comments · 1 min read · LW link

[Question] Why does expected utility matter?

Marco Discendenti · 25 Dec 2023 14:47 UTC
18 points
21 comments · 4 min read · LW link

Freeze Dried Raspberry Truffles

jefftk · 25 Dec 2023 14:10 UTC
14 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Pornographic and semi-pornographic ads on mainstream websites as an instance of the AI alignment problem?

greenrd · 25 Dec 2023 13:19 UTC
−1 points
5 comments · 12 min read · LW link

Defense Against The Dark Arts: An Introduction

Lyrongolem · 25 Dec 2023 6:36 UTC
24 points
36 comments · 20 min read · LW link

Occlusions of Moral Knowledge

herschel · 25 Dec 2023 5:55 UTC
−1 points
0 comments · 2 min read · LW link
(brothernin.substack.com)

[Question] Would you have a baby in 2024?

martinkunev · 25 Dec 2023 1:52 UTC
24 points
76 comments · 1 min read · LW link

align your latent spaces

bhauth · 24 Dec 2023 16:30 UTC
27 points
8 comments · 2 min read · LW link
(www.bhauth.com)

Viral Guessing Game

jefftk · 24 Dec 2023 13:10 UTC
19 points
0 comments · 1 min read · LW link
(www.jefftk.com)

The Sugar Alignment Problem

Adam Zerner · 24 Dec 2023 1:35 UTC
5 points
3 comments · 7 min read · LW link

A Crisper Explanation of Simulacrum Levels

Thane Ruthenis · 23 Dec 2023 22:13 UTC
89 points
13 comments · 13 min read · LW link

Hyperbolic Discounting and Pascal’s Mugging

Andrew Keenan Richardson · 23 Dec 2023 21:55 UTC
9 points
0 comments · 7 min read · LW link

AISN #28: Center for AI Safety 2023 Year in Review

23 Dec 2023 21:31 UTC
30 points
1 comment · 5 min read · LW link
(newsletter.safe.ai)

“Inftoxicity” and other new words to describe malicious information and communication thereof

Jáchym Fibír · 23 Dec 2023 18:15 UTC
−1 points
6 comments · 3 min read · LW link

AI’s impact on biology research: Part I, today

octopocta · 23 Dec 2023 16:29 UTC
31 points
6 comments · 2 min read · LW link

AI Girlfriends Won’t Matter Much

Maxwell Tabarrok · 23 Dec 2023 15:58 UTC
42 points
22 comments · 2 min read · LW link
(maximumprogress.substack.com)

The Next Right Token

jefftk · 23 Dec 2023 3:20 UTC
14 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Fact Finding: Do Early Layers Specialise in Local Processing? (Post 5)

23 Dec 2023 2:46 UTC
18 points
0 comments · 4 min read · LW link

Fact Finding: How to Think About Interpreting Memorisation (Post 4)

23 Dec 2023 2:46 UTC
22 points
0 comments · 9 min read · LW link

Fact Finding: Trying to Mechanistically Understand Early MLPs (Post 3)

23 Dec 2023 2:46 UTC
10 points
0 comments · 16 min read · LW link

Fact Finding: Simplifying the Circuit (Post 2)

23 Dec 2023 2:45 UTC
25 points
3 comments · 14 min read · LW link

Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1)

23 Dec 2023 2:44 UTC
108 points
9 comments · 22 min read · LW link · 1 review

Measurement tampering detection as a special case of weak-to-strong generalization

23 Dec 2023 0:05 UTC
57 points
10 comments · 4 min read · LW link

How does a toy 2 digit subtraction transformer predict the difference?

Evan Anders · 22 Dec 2023 21:17 UTC
12 points
0 comments · 10 min read · LW link
(evanhanders.blog)

Thoughts on Max Tegmark’s AI verification

Johannes C. Mayer · 22 Dec 2023 20:38 UTC
10 points
0 comments · 3 min read · LW link

Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations)

Thane Ruthenis · 22 Dec 2023 20:19 UTC
74 points
14 comments · 6 min read · LW link

AI safety advocates should consider providing gentle pushback following the events at OpenAI

civilsociety · 22 Dec 2023 18:55 UTC
16 points
5 comments · 3 min read · LW link