WISDOMISM A Moral Theory for the Age of Information

Peter lawless 19 Apr 2024 23:06 UTC

2 points

0 comments · 9 min read · LW link

Inducing Unprompted Misalignment in LLMs

Sam Svenningsen, evhub and Henry Sleight

19 Apr 2024 20:00 UTC

38 points

6 comments · 16 min read · LW link

Introspection

A* 19 Apr 2024 19:10 UTC

7 points

0 comments · 1 min read · LW link

[Full Post] Progress Update #1 from the GDM Mech Interp Team

Neel Nanda, Arthur Conmy, lewis smith, Senthooran Rajamanoharan, Tom Lieberum, János Kramár and Vikrant Varma

19 Apr 2024 19:06 UTC

77 points

10 comments · 8 min read · LW link

[Summary] Progress Update #1 from the GDM Mech Interp Team

Neel Nanda, Arthur Conmy, lewis smith, Senthooran Rajamanoharan, Tom Lieberum, János Kramár and Vikrant Varma

19 Apr 2024 19:06 UTC

72 points

0 comments · 3 min read · LW link

Daniel Dennett has died (1942-2024)

kave 19 Apr 2024 16:17 UTC

150 points

5 comments · 1 min read · LW link

(dailynous.com)

Events Booking New Callers?

jefftk 19 Apr 2024 15:50 UTC

9 points

0 comments · 1 min read · LW link

(www.jefftk.com)

[Question] What is the best way to talk about probabilities you expect to change with evidence/experiments?

Will_Pearson 19 Apr 2024 15:35 UTC

14 points

11 comments · 1 min read · LW link

CTMU insight: maybe consciousness can affect quantum outcomes?

zhukeepa 19 Apr 2024 15:23 UTC

12 points

11 comments · 5 min read · LW link

Demonstrate and evaluate risks from AI to society at the AI x Democracy research hackathon

Esben Kran 19 Apr 2024 14:46 UTC

5 points

0 comments · 1 min read · LW link

(www.apartresearch.com)

[Question] How to Model the Future of Open-Source LLMs?

Joel Burget 19 Apr 2024 14:28 UTC

25 points

9 comments · 1 min read · LW link

What’s up with all the non-Mormons? Weirdly specific universalities across LLMs

mwatkins 19 Apr 2024 13:43 UTC

40 points

13 comments · 27 min read · LW link

[Question] If digital goods in virtual worlds increase GDP, do we actually become richer?

No77e 19 Apr 2024 10:06 UTC

6 points

10 comments · 1 min read · LW link

Experiment on repeating choices

KatjaGrace 19 Apr 2024 4:20 UTC

56 points

1 comment · 3 min read · LW link

(worldspiritsockpuppet.com)

Effective Altruists and Rationalists Views & The case for using marketing to highlight AI risks.

gilch 19 Apr 2024 4:16 UTC

6 points

1 comment · 1 min read · LW link

(youtu.be)

Cohesion and business problems

Adam Zerner 19 Apr 2024 0:45 UTC

12 points

8 comments · 4 min read · LW link

The Thermodynamics of Death

Peter lawless 19 Apr 2024 0:36 UTC

6 points

0 comments · 10 min read · LW link

Backyard Office

jefftk 19 Apr 2024 0:31 UTC

13 points

0 comments · 1 min read · LW link

(www.jefftk.com)

hydrogen tube transport

bhauth 18 Apr 2024 22:47 UTC

34 points

12 comments · 5 min read · LW link

(www.bhauth.com)

LessOnline Festival Updates Thread

Ben Pace 18 Apr 2024 21:55 UTC

59 points

26 comments · 1 min read · LW link

A Review of In-Context Learning Hypotheses for Automated AI Alignment Research

alamerton 18 Apr 2024 18:29 UTC

25 points

4 comments · 16 min read · LW link

I’m open for projects (sort of)

cousin_it 18 Apr 2024 18:05 UTC

46 points

13 comments · 1 min read · LW link

Blessed information, garbage information, cursed information

tailcalled 18 Apr 2024 16:56 UTC

23 points

8 comments · 3 min read · LW link

[Fiction] A Confession

Arjun Panickssery 18 Apr 2024 16:28 UTC

38 points

2 comments · 5 min read · LW link

(arjunpanickssery.substack.com)

Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight

Sam Marks 18 Apr 2024 16:17 UTC

107 points

10 comments · 12 min read · LW link

Cooperation is optimal, with weaker agents too - tldr

Ryo 18 Apr 2024 15:03 UTC

12 points

22 comments · 4 min read · LW link

(medium.com)

How to coordinate despite our biases? - tldr

Ryo 18 Apr 2024 15:03 UTC

3 points

2 comments · 3 min read · LW link

(medium.com)

Knowledge Base 7: Long-tail knowledge and collective intelligence

iwis 18 Apr 2024 14:21 UTC

−6 points

0 comments · 1 min read · LW link

AI #60: Oh the Humanity

Zvi 18 Apr 2024 14:10 UTC

44 points

7 comments · 62 min read · LW link

(thezvi.wordpress.com)

UDT1.01: Logical Inductors and Implicit Beliefs (5/10)

Diffractor 18 Apr 2024 8:39 UTC

33 points

2 comments · 19 min read · LW link

An examination of GPT-2’s boring yet effective glitch

MiguelDev 18 Apr 2024 5:26 UTC

5 points

3 comments · 3 min read · LW link

[Question] What if Ethics is Provably Self-Contradictory?

Yitz 18 Apr 2024 5:12 UTC

3 points

7 comments · 2 min read · LW link

The Mom Test: Summary and Thoughts

Adam Zerner 18 Apr 2024 3:34 UTC

48 points

3 comments · 10 min read · LW link

Express interest in an “FHI of the West”

habryka 18 Apr 2024 3:32 UTC

268 points

41 comments · 3 min read · LW link

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer

johnswentworth and David Lorell

18 Apr 2024 0:27 UTC

184 points

21 comments · 7 min read · LW link

AXRP Episode 28 - Suing Labs for AI Risk with Gabriel Weil

DanielFilan 17 Apr 2024 21:42 UTC

12 points

0 comments · 65 min read · LW link

LLM Evaluators Recognize and Favor Their Own Generations

Arjun Panickssery, Sam Bowman and Shi Feng

17 Apr 2024 21:09 UTC

44 points

1 comment · 3 min read · LW link

(tiny.cc)

SFS: Foundations of Forecasting

MAD2 17 Apr 2024 17:46 UTC

3 points

0 comments · 1 min read · LW link

An ethical framework to supersede Utilitarianism

metalcrow 17 Apr 2024 17:18 UTC

1 point

4 comments · 4 min read · LW link

Moving on from community living

Vika 17 Apr 2024 17:02 UTC

63 points

7 comments · 3 min read · LW link

(vkrakovna.wordpress.com)

Staged release

Zach Stein-Perlman 17 Apr 2024 16:00 UTC

11 points

4 comments · 2 min read · LW link

[Question] Discomfort Stacking

Lewis O’Brien 17 Apr 2024 14:49 UTC

5 points

12 comments · 1 min read · LW link

FHI (Future of Humanity Institute) has shut down (2005–2024)

gwern 17 Apr 2024 13:54 UTC

176 points

22 comments · 1 min read · LW link

(www.futureofhumanityinstitute.org)

Childhood and Education Roundup #5

Zvi 17 Apr 2024 13:00 UTC

36 points

4 comments · 25 min read · LW link

(thezvi.wordpress.com)

Should we maximize the Geometric Expectation of Utility?

A.H. 17 Apr 2024 10:37 UTC

5 points

17 comments · 9 min read · LW link

Claude 3 Opus can operate as a Turing machine

Gunnar_Zarncke 17 Apr 2024 8:41 UTC

36 points

2 comments · 1 min read · LW link

(twitter.com)

When is a mind me?

Rob Bensinger 17 Apr 2024 5:56 UTC

138 points

125 comments · 15 min read · LW link

Mid-conditional love

KatjaGrace 17 Apr 2024 4:00 UTC

76 points

21 comments · 2 min read · LW link

(worldspiritsockpuppet.com)

Spending Update 2024

jefftk 17 Apr 2024 2:30 UTC

20 points

2 comments · 3 min read · LW link

(www.jefftk.com)

Anti MMAcevedo Protocol

Logan Zoellner 16 Apr 2024 22:32 UTC

1 point

1 comment · 8 min read · LW link