SPAR seeks advisors and students for AI safety projects (Second Wave)

mic

14 Sep 2023 23:09 UTC

21 points

0 comments, 1 min read

“Did you lock it?”

ymeskhout

14 Sep 2023 21:10 UTC

33 points

36 comments, 2 min read

(ymeskhout.substack.com)

Can I take ducks home from the park?

dynomight

14 Sep 2023 21:03 UTC

67 points

8 comments, 3 min read

(dynomight.net)

Inline Plotting in iTerm2

jefftk

14 Sep 2023 20:30 UTC

13 points

0 comments, 1 min read

(www.jefftk.com)

Destroying the fabric of the universe as an instrumental goal.

AI-doom

14 Sep 2023 20:04 UTC

−7 points

5 comments, 1 min read

The PUSA System- Repost

Jaivardhan Nawani

14 Sep 2023 18:40 UTC

4 points

1 comment, 5 min read

Announcement of AI-Plans.com Critique-a-thon September 2023

Kabir Kumar

14 Sep 2023 17:43 UTC

3 points

0 comments, 2 min read

Cruxes for overhang

Zach Stein-Perlman

14 Sep 2023 17:00 UTC

12 points

5 comments, 6 min read

(blog.aiimpacts.org)

A Theory of Laughter—Follow-Up

Steven Byrnes

14 Sep 2023 15:35 UTC

37 points

3 comments, 8 min read

Eliciting Credit Hacking Behaviours in LLMs

omegastick

14 Sep 2023 15:07 UTC

3 points

2 comments, 7 min read

(github.com)

Instrumental Convergence Bounty

Logan Zoellner

14 Sep 2023 14:02 UTC

62 points

24 comments, 1 min read

[Question] In the age of modern AI (LLMs and beyond), is data still the new oil?

MP

14 Sep 2023 13:28 UTC

4 points

1 comment, 2 min read

AI #29: Take a Deep Breath

Zvi

14 Sep 2023 12:00 UTC

65 points

21 comments, 21 min read

(thezvi.wordpress.com)

The omnizoid—Heighn FDT Debate #3: Contra omnizoid contra me contra omnizoid contra FDT

Heighn

14 Sep 2023 11:52 UTC

6 points

0 comments, 4 min read

Highlights: Wentworth, Shah, and Murphy on “Retargeting the Search”

RobertM

14 Sep 2023 2:18 UTC

85 points

4 comments, 8 min read

Uncovering Latent Human Wellbeing in LLM Embeddings

ChengCheng, Pedro Freire, Dan H and Scott Emmons

14 Sep 2023 1:40 UTC

32 points

7 comments, 8 min read

(far.ai)

A Call For Community: Scientific Language Learning is Still Language Learning

keltan

14 Sep 2023 0:32 UTC

0 points

0 comments, 2 min read

Mech Interp Challenge: September—Deciphering the Addition Model

CallumMcDougall

13 Sep 2023 22:23 UTC

35 points

0 comments, 4 min read

Linkpost for Jan Leike on Self-Exfiltration

Daniel Kokotajlo

13 Sep 2023 21:23 UTC

59 points

1 comment, 2 min read

(aligned.substack.com)

MLSN: #10 Adversarial Attacks Against Language and Vision Models, Improving LLM Honesty, and Tracing the Influence of LLM Training Data

aogara and Dan H

13 Sep 2023 18:03 UTC

15 points

1 comment, 5 min read

(newsletter.mlsafety.org)

Expanding the Scope of Superposition

Derek Larson

13 Sep 2023 17:38 UTC

10 points

0 comments, 4 min read

Contra Yudkowsky on Epistemic Conduct for Author Criticism

Zack_M_Davis

13 Sep 2023 15:33 UTC

69 points

38 comments, 7 min read

Apply to lead a project during the next virtual AI Safety Camp

Linda Linsefors and Remmelt

13 Sep 2023 13:29 UTC

19 points

0 comments, 5 min read

(aisafety.camp)

Is AI Safety dropping the ball on privacy?

markov

13 Sep 2023 13:07 UTC

50 points

17 comments, 7 min read

UDT shows that decision theory is more puzzling than ever

Wei Dai

13 Sep 2023 12:26 UTC

203 points

55 comments, 1 min read

[Question] Alignment & Capabilities: what’s the difference?

johnhalstead

13 Sep 2023 11:48 UTC

6 points

3 comments, 1 min read

Duty to rescue / Non-assistance à personne en danger

Thomas Sepulchre

13 Sep 2023 9:49 UTC

15 points

5 comments, 3 min read

The Flow-Through Fallacy

Chris_Leong

13 Sep 2023 4:28 UTC

20 points

7 comments, 1 min read

Book review: The Importance of What We Care About (Harry G. Frankfurt)

David Gross

13 Sep 2023 4:17 UTC

7 points

0 comments, 4 min read

Padding the Corner

jefftk

13 Sep 2023 1:30 UTC

32 points

4 comments, 1 min read

(www.jefftk.com)

[Question] Should an undergrad avoid a capabilities project?

Double

12 Sep 2023 23:16 UTC

4 points

2 comments, 1 min read

[Linkpost] Contra four-wheeled suitcases, sort of

Gunnar_Zarncke

12 Sep 2023 20:36 UTC

18 points

4 comments, 1 min read

(dynomight.substack.com)

Seeking Feedback on My Mechanistic Interpretability Research Agenda

RGRGRG

12 Sep 2023 18:45 UTC

3 points

1 comment, 3 min read

Automatically finding feature vectors in the OV circuits of Transformers without using probing

Jacob Dunefsky

12 Sep 2023 17:38 UTC

13 points

0 comments, 29 min read

Startup Roundup #1: Happy Demo Day

Zvi

12 Sep 2023 13:20 UTC

38 points

5 comments, 15 min read

(thezvi.wordpress.com)

[Question] Is there something fundamentally wrong with the Universe?

Caerulea-Lawrence

12 Sep 2023 12:02 UTC

6 points

80 comments, 2 min read

Stupidity is also hard

walkthroughwalls

12 Sep 2023 2:45 UTC

−8 points

4 comments, 2 min read

Apple Cider Baklava

jefftk

12 Sep 2023 2:10 UTC

15 points

0 comments, 1 min read

(www.jefftk.com)

How useful is Corrigibility?

martinkunev

12 Sep 2023 0:05 UTC

11 points

4 comments, 5 min read

Contra Heighn Contra Me Contra Functional Decision Theory

omnizoid

11 Sep 2023 19:49 UTC

−10 points

14 comments, 6 min read

Machine Evolution

Justin Bullock, Elliot Mckernon and cwdicarlo

11 Sep 2023 19:29 UTC

11 points

2 comments, 22 min read

[Question] Is there a hard copy of the sequences available anywhere?

Cole Wyeth

11 Sep 2023 19:01 UTC

3 points

1 comment, 1 min read

Amazon KDP AI content guidelines

ChristianKl

11 Sep 2023 18:36 UTC

12 points

0 comments, 1 min read

A Case for AI Safety via Law

JWJohnston

11 Sep 2023 18:26 UTC

17 points

12 comments, 4 min read

Erdős Problems in Algorithmic Probability

Aidan Rocke

11 Sep 2023 16:44 UTC

13 points

4 comments, 2 min read

PSA: The community is in Berkeley/Oakland, not “the Bay Area”

maia

11 Sep 2023 15:59 UTC

103 points

7 comments, 1 min read

A Bat and Ball made me Sad

Darren McKee

11 Sep 2023 13:48 UTC

14 points

26 comments, 1 min read

Focus on the Hardest Part First

Johannes C. Mayer

11 Sep 2023 7:53 UTC

41 points

13 comments, 1 min read

The Promises and Pitfalls of Long-Term Forecasting

GeoVane

11 Sep 2023 5:04 UTC

1 point

0 comments, 5 min read

Logical Share Splitting

DaemonicSigil

11 Sep 2023 4:08 UTC

93 points

16 comments, 9 min read

(pbement.com)