Experimentation (Part 7 of “The Sense Of Physical Necessity”)

LoganStrohl

18 Mar 2024 21:25 UTC

33 points

0 comments, 10 min read

INTERVIEW: Round 2 - StakeOut.AI w/ Dr. Peter Park

jacobhaimes

18 Mar 2024 21:21 UTC

5 points

0 comments, 1 min read

(into-ai-safety.github.io)

Neuroscience and Alignment

Garrett Baker

18 Mar 2024 21:09 UTC

40 points

25 comments, 2 min read

GPT, the magical collaboration zone, Lex Fridman and Sam Altman

Bill Benzon

18 Mar 2024 20:04 UTC

3 points

1 comment, 3 min read

Measuring Coherence of Policies in Toy Environments

dx26 and Richard_Ngo

18 Mar 2024 17:59 UTC

59 points

9 comments, 14 min read

AtP*: An efficient and scalable method for localizing LLM behaviour to components

Neel Nanda, János Kramár, Tom Lieberum and Rohin Shah

18 Mar 2024 17:28 UTC

19 points

0 comments, 1 min read

(arxiv.org)

Community Notes by X

NicholasKees

18 Mar 2024 17:13 UTC

124 points

15 comments, 7 min read

[Question] Is the Basilisk pretending to be hidden in this simulation so that it can check what I would do if conditioned by a world without the Basilisk?

maybefbi

18 Mar 2024 16:05 UTC

−18 points

1 comment, 1 min read

On Devin

Zvi

18 Mar 2024 13:20 UTC

148 points

34 comments, 11 min read

(thezvi.wordpress.com)

RLLMv10 experiment

MiguelDev

18 Mar 2024 8:32 UTC

5 points

0 comments, 2 min read

Join the AI Evaluation Tasks Bounty Hackathon

Esben Kran

18 Mar 2024 8:15 UTC

12 points

1 comment, 1 min read

5 Physics Problems

DaemonicSigil and Muireall

18 Mar 2024 8:05 UTC

60 points

0 comments, 15 min read

Inferring the model dimension of API-protected LLMs

Ege Erdil

18 Mar 2024 6:19 UTC

34 points

3 comments, 4 min read

(arxiv.org)

AI strategy given the need for good reflection

owencb

18 Mar 2024 0:48 UTC

7 points

0 comments, 1 min read

XAI releases Grok base model

Jacob G-W

18 Mar 2024 0:47 UTC

11 points

3 comments, 1 min read

(x.ai)

Toki pona FAQ

dkl9

17 Mar 2024 21:44 UTC

36 points

8 comments, 1 min read

(dkl9.net)

EA ErFiN Project work

Max_He-Ho

17 Mar 2024 20:42 UTC

2 points

0 comments, 1 min read

EA ErFiN Project work

Max_He-Ho

17 Mar 2024 20:37 UTC

2 points

0 comments, 1 min read

[Question] Alice and Bob is debating on a technique. Alice says Bob should try it before denying it. Is it a fallacy or something similar?

Ooker

17 Mar 2024 20:01 UTC

0 points

19 comments, 2 min read

Is there a way to calculate the P(we are in a 2nd cold war)?

cloak

17 Mar 2024 20:01 UTC

−9 points

2 comments, 1 min read

The Worst Form Of Government (Except For Everything Else We’ve Tried)

johnswentworth

17 Mar 2024 18:11 UTC

134 points

47 comments, 4 min read

Applying simulacrum levels to hobbies, interests and goals

DMMF

17 Mar 2024 16:18 UTC

15 points

2 comments, 4 min read

(danfrank.ca)

What is the best argument that LLMs are shoggoths?

JoshuaFox

17 Mar 2024 11:36 UTC

26 points

22 comments, 1 min read

Invitation to the Princeton AI Alignment and Safety Seminar

Sadhika Malladi

17 Mar 2024 1:10 UTC

6 points

1 comment, 1 min read

Anxiety vs. Depression

Sable

17 Mar 2024 0:15 UTC

85 points

35 comments, 3 min read

(affablyevil.substack.com)

Celiefs

TheLemmaLlama

16 Mar 2024 23:56 UTC

3 points

8 comments, 1 min read

My PhD thesis: Algorithmic Bayesian Epistemology

Eric Neyman

16 Mar 2024 22:56 UTC

259 points

14 comments, 7 min read

(arxiv.org)

How people stopped dying from diarrhea so much (& other life-saving decisions)

Writer

16 Mar 2024 16:00 UTC

45 points

0 comments, 1 min read

(youtu.be)

Transformative trustbuilding via advancements in decentralized lie detection

trevor

16 Mar 2024 5:56 UTC

17 points

7 comments, 38 min read

(www.ncbi.nlm.nih.gov)

Enter the WorldsEnd

Akram Choudhary

16 Mar 2024 1:34 UTC

−25 points

8 comments, 1 min read

Strong-Misalignment: Does Yudkowsky (or Christiano, or TurnTrout, or Wolfram, or…etc.) Have an Elevator Speech I’m Missing?

Benjamin Bourlier

15 Mar 2024 23:17 UTC

−4 points

3 comments, 16 min read

Introducing METR’s Autonomy Evaluation Resources

Megan Kinniment and Beth Barnes

15 Mar 2024 23:16 UTC

90 points

0 comments, 1 min read

(metr.github.io)

Are AIs conscious? It might depend

Logan Zoellner

15 Mar 2024 23:09 UTC

6 points

6 comments, 3 min read

Beyond Maxipok — good reflective governance as a target for action

owencb

15 Mar 2024 22:22 UTC

20 points

0 comments, 1 min read

Middle Child Phenomenon

PhilosophicalSoul

15 Mar 2024 20:47 UTC

3 points

3 comments, 2 min read

Capability or Alignment? Respect the LLM Base Model’s Capability During Alignment

Jingfeng Yang

15 Mar 2024 17:56 UTC

7 points

0 comments, 24 min read

Rational Animations offers animation production and writing services!

Writer

15 Mar 2024 17:26 UTC

33 points

0 comments, 1 min read

Improving SAE’s by Sqrt()-ing L1 & Removing Lowest Activating Features

Logan Riggs and Jannik Brinkmann

15 Mar 2024 16:30 UTC

26 points

5 comments, 4 min read

Stuttgart, Germany—ACX Spring Meetups Everywhere 2024

Benjamin R

15 Mar 2024 14:59 UTC

2 points

1 comment, 1 min read

Controlling AGI Risk

TeaSea

15 Mar 2024 4:56 UTC

6 points

8 comments, 4 min read

Ulm, Germany—ACX Spring Meetups Everywhere 2024

Benjamin R

15 Mar 2024 1:32 UTC

2 points

1 comment, 1 min read

Newport News/ Virginia ACX Meetup

Daniel

14 Mar 2024 23:46 UTC

1 point

0 comments, 1 min read

Constructive Cauchy sequences vs. Dedekind cuts

jessicata

14 Mar 2024 23:04 UTC

47 points

23 comments, 4 min read

(unstableontology.com)

A Nail in the Coffin of Exceptionalism

Yeshua God

14 Mar 2024 22:41 UTC

−17 points

0 comments, 3 min read

Toward a Broader Conception of Adverse Selection

Ricki Heicklen

14 Mar 2024 22:40 UTC

177 points

61 comments, 13 min read

(bayesshammai.substack.com)

More people getting into AI safety should do a PhD

AdamGleave

14 Mar 2024 22:14 UTC

60 points

24 comments, 12 min read

(gleave.me)

Collection (Part 6 of “The Sense Of Physical Necessity”)

LoganStrohl

14 Mar 2024 21:37 UTC

28 points

0 comments, 8 min read

Fixed point or oscillate or noise

lemonhope

14 Mar 2024 18:37 UTC

3 points

10 comments, 1 min read

How useful is “AI Control” as a framing on AI X-Risk?

habryka and ryan_greenblatt

14 Mar 2024 18:06 UTC

70 points

4 comments, 34 min read

Sparse autoencoders find composed features in small toy models

Evan Anders, Clement Neo, Jason Hoelscher-Obermaier and Jessica N. Howard

14 Mar 2024 18:00 UTC

33 points

12 comments, 15 min read