Underspecified Probabilities: A Thought Experiment · lunatic_at_large · Oct 4, 2023, 10:25 PM · 8 points · 4 comments · 2 min read · LW link
Fraternal Birth Order Effect and the Maternal Immune Hypothesis · Bucky · Oct 4, 2023, 9:18 PM · 20 points · 1 comment · 2 min read · LW link
How to solve deception and still fail. · Charlie Steiner · Oct 4, 2023, 7:56 PM · 40 points · 7 comments · 6 min read · LW link
PortAudio M1 Latency · jefftk · Oct 4, 2023, 7:10 PM · 8 points · 5 comments · 1 min read · LW link · (www.jefftk.com)
Open Philanthropy is hiring for multiple roles across our Global Catastrophic Risks teams · aarongertler · Oct 4, 2023, 6:04 PM · 6 points · 0 comments · 3 min read · LW link · (forum.effectivealtruism.org)
Safeguarding Humanity: Ensuring AI Remains a Servant, Not a Master · kgldeshapriya · Oct 4, 2023, 5:52 PM · −20 points · 2 comments · 2 min read · LW link
The 5 Pillars of Happiness · Gabi QUENE · Oct 4, 2023, 5:50 PM · −24 points · 5 comments · 5 min read · LW link
[Question] Using Reinforcement Learning to try to control the heating of a building (district heating) · Tony Karlsson · Oct 4, 2023, 5:47 PM · 3 points · 5 comments · 1 min read · LW link
rationalistic probability(litterally just throwing shit out there) · NotaSprayer ASprayer · Oct 4, 2023, 5:46 PM · −30 points · 8 comments · 2 min read · LW link
AISN #23: New OpenAI Models, News from Anthropic, and Representation Engineering · aogara and Dan H · Oct 4, 2023, 5:37 PM · 15 points · 2 comments · 5 min read · LW link · (newsletter.safe.ai)
I don’t find the lie detection results that surprising (by an author of the paper) · JanB · Oct 4, 2023, 5:10 PM · 97 points · 8 comments · 3 min read · LW link
[Question] What evidence is there of LLM’s containing world models? · Chris_Leong · Oct 4, 2023, 2:33 PM · 17 points · 17 comments · 1 min read · LW link
Entanglement and intuition about words and meaning · Bill Benzon · Oct 4, 2023, 2:16 PM · 4 points · 0 comments · 2 min read · LW link
Why a Mars colony would lead to a first strike situation · Remmelt · Oct 4, 2023, 11:29 AM · −59 points · 8 comments · 1 min read · LW link · (mflb.com)
[Question] What are some examples of AIs instantiating the ‘nearest unblocked strategy problem’? · EJT · Oct 4, 2023, 11:05 AM · 6 points · 4 comments · 1 min read · LW link
Graphical tensor notation for interpretability · Jordan Taylor · Oct 4, 2023, 8:04 AM · 140 points · 11 comments · 19 min read · LW link
[Link] Bay Area Winter Solstice 2023 · tcheasdfjkl and TheSkeward · Oct 4, 2023, 2:19 AM · 18 points · 3 comments · 1 min read · LW link · (fb.me)
[Question] Who determines whether an alignment proposal is the definitive alignment solution? · MiguelDev · Oct 3, 2023, 10:39 PM · −1 points · 6 comments · 1 min read · LW link
AXRP Episode 25 - Cooperative AI with Caspar Oesterheld · DanielFilan · Oct 3, 2023, 9:50 PM · 43 points · 0 comments · 92 min read · LW link
When to Get the Booster? · jefftk · Oct 3, 2023, 9:00 PM · 50 points · 15 comments · 2 min read · LW link · (www.jefftk.com)
OpenAI-Microsoft partnership · Zach Stein-Perlman · Oct 3, 2023, 8:01 PM · 51 points · 19 comments · 1 min read · LW link
[Question] Current AI safety techniques? · Zach Stein-Perlman · Oct 3, 2023, 7:30 PM · 30 points · 2 comments · 2 min read · LW link
Testing and Automation for Intelligent Systems. · Sai Kiran Kammari · Oct 3, 2023, 5:51 PM · −13 points · 0 comments · 1 min read · LW link · (resource-cms.springernature.com)
Metaculus Announces Forecasting Tournament to Evaluate Focused Research Organizations, in Partnership With the Federation of American Scientists · ChristianWilliams · Oct 3, 2023, 4:44 PM · 13 points · 0 comments · 1 min read · LW link · (www.metaculus.com)
What would it mean to understand how a large language model (LLM) works? Some quick notes. · Bill Benzon · Oct 3, 2023, 3:11 PM · 20 points · 4 comments · 8 min read · LW link
[Question] Potential alignment targets for a sovereign superintelligent AI · Paul Colognese · Oct 3, 2023, 3:09 PM · 29 points · 4 comments · 1 min read · LW link
Monthly Roundup #11: October 2023 · Zvi · Oct 3, 2023, 2:10 PM · 42 points · 12 comments · 35 min read · LW link · (thezvi.wordpress.com)
Why We Use Money? - A Walrasian View · Savio Coelho · Oct 3, 2023, 12:02 PM · 4 points · 3 comments · 8 min read · LW link
Mech Interp Challenge: October—Deciphering the Sorted List Model · CallumMcDougall · Oct 3, 2023, 10:57 AM · 23 points · 0 comments · 3 min read · LW link
Early Experiments in Reward Model Interpretation Using Sparse Autoencoders · lukemarks, Amirali Abdullah, Rauno Arike, Fazl and nothoughtsheadempty · Oct 3, 2023, 7:45 AM · 17 points · 0 comments · 5 min read · LW link
Some Quick Follow-Up Experiments to “Taken out of context: On measuring situational awareness in LLMs” · Miles Turpin · Oct 3, 2023, 2:22 AM · 31 points · 0 comments · 9 min read · LW link
My Mid-Career Transition into Biosecurity · jefftk · Oct 2, 2023, 9:20 PM · 26 points · 4 comments · 2 min read · LW link · (www.jefftk.com)
Dall-E 3 · p.b. · Oct 2, 2023, 8:33 PM · 37 points · 9 comments · 1 min read · LW link · (openai.com)
Thomas Kwa’s MIRI research experience · Thomas Kwa, peterbarnett, Vivek Hebbar, Jeremy Gillen, jacobjacob and Raemon · Oct 2, 2023, 4:42 PM · 172 points · 53 comments · 1 min read · LW link
Population After a Catastrophe · Stan Pinsent · Oct 2, 2023, 4:06 PM · 3 points · 5 comments · 14 min read · LW link
Expectations for Gemini: hopefully not a big deal · Maxime Riché · Oct 2, 2023, 3:38 PM · 15 points · 5 comments · 1 min read · LW link
A counterexample for measurable factor spaces · Matthias G. Mayer · Oct 2, 2023, 3:16 PM · 14 points · 0 comments · 3 min read · LW link
Will early transformative AIs primarily use text? [Manifold question] · Fabien Roger · Oct 2, 2023, 3:05 PM · 16 points · 0 comments · 3 min read · LW link
energy landscapes of experts · bhauth · Oct 2, 2023, 2:08 PM · 45 points · 2 comments · 3 min read · LW link · (www.bhauth.com)
Direction of Fit · NicholasKees · Oct 2, 2023, 12:34 PM · 34 points · 0 comments · 3 min read · LW link
The 99% principle for personal problems · Kaj_Sotala · Oct 2, 2023, 8:20 AM · 135 points · 20 comments · 2 min read · LW link · (kajsotala.fi)
Linkpost: They Studied Dishonesty. Was Their Work a Lie? · Linch · Oct 2, 2023, 8:10 AM · 91 points · 12 comments · 2 min read · LW link · (www.newyorker.com)
Why I got the smallpox vaccine in 2023 · joec · Oct 2, 2023, 5:11 AM · 25 points · 6 comments · 4 min read · LW link
Instrumental Convergence and human extinction. · Spiritus Dei · Oct 2, 2023, 12:41 AM · −10 points · 3 comments · 7 min read · LW link
Revisiting the Manifold Hypothesis · Aidan Rocke · Oct 1, 2023, 11:55 PM · 13 points · 19 comments · 4 min read · LW link
AI Alignment Breakthroughs this Week [new substack] · Logan Zoellner · Oct 1, 2023, 10:13 PM · 0 points · 8 comments · 2 min read · LW link
[Question] Looking for study · Robert Feinstein · Oct 1, 2023, 7:52 PM · 4 points · 0 comments · 1 min read · LW link
Join AISafety.info’s Distillation Hackathon (Oct 6-9th) · smallsilo · Oct 1, 2023, 6:43 PM · 21 points · 0 comments · 2 min read · LW link · (forum.effectivealtruism.org)
Fifty Flips · abstractapplic · Oct 1, 2023, 3:30 PM · 32 points · 15 comments · 1 min read · LW link · 1 review · (h-b-p.github.io)
AI Safety Impact Markets: Your Charity Evaluator for AI Safety · Dawn Drescher · Oct 1, 2023, 10:47 AM · 16 points · 5 comments · 1 min read · LW link · (impactmarkets.substack.com)