EA/ACX/LW February Santa Cruz Meetup

madmail 4 Feb 2024 23:26 UTC

1 point

0 comments · 1 min read · LW link

Vitalia Rationality Meetup

veronica 4 Feb 2024 19:46 UTC

1 point

0 comments · 1 min read · LW link

Personal predictions

Daniele De Nuntiis 4 Feb 2024 3:59 UTC

2 points

2 comments · 3 min read · LW link

A sketch of acausal trade in practice

Richard_Ngo 4 Feb 2024 0:32 UTC

35 points

4 comments · 7 min read · LW link

Brute Force Manufactured Consensus is Hiding the Crime of the Century

Roko 3 Feb 2024 20:36 UTC

216 points

156 comments · 9 min read · LW link

My thoughts on the Beff Jezos—Connor Leahy debate

Ariel Kwiatkowski 3 Feb 2024 19:47 UTC

−5 points

23 comments · 4 min read · LW link

The Journal of Dangerous Ideas

rogersbacon 3 Feb 2024 15:40 UTC

−25 points

4 comments · 5 min read · LW link

(www.secretorum.life)

Attitudes about Applied Rationality

Camille Berger 3 Feb 2024 14:42 UTC

108 points

18 comments · 4 min read · LW link

Practicing my Handwriting in 1439

Maxwell Tabarrok 3 Feb 2024 13:21 UTC

11 points

0 comments · 3 min read · LW link

(www.maximum-progress.com)

Finite Factored Sets to Bayes Nets Part 2

J Bostock 3 Feb 2024 12:25 UTC

6 points

0 comments · 8 min read · LW link

Why I no longer identify as transhumanist

Kaj_Sotala 3 Feb 2024 12:00 UTC

55 points

33 comments · 3 min read · LW link

(kajsotala.fi)

Attention SAEs Scale to GPT-2 Small

Connor Kissane, robertzk, Arthur Conmy and Neel Nanda

3 Feb 2024 6:50 UTC

77 points

4 comments · 8 min read · LW link

Why do we need RLHF? Imitation, Inverse RL, and the role of reward

Ran W 3 Feb 2024 4:00 UTC

14 points

0 comments · 5 min read · LW link

Announcing the London Initiative for Safe AI (LISA)

James Fox, mike_safeAI and Ryan Kidd

2 Feb 2024 23:17 UTC

98 points

0 comments · 9 min read · LW link

Survey for alignment researchers!

Cameron Berg, Judd Rosenblatt and AE Studio

2 Feb 2024 20:41 UTC

71 points

11 comments · 1 min read · LW link

Voting Results for the 2022 Review

Ben Pace 2 Feb 2024 20:34 UTC

57 points

3 comments · 73 min read · LW link

On Dwarkesh’s 3rd Podcast With Tyler Cowen

Zvi 2 Feb 2024 19:30 UTC

36 points

9 comments · 21 min read · LW link

(thezvi.wordpress.com)

Most experts believe COVID-19 was probably not a lab leak

DanielFilan 2 Feb 2024 19:28 UTC

66 points

89 comments · 2 min read · LW link

(gcrinstitute.org)

What Failure Looks Like is not an existential risk (and alignment is not the solution)

otto.barten 2 Feb 2024 18:59 UTC

13 points

12 comments · 9 min read · LW link

Solving alignment isn’t enough for a flourishing future

mic 2 Feb 2024 18:23 UTC

27 points

0 comments · 1 min read · LW link

(papers.ssrn.com)

Manifold Markets

PeterMcCluskey 2 Feb 2024 17:48 UTC

26 points

9 comments · 4 min read · LW link

(bayesianinvestor.com)

Types of subjective welfare

MichaelStJules 2 Feb 2024 9:56 UTC

10 points

3 comments · 1 min read · LW link

Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small

Joseph Bloom 2 Feb 2024 6:54 UTC

102 points

37 comments · 15 min read · LW link

Soft Prompts for Evaluation: Measuring Conditional Distance of Capabilities

porby 2 Feb 2024 5:49 UTC

47 points

1 comment · 4 min read · LW link

(1drv.ms)

Running a Prediction Market Mafia Game

Arjun Panickssery 1 Feb 2024 23:24 UTC

22 points

5 comments · 1 min read · LW link

(arjunpanickssery.substack.com)

Evaluating Stability of Unreflective Alignment

james.lucassen 1 Feb 2024 22:15 UTC

49 points

10 comments · 18 min read · LW link

(jlucassen.com)

Davidad’s Provably Safe AI Architecture—ARIA’s Programme Thesis

simeon_c 1 Feb 2024 21:30 UTC

69 points

17 comments · 1 min read · LW link

(www.aria.org.uk)

Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis

RogerDearnaley 1 Feb 2024 21:15 UTC

15 points

15 comments · 13 min read · LW link

Wrong answer bias

lemonhope 1 Feb 2024 20:05 UTC

49 points

24 comments · 1 min read · LW link

On Not Requiring Vaccination

jefftk 1 Feb 2024 19:20 UTC

31 points

21 comments · 1 min read · LW link

(www.jefftk.com)

The economy is mostly newbs (strat predictions)

lemonhope 1 Feb 2024 19:15 UTC

27 points

6 comments · 2 min read · LW link

Managing risks while trying to do good

Wei Dai 1 Feb 2024 18:08 UTC

61 points

26 comments · 1 min read · LW link

Putting multimodal LLMs to the Tetris test

Lovre and gabrielagc

1 Feb 2024 16:02 UTC

30 points

5 comments · 7 min read · LW link

AI #49: Bioweapon Testing Begins

Zvi 1 Feb 2024 15:30 UTC

37 points

11 comments · 42 min read · LW link

(thezvi.wordpress.com)

Some Notes on Ethics

Pareto Optimal 1 Feb 2024 10:18 UTC

−3 points

0 comments · 1 min read · LW link

(paretooptimal.substack.com)

Increasingly vague interpersonal welfare comparisons

MichaelStJules 1 Feb 2024 6:45 UTC

5 points

0 comments · 1 min read · LW link

PIBBSS Speaker events comings up in February

DusanDNesic, Nora_Ammann and Lucas Teixeira

1 Feb 2024 3:28 UTC

10 points

2 comments · 1 min read · LW link

Drone Wars Endgame

RussellThor 1 Feb 2024 2:30 UTC

36 points

71 comments · 8 min read · LW link

Sequencing Swabs

jefftk 1 Feb 2024 1:50 UTC

19 points

1 comment · 5 min read · LW link

(www.jefftk.com)

Leading The Parade

johnswentworth 31 Jan 2024 22:39 UTC

147 points

31 comments · 9 min read · LW link

Proposal for an AI Safety Prize

sweenesm 31 Jan 2024 18:35 UTC

3 points

0 comments · 2 min read · LW link

Literally Everything is Infinite

Spiral 31 Jan 2024 18:31 UTC

−9 points

8 comments · 5 min read · LW link

What fuels your ambition?

Cissy 31 Jan 2024 18:30 UTC

29 points

1 comment · 5 min read · LW link

(www.moremyself.xyz)

“Genlangs” and Zipf’s Law: Do languages generated by ChatGPT statistically look human?

Justin-Diamond 31 Jan 2024 18:30 UTC

2 points

2 comments · 1 min read · LW link

(arxiv.org)

AI, Intellectual Property, and the Techno-Optimist Revolution

Justin-Diamond 31 Jan 2024 18:30 UTC

1 point

0 comments · 1 min read · LW link

(www.researchgate.net)

A response to an attempted rebuttal of maximising ethics

JacobBowden 31 Jan 2024 17:49 UTC

−5 points

8 comments · 3 min read · LW link

My Alignment “Plan”: Avoid Strong Optimisation and Align Economy

VojtaKovarik 31 Jan 2024 17:03 UTC

24 points

9 comments · 7 min read · LW link

Where freedom comes from

Logan Kieller 31 Jan 2024 16:53 UTC

−5 points

1 comment · 3 min read · LW link

(logankieller.substack.com)

Per protocol analysis as medical malpractice

braces 31 Jan 2024 16:22 UTC

53 points

8 comments · 1 min read · LW link

Adam Smith Meets AI Doomers

James_Miller 31 Jan 2024 15:53 UTC

34 points

10 comments · 5 min read · LW link