Social Balance through Embracing Social Credit

dhruvv · 26 Jul 2023 20:07 UTC
−39 points
9 comments · 3 min read

Why no Roman Industrial Revolution?

jasoncrawford · 26 Jul 2023 19:34 UTC
62 points
30 comments · 3 min read
(rootsofprogress.org)

Why you can’t treat decidability and complexity as a constant (Post #1)

Noosphere89 · 26 Jul 2023 17:54 UTC
6 points
13 comments · 5 min read

A response to the Richards et al.’s “The Illusion of AI’s Existential Risk”

Harrison Fell · 26 Jul 2023 17:34 UTC
1 point
0 comments · 10 min read

Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy

26 Jul 2023 17:02 UTC
96 points
19 comments · 1 min read · 1 review

Neuronpedia

Johnny Lin · 26 Jul 2023 16:29 UTC
135 points
51 comments · 2 min read
(neuronpedia.org)

Frontier Model Forum

Zach Stein-Perlman · 26 Jul 2023 14:30 UTC
27 points
0 comments · 4 min read
(blog.google)

Podcasts: Future of Life Institute, Breakthrough Science Summit panel

jasoncrawford · 26 Jul 2023 14:28 UTC
8 points
0 comments · 1 min read
(rootsofprogress.org)

Llama We Doing This Again?

Zvi · 26 Jul 2023 13:00 UTC
48 points
3 comments · 16 min read
(thezvi.wordpress.com)

Frontier Model Security

Vaniver · 26 Jul 2023 4:48 UTC
32 points
1 comment · 3 min read
(www.anthropic.com)

The First Room-Temperature Ambient-Pressure Superconductor

Annapurna · 26 Jul 2023 2:27 UTC
35 points
28 comments · 1 min read
(arxiv.org)

Underwater Torture Chambers: The Horror Of Fish Farming

omnizoid · 26 Jul 2023 0:27 UTC
81 points
50 comments · 10 min read · 1 review

Contra Alexander on the Bitter Lesson and IQ

Andrew Keenan Richardson · 26 Jul 2023 0:07 UTC
9 points
1 comment · 4 min read
(mechanisticmind.com)

Overcoming the MWC

Mark Freed · 25 Jul 2023 17:31 UTC
3 points
0 comments · 3 min read

Russian parliamentarian: let’s ban personal computers and the Internet

RomanS · 25 Jul 2023 17:30 UTC
11 points
6 comments · 2 min read

AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer

25 Jul 2023 16:58 UTC
6 points
0 comments · 6 min read
(newsletter.safe.ai)

“The Universe of Minds”—call for reviewers (Seeds of Science)

rogersbacon · 25 Jul 2023 16:53 UTC
7 points
0 comments · 1 min read

Thoughts on Loss Landscapes and why Deep Learning works

beren · 25 Jul 2023 16:41 UTC
53 points
4 comments · 18 min read

Should you work at a leading AI lab? (including in non-safety roles)

Benjamin Hilton · 25 Jul 2023 16:29 UTC
7 points
0 comments · 12 min read

Whisper’s Word-Level Timestamps are Out

Varshul Gupta · 25 Jul 2023 14:32 UTC
−18 points
2 comments · 2 min read
(dubverseblack.substack.com)

AIS 101: Task decomposition for scalable oversight

Charbel-Raphaël · 25 Jul 2023 13:34 UTC
27 points
0 comments · 19 min read
(docs.google.com)

Anthropic Observations

Zvi · 25 Jul 2023 12:50 UTC
104 points
1 comment · 10 min read
(thezvi.wordpress.com)

Autonomous Alignment Oversight Framework (AAOF)

Justausername · 25 Jul 2023 10:25 UTC
−9 points
0 comments · 4 min read

How LLMs are and are not myopic

janus · 25 Jul 2023 2:19 UTC
134 points
16 comments · 8 min read

Secure Hand Holding

jefftk · 25 Jul 2023 1:40 UTC
28 points
43 comments · 1 min read
(www.jefftk.com)

Open problems in activation engineering

24 Jul 2023 19:46 UTC
51 points
2 comments · 1 min read
(coda.io)

Subdivisions for Useful Distillations?

Sharat Jacob Jacob · 24 Jul 2023 18:55 UTC
8 points
2 comments · 2 min read

Optimizing For Approval And Disapproval

Thoth Hermes · 24 Jul 2023 18:46 UTC
−1 points
0 comments · 12 min read
(thothhermes.substack.com)

An Opinionated Guide to Computability and Complexity (Post #0)

Noosphere89 · 24 Jul 2023 17:53 UTC
10 points
10 comments · 3 min read

Slowing down AI progress is an underexplored alignment strategy

Norman Borlaug · 24 Jul 2023 16:56 UTC
42 points
27 comments · 5 min read

Anticipation in LLMs

derek shiller · 24 Jul 2023 15:53 UTC
6 points
0 comments · 13 min read

The cone of freedom (or, freedom might only be instrumentally valuable)

dkl9 · 24 Jul 2023 15:38 UTC
−10 points
6 comments · 2 min read
(dkl9.net)

A reformulation of Finite Factored Sets

Matthias G. Mayer · 24 Jul 2023 13:02 UTC
76 points
1 comment · 8 min read

Brain Efficiency Cannell Prize Contest Award Ceremony

Alexander Gietelink Oldenziel · 24 Jul 2023 11:30 UTC
145 points
12 comments · 7 min read

[Crosspost] An AI Pause Is Humanity’s Best Bet For Preventing Extinction (TIME)

otto.barten · 24 Jul 2023 10:07 UTC
12 points
0 comments · 7 min read
(time.com)

Cryonics and Regret

MvB · 24 Jul 2023 9:16 UTC
187 points
35 comments · 2 min read · 1 review

Rationality !== Winning

Raemon · 24 Jul 2023 2:53 UTC
163 points
51 comments · 4 min read

[Question] Which rationality posts are begging for further practical development?

LoganStrohl · 23 Jul 2023 22:22 UTC
60 points
17 comments · 1 min read

Please speak unpredictably

dkl9 · 23 Jul 2023 22:09 UTC
10 points
16 comments · 1 min read
(dkl9.net)

QAPR 5: grokking is maybe not *that* big a deal?

Quintin Pope · 23 Jul 2023 20:14 UTC
114 points
15 comments · 9 min read

My favorite AI governance research this year so far

Zach Stein-Perlman · 23 Jul 2023 16:30 UTC
26 points
1 comment · 7 min read
(blog.aiimpacts.org)

“Justice, Cherryl.”

Zack_M_Davis · 23 Jul 2023 16:16 UTC
85 points
21 comments · 9 min read · 1 review

Supplementary Alignment Insights Through a Highly Controlled Shutdown Incentive

Justausername · 23 Jul 2023 16:08 UTC
4 points
1 comment · 3 min read

Autogynephilia discourse is so absurdly bad on all sides

tailcalled · 23 Jul 2023 13:12 UTC
44 points
24 comments · 2 min read

Examples of Prompts that Make GPT-4 Output Falsehoods

22 Jul 2023 20:21 UTC
21 points
5 comments · 6 min read

Think like a consultant not a salesperson

Adam Zerner · 22 Jul 2023 19:31 UTC
16 points
5 comments · 2 min read

Optimization, loss set at variance in RL

Clairstan · 22 Jul 2023 18:25 UTC
1 point
1 comment · 3 min read

Compute Thresholds: proposed rules to mitigate risk of a “lab leak” accident during AI training runs

davidad · 22 Jul 2023 18:09 UTC
80 points
2 comments · 2 min read

Apollo Neuro Follow Up

Elizabeth · 22 Jul 2023 17:20 UTC
28 points
0 comments · 1 min read
(acesounderglass.com)

Expert trap – Ways out (Part 3 of 3)

Paweł Sysiak · 22 Jul 2023 13:06 UTC
4 points
0 comments · 9 min read