Overcoming the MWC

Mark Freed · 25 Jul 2023 17:31 UTC
3 points
0 comments · 3 min read · LW link

Russian parliamentarian: let’s ban personal computers and the Internet

RomanS · 25 Jul 2023 17:30 UTC
11 points
6 comments · 2 min read · LW link

AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer

25 Jul 2023 16:58 UTC
6 points
0 comments · 6 min read · LW link
(newsletter.safe.ai)

“The Universe of Minds”—call for reviewers (Seeds of Science)

rogersbacon · 25 Jul 2023 16:53 UTC
7 points
0 comments · 1 min read · LW link

Thoughts on Loss Landscapes and why Deep Learning works

beren · 25 Jul 2023 16:41 UTC
53 points
4 comments · 18 min read · LW link

Should you work at a leading AI lab? (including in non-safety roles)

Benjamin Hilton · 25 Jul 2023 16:29 UTC
7 points
0 comments · 12 min read · LW link

Whisper’s Word-Level Timestamps are Out

Varshul Gupta · 25 Jul 2023 14:32 UTC
−18 points
2 comments · 2 min read · LW link
(dubverseblack.substack.com)

AIS 101: Task decomposition for scalable oversight

Charbel-Raphaël · 25 Jul 2023 13:34 UTC
27 points
0 comments · 19 min read · LW link
(docs.google.com)

Anthropic Observations

Zvi · 25 Jul 2023 12:50 UTC
104 points
1 comment · 10 min read · LW link
(thezvi.wordpress.com)

Autonomous Alignment Oversight Framework (AAOF)

Justausername · 25 Jul 2023 10:25 UTC
−9 points
0 comments · 4 min read · LW link

How LLMs are and are not myopic

janus · 25 Jul 2023 2:19 UTC
134 points
16 comments · 8 min read · LW link

Secure Hand Holding

jefftk · 25 Jul 2023 1:40 UTC
28 points
43 comments · 1 min read · LW link
(www.jefftk.com)

Open problems in activation engineering

24 Jul 2023 19:46 UTC
51 points
2 comments · 1 min read · LW link
(coda.io)

Subdivisions for Useful Distillations?

Sharat Jacob Jacob · 24 Jul 2023 18:55 UTC
8 points
2 comments · 2 min read · LW link

Optimizing For Approval And Disapproval

Thoth Hermes · 24 Jul 2023 18:46 UTC
−1 points
0 comments · 12 min read · LW link
(thothhermes.substack.com)

An Opinionated Guide to Computability and Complexity (Post #0)

Noosphere89 · 24 Jul 2023 17:53 UTC
10 points
10 comments · 3 min read · LW link

Slowing down AI progress is an underexplored alignment strategy

Norman Borlaug · 24 Jul 2023 16:56 UTC
42 points
27 comments · 5 min read · LW link

Anticipation in LLMs

derek shiller · 24 Jul 2023 15:53 UTC
6 points
0 comments · 13 min read · LW link

The cone of freedom (or, freedom might only be instrumentally valuable)

dkl9 · 24 Jul 2023 15:38 UTC
−10 points
6 comments · 2 min read · LW link
(dkl9.net)

A reformulation of Finite Factored Sets

Matthias G. Mayer · 24 Jul 2023 13:02 UTC
76 points
1 comment · 8 min read · LW link

Brain Efficiency Cannell Prize Contest Award Ceremony

Alexander Gietelink Oldenziel · 24 Jul 2023 11:30 UTC
145 points
12 comments · 7 min read · LW link

[Crosspost] An AI Pause Is Humanity’s Best Bet For Preventing Extinction (TIME)

otto.barten · 24 Jul 2023 10:07 UTC
12 points
0 comments · 7 min read · LW link
(time.com)

Cryonics and Regret

MvB · 24 Jul 2023 9:16 UTC
187 points
35 comments · 2 min read · LW link · 1 review

Rationality !== Winning

Raemon · 24 Jul 2023 2:53 UTC
163 points
51 comments · 4 min read · LW link

[Question] Which rationality posts are begging for further practical development?

LoganStrohl · 23 Jul 2023 22:22 UTC
60 points
17 comments · 1 min read · LW link

Please speak unpredictably

dkl9 · 23 Jul 2023 22:09 UTC
10 points
16 comments · 1 min read · LW link
(dkl9.net)

QAPR 5: grokking is maybe not *that* big a deal?

Quintin Pope · 23 Jul 2023 20:14 UTC
114 points
15 comments · 9 min read · LW link

My favorite AI governance research this year so far

Zach Stein-Perlman · 23 Jul 2023 16:30 UTC
26 points
1 comment · 7 min read · LW link
(blog.aiimpacts.org)

“Justice, Cherryl.”

Zack_M_Davis · 23 Jul 2023 16:16 UTC
85 points
21 comments · 9 min read · LW link · 1 review

Supplementary Alignment Insights Through a Highly Controlled Shutdown Incentive

Justausername · 23 Jul 2023 16:08 UTC
4 points
1 comment · 3 min read · LW link

Autogynephilia discourse is so absurdly bad on all sides

tailcalled · 23 Jul 2023 13:12 UTC
44 points
24 comments · 2 min read · LW link

Examples of Prompts that Make GPT-4 Output Falsehoods

22 Jul 2023 20:21 UTC
21 points
5 comments · 6 min read · LW link

Think like a consultant not a salesperson

Adam Zerner · 22 Jul 2023 19:31 UTC
16 points
5 comments · 2 min read · LW link

Optimization, loss set at variance in RL

Clairstan · 22 Jul 2023 18:25 UTC
1 point
1 comment · 3 min read · LW link

Compute Thresholds: proposed rules to mitigate risk of a “lab leak” accident during AI training runs

davidad · 22 Jul 2023 18:09 UTC
80 points
2 comments · 2 min read · LW link

Apollo Neuro Follow Up

Elizabeth · 22 Jul 2023 17:20 UTC
28 points
0 comments · 1 min read · LW link
(acesounderglass.com)

Expert trap – Ways out (Part 3 of 3)

Paweł Sysiak · 22 Jul 2023 13:06 UTC
4 points
0 comments · 9 min read · LW link

GPTs’ ability to keep a secret is weirdly prompt-dependent

22 Jul 2023 12:21 UTC
31 points
0 comments · 9 min read · LW link

Replacing the Big Air Purifier

jefftk · 22 Jul 2023 12:10 UTC
10 points
0 comments · 1 min read · LW link
(www.jefftk.com)

[Question] I’m consistently overwhelmed by basic obligations. Are there any paradigm shifts or other rationality-based tips that would be helpful?

Benjamin Hendricks · 21 Jul 2023 21:10 UTC
66 points
40 comments · 2 min read · LW link

Fundamentally Fuzzy Concepts Can’t Have Crisp Definitions: Cooperation and Alignment vs Math and Physics

VojtaKovarik · 21 Jul 2023 21:03 UTC
12 points
18 comments · 3 min read · LW link

Cooking Air Quality

jefftk · 21 Jul 2023 19:30 UTC
16 points
1 comment · 2 min read · LW link
(www.jefftk.com)

Reward Hacking from a Causal Perspective

21 Jul 2023 18:27 UTC
29 points
6 comments · 7 min read · LW link

News: Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI

Jonathan Claybrough · 21 Jul 2023 18:00 UTC
65 points
10 comments · 2 min read · LW link
(www.whitehouse.gov)

The UAP Disclosure Act of 2023 and its implications

andeslodes · 21 Jul 2023 17:21 UTC
36 points
47 comments · 20 min read · LW link
(www.congress.gov)

To use computers well, learn their rules

dkl9 · 21 Jul 2023 17:00 UTC
4 points
6 comments · 4 min read · LW link
(dkl9.net)

BCIs and the ecosystem of modular minds

beren · 21 Jul 2023 15:58 UTC
88 points
14 comments · 11 min read · LW link

Priorities for the UK Foundation Models Taskforce

Andrea_Miotti · 21 Jul 2023 15:23 UTC
105 points
4 comments · 5 min read · LW link
(www.conjecture.dev)

Training Process Transparency through Gradient Interpretability: Early experiments on toy language models

21 Jul 2023 14:52 UTC
56 points
1 comment · 1 min read · LW link

[Question] Can AI Alignment please create a Reddit-like platform that would make it much easier for alignment researchers to find and help each other?

Georgeo57 · 21 Jul 2023 14:03 UTC
−5 points
2 comments · 1 min read · LW link