24 Jul 2023 19:46 UTC

51 points

2 comments 1 min read LW link

(coda.io)

Subdivisions for Useful Distillations?

Sharat Jacob Jacob 24 Jul 2023 18:55 UTC

8 points

2 comments 2 min read LW link

Optimizing For Approval And Disapproval

Thoth Hermes 24 Jul 2023 18:46 UTC

−1 points

0 comments 12 min read LW link

(thothhermes.substack.com)

An Opinionated Guide to Computability and Complexity (Post #0)

Noosphere89 24 Jul 2023 17:53 UTC

10 points

10 comments 3 min read LW link

Slowing down AI progress is an underexplored alignment strategy

Norman Borlaug 24 Jul 2023 16:56 UTC

42 points

27 comments 5 min read LW link

Anticipation in LLMs

derek shiller 24 Jul 2023 15:53 UTC

6 points

0 comments 13 min read LW link

The cone of freedom (or, freedom might only be instrumentally valuable)

dkl9 24 Jul 2023 15:38 UTC

−10 points

6 comments 2 min read LW link

(dkl9.net)

A reformulation of Finite Factored Sets

Matthias G. Mayer 24 Jul 2023 13:02 UTC

76 points

1 comment 8 min read LW link

Brain Efficiency Cannell Prize Contest Award Ceremony

Alexander Gietelink Oldenziel 24 Jul 2023 11:30 UTC

145 points

12 comments 7 min read LW link

[Crosspost] An AI Pause Is Humanity’s Best Bet For Preventing Extinction (TIME)

otto.barten 24 Jul 2023 10:07 UTC

12 points

0 comments 7 min read LW link

(time.com)

Cryonics and Regret

MvB 24 Jul 2023 9:16 UTC

187 points

35 comments 2 min read LW link 1 review

Rationality !== Winning

Raemon 24 Jul 2023 2:53 UTC

163 points

51 comments 4 min read LW link

[Question] Which rationality posts are begging for further practical development?

LoganStrohl 23 Jul 2023 22:22 UTC

60 points

17 comments 1 min read LW link

Please speak unpredictably

dkl9 23 Jul 2023 22:09 UTC

10 points

16 comments 1 min read LW link

(dkl9.net)

QAPR 5: grokking is maybe not that big a deal?

Quintin Pope 23 Jul 2023 20:14 UTC

114 points

15 comments 9 min read LW link

My favorite AI governance research this year so far

Zach Stein-Perlman 23 Jul 2023 16:30 UTC

26 points

1 comment 7 min read LW link

(blog.aiimpacts.org)

“Justice, Cherryl.”

Zack_M_Davis 23 Jul 2023 16:16 UTC

85 points

21 comments 9 min read LW link 1 review

Supplementary Alignment Insights Through a Highly Controlled Shutdown Incentive

Justausername 23 Jul 2023 16:08 UTC

4 points

1 comment 3 min read LW link

Autogynephilia discourse is so absurdly bad on all sides

tailcalled 23 Jul 2023 13:12 UTC

44 points

24 comments 2 min read LW link

Examples of Prompts that Make GPT-4 Output Falsehoods

scasper and Luke Bailey

22 Jul 2023 20:21 UTC

21 points

5 comments 6 min read LW link

Think like a consultant not a salesperson

Adam Zerner 22 Jul 2023 19:31 UTC

16 points

5 comments 2 min read LW link

Optimization, loss set at variance in RL

Clairstan 22 Jul 2023 18:25 UTC

1 point

1 comment 3 min read LW link

Compute Thresholds: proposed rules to mitigate risk of a “lab leak” accident during AI training runs

davidad 22 Jul 2023 18:09 UTC

80 points

2 comments 2 min read LW link

Apollo Neuro Follow Up

Elizabeth 22 Jul 2023 17:20 UTC

28 points

0 comments 1 min read LW link

(acesounderglass.com)

Expert trap – Ways out (Part 3 of 3)

Paweł Sysiak 22 Jul 2023 13:06 UTC

4 points

0 comments 9 min read LW link

GPTs’ ability to keep a secret is weirdly prompt-dependent

Mateusz Bagiński, Filip Sondej and Marcel Windys

22 Jul 2023 12:21 UTC

31 points

0 comments 9 min read LW link

Replacing the Big Air Purifier

jefftk 22 Jul 2023 12:10 UTC

10 points

0 comments 1 min read LW link

(www.jefftk.com)

[Question] I’m consistently overwhelmed by basic obligations. Are there any paradigm shifts or other rationality-based tips that would be helpful?

Benjamin Hendricks 21 Jul 2023 21:10 UTC

66 points

40 comments 2 min read LW link

Fundamentally Fuzzy Concepts Can’t Have Crisp Definitions: Cooperation and Alignment vs Math and Physics

VojtaKovarik 21 Jul 2023 21:03 UTC

12 points

18 comments 3 min read LW link

Cooking Air Quality

jefftk 21 Jul 2023 19:30 UTC

16 points

1 comment 2 min read LW link

(www.jefftk.com)

Reward Hacking from a Causal Perspective

tom4everitt, Francis Rhys Ward, sbenthall, James Fox, mattmacdermott and RyanCarey

21 Jul 2023 18:27 UTC

29 points

6 comments 7 min read LW link

News: Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI

Jonathan Claybrough 21 Jul 2023 18:00 UTC

65 points

10 comments 2 min read LW link

(www.whitehouse.gov)

The UAP Disclosure Act of 2023 and its implications

andeslodes 21 Jul 2023 17:21 UTC

36 points

47 comments 20 min read LW link

(www.congress.gov)

To use computers well, learn their rules

dkl9 21 Jul 2023 17:00 UTC

4 points

6 comments 4 min read LW link

(dkl9.net)

BCIs and the ecosystem of modular minds

beren 21 Jul 2023 15:58 UTC

88 points

14 comments 11 min read LW link

Priorities for the UK Foundation Models Taskforce

Andrea_Miotti 21 Jul 2023 15:23 UTC

105 points

4 comments 5 min read LW link

(www.conjecture.dev)

Training Process Transparency through Gradient Interpretability: Early experiments on toy language models

robertzk and evhub

21 Jul 2023 14:52 UTC

56 points

1 comment 1 min read LW link

[Question] Can AI Alignment please create a Reddit-like platform that would make it much easier for alignment researchers to find and help each other?

Georgeo57 21 Jul 2023 14:03 UTC

−5 points

2 comments 1 min read LW link

Case for Foundation Models beyond English

Varshul Gupta 21 Jul 2023 13:59 UTC

1 point

0 comments 3 min read LW link

(dubverseblack.substack.com)

Meta is hiring for LLM red teaming position

Michael Tontchev 21 Jul 2023 13:57 UTC

7 points

0 comments 1 min read LW link

(us.meta.talentnet.community)

[Linkpost] Interpreting Multimodal Video Transformers Using Brain Recordings

Bogdan Ionut Cirstea 21 Jul 2023 11:26 UTC

5 points

0 comments 1 min read LW link

Berlin AI Alignment Open Meetup August 2023

GuyP 21 Jul 2023 10:58 UTC

1 point

0 comments 1 min read LW link

Decoding intermediate activations in llama-2-7b

Nina Panickssery 21 Jul 2023 5:35 UTC

37 points

3 comments 4 min read LW link

GPT-2’s positional embedding matrix is a helix

AdamYedidia 21 Jul 2023 4:16 UTC

44 points

21 comments 4 min read LW link

Problems with predictive history classes

dkl9 20 Jul 2023 23:28 UTC

15 points

5 comments 1 min read LW link

Announcement: AI Narrations Available for All New LessWrong Posts

Solenoid_Entity, Ruby, Raemon, PeterH and TYPE III AUDIO

20 Jul 2023 22:17 UTC

71 points

28 comments 1 min read LW link

AI #21: The Cup Overfloweth

Zvi 20 Jul 2023 21:30 UTC

47 points

4 comments 64 min read LW link

(thezvi.wordpress.com)

All AGI Safety questions welcome (especially basic ones) [July 2023]

smallsilo 20 Jul 2023 20:20 UTC

38 points

40 comments 2 min read LW link

(forum.effectivealtruism.org)

Growth of Publicly Available Genetic Sequencing Data

jefftk 20 Jul 2023 19:50 UTC

11 points

2 comments 1 min read LW link

(www.jefftk.com)

Progress links and tweets, 2023-07-20: “A goddess enthroned on a car”

jasoncrawford 20 Jul 2023 18:28 UTC

12 points

4 comments 2 min read LW link

(rootsofprogress.org)