[Question] When to mention irrelevant accusations?

philh · 14 Jan 2023 21:58 UTC
20 points
50 comments · 1 min read · LW link

World-Model Interpretability Is All We Need

Thane Ruthenis · 14 Jan 2023 19:37 UTC
35 points
22 comments · 21 min read · LW link

Current AI Models Seem Sufficient for Low-Risk, Beneficial AI

harsimony · 14 Jan 2023 18:55 UTC
17 points
1 comment · 2 min read · LW link

[Question] Basic Question about LLMs: how do they know what task to perform

Garak · 14 Jan 2023 13:13 UTC
1 point
3 comments · 1 min read · LW link

Aligned with what?

Program Den · 14 Jan 2023 10:28 UTC
3 points
41 comments · 1 min read · LW link

Wokism, rethinking priorities and the Bostrom case

Arturo Macias · 14 Jan 2023 2:27 UTC
−31 points
2 comments · 4 min read · LW link

A general comment on discussions of genetic group differences

anonymous8101 · 14 Jan 2023 2:11 UTC
70 points
46 comments · 3 min read · LW link

Abstractions as morphisms between (co)algebras

Erik Jenner · 14 Jan 2023 1:51 UTC
17 points
1 comment · 8 min read · LW link

Concrete Reasons for Hope about AI

Zac Hatfield-Dodds · 14 Jan 2023 1:22 UTC
100 points
13 comments · 1 min read · LW link

Negative Expertise

Jonas Kgomo · 14 Jan 2023 0:51 UTC
4 points
0 comments · 1 min read · LW link
(twitter.com)

Mid-Atlantic AI Alignment Alliance Unconference

Quinn · 13 Jan 2023 20:33 UTC
7 points
2 comments · 1 min read · LW link

Smallpox vaccines are widely available, for now

David Hornbein · 13 Jan 2023 20:02 UTC
26 points
5 comments · 1 min read · LW link

How does GPT-3 spend its 175B parameters?

Robert_AIZI · 13 Jan 2023 19:21 UTC
41 points
14 comments · 6 min read · LW link
(aizi.substack.com)

[ASoT] Simulators show us behavioural properties by default

Jozdien · 13 Jan 2023 18:42 UTC
35 points
3 comments · 3 min read · LW link

Wheel of Consent Theory for Rationalists and Effective Altruists

adamwilder · 13 Jan 2023 17:59 UTC
1 point
0 comments · 2 min read · LW link

Money is a way of thanking strangers

DirectedEvolution · 13 Jan 2023 17:06 UTC
13 points
5 comments · 4 min read · LW link

Tracr: Compiled Transformers as a Laboratory for Interpretability | DeepMind

DragonGod · 13 Jan 2023 16:53 UTC
62 points
12 comments · 1 min read · LW link
(arxiv.org)

How we could stumble into AI catastrophe

HoldenKarnofsky · 13 Jan 2023 16:20 UTC
71 points
18 comments · 18 min read · LW link
(www.cold-takes.com)

Robustness & Evolution [MLAISU W02]

Esben Kran · 13 Jan 2023 15:47 UTC
10 points
0 comments · 3 min read · LW link
(newsletter.apartresearch.com)

On Cooking With Gas

Zvi · 13 Jan 2023 14:20 UTC
38 points
60 comments · 6 min read · LW link
(thezvi.wordpress.com)

Beware safety-washing

Lizka · 13 Jan 2023 13:59 UTC
43 points
2 comments · 4 min read · LW link

Some Arguments Against Strong Scaling

Joar Skalse · 13 Jan 2023 12:04 UTC
26 points
21 comments · 16 min read · LW link

[Question] Where do you find people who actually do things?

Ulisse Mini · 13 Jan 2023 6:57 UTC
7 points
12 comments · 1 min read · LW link

[Question] Could Simulating an AGI Taking Over the World Actually Lead to a LLM Taking Over the World?

simeon_c · 13 Jan 2023 6:33 UTC
15 points
1 comment · 1 min read · LW link

Burning Uptime: When your Sandbox of Empathy is Leaky and also an Hourglass

Cedar · 13 Jan 2023 5:18 UTC
12 points
2 comments · 3 min read · LW link

Disentangling Shard Theory into Atomic Claims

Leon Lang · 13 Jan 2023 4:23 UTC
86 points
6 comments · 18 min read · LW link

AGISF adaptation for in-person groups

13 Jan 2023 3:24 UTC
44 points
2 comments · 3 min read · LW link

Actions and Flows

Alok Singh · 13 Jan 2023 3:20 UTC
5 points
0 comments · 1 min read · LW link
(alok.github.io)

A Thorough Introduction to Abstraction

RohanS · 13 Jan 2023 0:30 UTC
9 points
1 comment · 18 min read · LW link

The AI Control Problem in a wider intellectual context

philosophybear · 13 Jan 2023 0:28 UTC
11 points
3 comments · 12 min read · LW link

The Alignment Problems

Martín Soto · 12 Jan 2023 22:29 UTC
20 points
0 comments · 4 min read · LW link

Proposal for Inducing Steganography in LMs

Logan Riggs · 12 Jan 2023 22:15 UTC
22 points
3 comments · 2 min read · LW link

Announcing the 2023 PIBBSS Summer Research Fellowship

12 Jan 2023 21:31 UTC
32 points
0 comments · 1 min read · LW link

Victoria Krakovna on AGI Ruin, The Sharp Left Turn and Paradigms of AI Alignment

Michaël Trazzi · 12 Jan 2023 17:09 UTC
40 points
3 comments · 4 min read · LW link
(www.theinsideview.ai)

[Question] What is a disagreement you have around AI safety?

tailcalled · 12 Jan 2023 16:58 UTC
16 points
7 comments · 1 min read · LW link

Reward is not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning

Roman Leventov · 12 Jan 2023 16:43 UTC
17 points
2 comments · 2 min read · LW link
(arxiv.org)

ChatGPT struggles to respond to the real world

Alex Flint · 12 Jan 2023 16:02 UTC
31 points
9 comments · 24 min read · LW link

Covid 1/12/23: Unexpected Spike in Deaths

Zvi · 12 Jan 2023 14:30 UTC
31 points
2 comments · 8 min read · LW link
(thezvi.wordpress.com)

[Linkpost] Scaling Laws for Generative Mixed-Modal Language Models

Amal · 12 Jan 2023 14:24 UTC
15 points
2 comments · 1 min read · LW link
(arxiv.org)

ea.domains — Domains Free to a Good Home

plex · 12 Jan 2023 13:32 UTC
24 points
0 comments · 1 min read · LW link

VIRTUA: a novel about AI alignment

Karl von Wendt · 12 Jan 2023 9:37 UTC
46 points
12 comments · 1 min read · LW link

Iron deficiencies are very bad and you should treat them

Elizabeth · 12 Jan 2023 9:10 UTC
108 points
34 comments · 11 min read · LW link · 1 review
(acesounderglass.com)

Nonstandard analysis in ethics

Alok Singh · 12 Jan 2023 5:58 UTC
−1 points
0 comments · 78 min read · LW link
(nickbostrom.com)

Example of the nameless rationalist virtue

Alok Singh · 12 Jan 2023 5:45 UTC
−9 points
2 comments · 1 min read · LW link

FFMI Gains: A List of Vitalities

porby · 12 Jan 2023 4:48 UTC
26 points
3 comments · 7 min read · LW link

[Linkpost] DreamerV3: A General RL Architecture

simeon_c · 12 Jan 2023 3:55 UTC
23 points
3 comments · 1 min read · LW link
(arxiv.org)

Microsoft Plans to Invest $10B in OpenAI; $3B Invested to Date | Fortune

DragonGod · 12 Jan 2023 3:55 UTC
23 points
10 comments · 2 min read · LW link
(fortune.com)

Progress and research disruptiveness

Eleni Angelou · 12 Jan 2023 3:51 UTC
3 points
2 comments · 1 min read · LW link
(www.nature.com)

The Fable of the AI Coomer: Why the Social Prowess of Machines is AI’s Most Proximal Threat

Ace Delgado · 12 Jan 2023 1:15 UTC
−10 points
4 comments · 4 min read · LW link

Write to Think

Michael Samoilov · 12 Jan 2023 0:33 UTC
10 points
2 comments · 2 min read · LW link