Exploring the Evolution and Migration of Different Layer Embedding in LLMs

Ruixuan Huang

8 Mar 2024 15:01 UTC

6 points

0 comments · 8 min read · LW link

[Question] When and why did ‘training’ become ‘pretraining’?

beren

8 Mar 2024 14:29 UTC

16 points

6 comments · 1 min read · LW link

A T-o-M test: ‘popcorn’ or ‘chocolate’

MiguelDev

8 Mar 2024 4:24 UTC

20 points

13 comments · 1 min read · LW link

Scenario Forecasting Workshop: Materials and Learnings

elifland and charlie_griffin

8 Mar 2024 2:30 UTC

50 points

3 comments · 2 min read · LW link

Forecasting future gains due to post-training enhancements

elifland, Joel Becker and simeon_c

8 Mar 2024 2:11 UTC

31 points

2 comments · 1 min read · LW link

(docs.google.com)

Do LLMs sometime simulate something akin to a dream?

Nezek

8 Mar 2024 1:25 UTC

8 points

4 comments · 1 min read · LW link

Community norms poll (2 mins)

Nathan Young

7 Mar 2024 21:45 UTC

11 points

1 comment · 1 min read · LW link

Announcing Convergence Analysis: An Institute for AI Scenario & Governance Research

David_Kristoffersson and Deric Cheng

7 Mar 2024 21:37 UTC

23 points

1 comment · 4 min read · LW link

Woods’ new preprint on object permanence

Steven Byrnes

7 Mar 2024 21:29 UTC

58 points

1 comment · 6 min read · LW link

MATS AI Safety Strategy Curriculum

Ronny Fernandez and Ryan Kidd

7 Mar 2024 19:59 UTC

74 points

2 comments · 16 min read · LW link

Political Biases in LLMs: Literature Review & Current Uses of AI in Elections

Yashvardhan Sharma, Robayet Hossain and Ariana Gamarra

7 Mar 2024 19:17 UTC

6 points

0 comments · 6 min read · LW link

Evidential Correlations are Subjective, and it might be a problem

Martín Soto

7 Mar 2024 18:37 UTC

26 points

6 comments · 14 min read · LW link

AI Safety 101 : Capabilities—Human Level AI, What? How? and When?

markov and Charbel-Raphaël

7 Mar 2024 17:29 UTC

46 points

8 comments · 54 min read · LW link

A Review of Weak to Strong Generalization [AI Safety Camp]

sevdeawesome

7 Mar 2024 17:16 UTC

13 points

0 comments · 9 min read · LW link

AISN #32: Measuring and Reducing Hazardous Knowledge in LLMs Plus, Forecasting the Future with LLMs, and Regulatory Markets

aogara, Corin Katzke and Dan H

7 Mar 2024 16:39 UTC

8 points

0 comments · 8 min read · LW link

(newsletter.safe.ai)

AI #54: Clauding Along

Zvi

7 Mar 2024 16:00 UTC

45 points

11 comments · 51 min read · LW link

(thezvi.wordpress.com)

Being Interested in Other People

Jonathan Moregård

7 Mar 2024 10:13 UTC

14 points

1 comment · 3 min read · LW link

(youbutbetter.substack.com)

Talking to Congress: Can constituents contacting their legislator influence policy?

Tristan Williams

7 Mar 2024 9:24 UTC

14 points

0 comments · 1 min read · LW link

Explaining the AI Alignment Problem to Tibetan Buddhist Monks

Paul Colognese

7 Mar 2024 9:00 UTC

20 points

3 comments · 6 min read · LW link

What if Alignment is Not Enough?

WillPetillo

7 Mar 2024 8:10 UTC

15 points

24 comments · 9 min read · LW link

Sparks of AGI prompts on GPT2XL and its variant, RLLMv3

MiguelDev

7 Mar 2024 6:33 UTC

4 points

0 comments · 4 min read · LW link

An AI, a box, and a threat

jwfiredragon

7 Mar 2024 6:15 UTC

9 points

0 comments · 6 min read · LW link

Mud and Despair (Part 4 of “The Sense Of Physical Necessity”)

LoganStrohl

7 Mar 2024 0:14 UTC

38 points

0 comments · 2 min read · LW link

introduction to thermal conductivity and noise management

bhauth

6 Mar 2024 23:14 UTC

31 points

1 comment · 4 min read · LW link

(www.bhauth.com)

Essaying Other Plans

Screwtape

6 Mar 2024 22:59 UTC

26 points

4 comments · 7 min read · LW link

Invest in ACX Grants projects!

Saul Munn

6 Mar 2024 20:27 UTC

23 points

0 comments · 1 min read · LW link

Vote on Anthropic Topics to Discuss

Ben Pace

6 Mar 2024 19:43 UTC

75 points

55 comments · 1 min read · LW link

Simple Kelly betting in prediction markets

jessicata

6 Mar 2024 18:59 UTC

38 points

3 comments · 3 min read · LW link

(unstablerontology.substack.com)

On Claude 3.0

Zvi

6 Mar 2024 18:50 UTC

76 points

5 comments · 31 min read · LW link

(thezvi.wordpress.com)

[Question] Why correlation, though?

numpyNaN

6 Mar 2024 16:53 UTC

22 points

7 comments · 1 min read · LW link

Using axis lines for good or evil

dynomight

6 Mar 2024 14:47 UTC

150 points

39 comments · 4 min read · LW link

(dynomight.net)

Let’s build definitely-not-conscious AI

lemonhope

6 Mar 2024 7:50 UTC

4 points

18 comments · 1 min read · LW link

Movie posters

KatjaGrace

6 Mar 2024 6:20 UTC

40 points

0 comments · 2 min read · LW link

(worldspiritsockpuppet.com)

We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To

robertzk, Connor Kissane, Arthur Conmy and Neel Nanda

6 Mar 2024 5:03 UTC

61 points

0 comments · 12 min read · LW link

[Question] Does anyone know good essays on how different AI timelines will affect asset prices?

Tim Liptrot

6 Mar 2024 4:21 UTC

8 points

2 comments · 1 min read · LW link

Twin Cities ACX Meetup—March 2024

Timothy M.

5 Mar 2024 21:15 UTC

1 point

0 comments · 1 min read · LW link

My Clients, The Liars

ymeskhout

5 Mar 2024 21:06 UTC

248 points

85 comments · 7 min read · LW link

A conversation with Claude3 about its consciousness

rife

5 Mar 2024 19:44 UTC

−4 points

3 comments · 1 min read · LW link

(i.imgur.com)

If Ukraine fails, the world will reap fatal consequences

Danylo Zhyrko

5 Mar 2024 19:42 UTC

−17 points

14 comments · 5 min read · LW link

Making Connections with ChatGPT: The Macksey Game

Bill Benzon

5 Mar 2024 18:15 UTC

5 points

2 comments · 11 min read · LW link

[Question] Good taxonomies of all risks (small or large) from AI?

Aryeh Englander

5 Mar 2024 18:15 UTC

6 points

1 comment · 1 min read · LW link

[Question] Making 2023 ACX Prediction Results Public

Legionnaire

5 Mar 2024 17:56 UTC

3 points

9 comments · 1 min read · LW link

Social status part 2/2: everything else

Steven Byrnes

5 Mar 2024 16:29 UTC

61 points

2 comments · 23 min read · LW link

Social status part 1/2: negotiations over object-level preferences

Steven Byrnes

5 Mar 2024 16:29 UTC

115 points

15 comments · 21 min read · LW link

Two Tales of AI Takeover: My Doubts

Violet Hour

5 Mar 2024 15:51 UTC

30 points

8 comments · 29 min read · LW link

Research Report: Sparse Autoencoders find only 9/180 board state features in OthelloGPT

Robert_AIZI

5 Mar 2024 13:55 UTC

61 points

24 comments · 10 min read · LW link

(aizi.substack.com)

Read the Roon

Zvi

5 Mar 2024 13:50 UTC

136 points

6 comments · 19 min read · LW link

(thezvi.wordpress.com)

In defense of anthropically updating EDT

Anthony DiGiovanni

5 Mar 2024 6:21 UTC

18 points

17 comments · 13 min read · LW link

Claude Doesn’t Want to Die

garrison

5 Mar 2024 6:00 UTC

22 points

3 comments · 1 min read · LW link

(garrisonlovely.substack.com)

Many arguments for AI x-risk are wrong

TurnTrout

5 Mar 2024 2:31 UTC

167 points

86 comments · 12 min read · LW link