16 Jun 2024 21:19 UTC

56 points

1 comment · 16 min read · LW link

YM’s Shortform

YM

16 Jun 2024 20:57 UTC

3 points

1 comment · 1 min read · LW link

“Is-Ought” is Fraught

MiSteR Kittty

16 Jun 2024 17:27 UTC

−5 points

2 comments · 1 min read · LW link

The type of AI humanity has chosen to create so far is unsafe, for soft social reasons and not technical ones.

l8c

16 Jun 2024 13:31 UTC

−6 points

2 comments · 1 min read · LW link

Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller

Henry Cai

16 Jun 2024 13:01 UTC

7 points

0 comments · 7 min read · LW link

(arxiv.org)

CIV: a story

Richard_Ngo

15 Jun 2024 22:36 UTC

98 points

6 comments · 9 min read · LW link

(www.narrativeark.xyz)

Yann LeCun: We only design machines that minimize costs [therefore they are safe]

tailcalled

15 Jun 2024 17:25 UTC

19 points

8 comments · 1 min read · LW link

(twitter.com)

(Appetitive, Consummatory) ≈ (RL, reflex)

Steven Byrnes

15 Jun 2024 15:57 UTC

36 points

1 comment · 3 min read · LW link

Two LessWrong speed friending experiments

mikko and sanyer

15 Jun 2024 10:52 UTC

52 points

3 comments · 4 min read · LW link

Claude’s dark spiritual AI futurism

jessicata

15 Jun 2024 0:57 UTC

22 points

7 comments · 43 min read · LW link

(unstableontology.com)

[Question] When is “unfalsifiable implies false” incorrect?

VojtaKovarik

15 Jun 2024 0:28 UTC

3 points

11 comments · 1 min read · LW link

MIRI’s June 2024 Newsletter

Harlan

14 Jun 2024 23:02 UTC

74 points

18 comments · 2 min read · LW link

(intelligence.org)

Language for Goal Misgeneralization: Some Formalisms from my MSc Thesis

Giulio

14 Jun 2024 19:35 UTC

4 points

0 comments · 8 min read · LW link

(www.giuliostarace.com)

Shard Theory—is it true for humans?

Rishika

14 Jun 2024 19:21 UTC

68 points

7 comments · 15 min read · LW link

When fine-tuning fails to elicit GPT-3.5’s chess abilities

Theodore Chapman

14 Jun 2024 18:50 UTC

42 points

3 comments · 9 min read · LW link

Results from the AI x Democracy Research Sprint

Esben Kran, jordine and Jason Hoelscher-Obermaier

14 Jun 2024 16:40 UTC

13 points

0 comments · 6 min read · LW link

Rational Animations’ intro to mechanistic interpretability

Writer

14 Jun 2024 16:10 UTC

45 points

1 comment · 11 min read · LW link

(youtu.be)

Why keep a diary, and why wish for large language models

DanielFilan

14 Jun 2024 16:10 UTC

9 points

1 comment · 2 min read · LW link

(danielfilan.com)

The Leopold Model: Analysis and Reactions

Zvi

14 Jun 2024 15:10 UTC

108 points

19 comments · 57 min read · LW link

(thezvi.wordpress.com)

[Question] Thoughts on Francois Chollet’s belief that LLMs are far away from AGI?

O O

14 Jun 2024 6:32 UTC

26 points

17 comments · 1 min read · LW link

Research Report: Alternative sparsity methods for sparse autoencoders with OthelloGPT.

Andrew Quaisley

14 Jun 2024 0:57 UTC

17 points

5 comments · 12 min read · LW link

Slowed ASI—a possible technical strategy for alignment

Lester Leong

14 Jun 2024 0:57 UTC

5 points

2 comments · 3 min read · LW link

Conceptual Typography “spells it out”

milanrosko

14 Jun 2024 0:39 UTC

15 points

0 comments · 1 min read · LW link

Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety)

Andrew_Critch

14 Jun 2024 0:16 UTC

338 points

38 comments · 4 min read · LW link

OpenAI appoints Retired U.S. Army General Paul M. Nakasone to Board of Directors

Joel Burget

13 Jun 2024 21:28 UTC

35 points

10 comments · 1 min read · LW link

(openai.com)

AI #68: Remarkably Reasonable Reactions

Zvi

13 Jun 2024 16:30 UTC

46 points

11 comments · 50 min read · LW link

(thezvi.wordpress.com)

Four Futures For Cognitive Labor

Maxwell Tabarrok

13 Jun 2024 12:56 UTC

14 points

10 comments · 4 min read · LW link

(www.maximum-progress.com)

Underrated Proverbs

Arjun Panickssery

13 Jun 2024 12:30 UTC

10 points

9 comments · 1 min read · LW link

(arjunpanickssery.substack.com)

[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations

Teun van der Weij, Felix Hofstätter, Ollie J, Sam F. Brown and Francis Rhys Ward

13 Jun 2024 10:04 UTC

84 points

10 comments · 2 min read · LW link

(arxiv.org)

AI-created simulations, nature of DOOM

amelia

13 Jun 2024 3:44 UTC

1 point

0 comments · 1 min read · LW link

Probably Not a Ghost Story

George Ingebretsen

12 Jun 2024 22:55 UTC

27 points

4 comments · 3 min read · LW link

AiPhone

Zvi

12 Jun 2024 22:20 UTC

63 points

4 comments · 14 min read · LW link

(thezvi.wordpress.com)

microwave drilling is impractical

bhauth

12 Jun 2024 22:16 UTC

58 points

14 comments · 4 min read · LW link

(www.bhauth.com)

Phonosemantic Duplication

bitcoinssg

12 Jun 2024 20:19 UTC

5 points

0 comments · 1 min read · LW link

My AI Model Delta Compared To Christiano

johnswentworth

12 Jun 2024 18:19 UTC

190 points

73 comments · 4 min read · LW link

AI: 4 levels of impact [micropost]

Mati_Roy

12 Jun 2024 16:58 UTC

8 points

0 comments · 1 min read · LW link

Aggregative principles approximate utilitarian principles

Cleo Nardo

12 Jun 2024 16:27 UTC

28 points

3 comments · 23 min read · LW link

Sticker Shortcut Fallacy — The Real Worst Argument in the World

ymeskhout

12 Jun 2024 14:52 UTC

25 points

15 comments · 4 min read · LW link

(www.ymeskhout.com)

Long-Term Future Fund: May 2023 to March 2024 Payout recommendations

Linch

12 Jun 2024 13:46 UTC

40 points

0 comments · 1 min read · LW link

Anthropic’s Certificate of Incorporation

Zach Stein-Perlman

12 Jun 2024 13:00 UTC

115 points

4 comments · 4 min read · LW link

Calculance: A “Core” Ability

milanrosko

12 Jun 2024 7:21 UTC

4 points

0 comments · 1 min read · LW link

AXRP Episode 33 - RLHF Problems with Scott Emmons

DanielFilan

12 Jun 2024 3:30 UTC

34 points

0 comments · 56 min read · LW link

[New Feature] Your Subscribed Feed

Ruby and RobertM

11 Jun 2024 22:45 UTC

69 points

8 comments · 4 min read · LW link

Open Thread Summer 2024

habryka

11 Jun 2024 20:57 UTC

22 points

98 comments · 1 min read · LW link

Can efficiency-adjustable reporting thresholds close a loophole in Biden’s executive order on AI?

Jemal Young

11 Jun 2024 20:56 UTC

4 points

1 comment · 2 min read · LW link

“Full Automation” is a Slippery Metric

ozziegooen

11 Jun 2024 19:56 UTC

30 points

1 comment · 1 min read · LW link

AI takeoff and nuclear war

owencb

11 Jun 2024 19:36 UTC

76 points

6 comments · 11 min read · LW link

(strangecities.substack.com)

[Question] What do people think about the polymarket Eth Etf resolution?

edge_retainer

11 Jun 2024 18:34 UTC

1 point

0 comments · 1 min read · LW link

Let’s Design A School, Part 3.1: Bringing it all together with the Sieve Model

Sable

11 Jun 2024 17:03 UTC

13 points

2 comments · 7 min read · LW link

(affablyevil.substack.com)

How to eliminate cut?

jessicata

11 Jun 2024 15:54 UTC

21 points

0 comments · 14 min read · LW link

(unstableontology.com)