11 Jun 2024 22:45 UTC

69 points

9 comments · 4 min read · LW link

Open Thread Summer 2024

habryka · 11 Jun 2024 20:57 UTC

22 points

99 comments · 1 min read · LW link

Can efficiency-adjustable reporting thresholds close a loophole in Biden’s executive order on AI?

Jemal Young · 11 Jun 2024 20:56 UTC

4 points

1 comment · 2 min read · LW link

“Full Automation” is a Slippery Metric

ozziegooen · 11 Jun 2024 19:56 UTC

30 points

1 comment · 1 min read · LW link

AI takeoff and nuclear war

owencb · 11 Jun 2024 19:36 UTC

78 points

6 comments · 11 min read · LW link

(strangecities.substack.com)

[Question] What do people think about the polymarket Eth Etf resolution?

edge_retainer · 11 Jun 2024 18:34 UTC

1 point

0 comments · 1 min read · LW link

Let’s Design A School, Part 3.1: Bringing it all together with the Sieve Model

Sable · 11 Jun 2024 17:03 UTC

13 points

2 comments · 7 min read · LW link

(affablyevil.substack.com)

How to eliminate cut?

jessicata · 11 Jun 2024 15:54 UTC

21 points

0 comments · 14 min read · LW link

(unstableontology.com)

my favourite Scott Sumner blog posts

DMMF · 11 Jun 2024 14:40 UTC

26 points

0 comments · 3 min read · LW link

(danfrank.ca)

[Question] Is anyone developing optimisation-robust interpretability methods?

Jono · 11 Jun 2024 13:14 UTC

6 points

0 comments · 1 min read · LW link

Keep the Grass Guessing

JackOfAllTrades · 11 Jun 2024 7:29 UTC

4 points

0 comments · 2 min read · LW link

“Metastrategic Brainstorming”, a core building-block skill

Raemon · 11 Jun 2024 4:27 UTC

59 points

5 comments · 6 min read · LW link

AI Debate Stability: Addressing Self-Defeating Responses

Annie Sorkin · 11 Jun 2024 3:03 UTC

9 points

0 comments · 3 min read · LW link

Corrigibility could make things worse

ThomasCederborg · 11 Jun 2024 0:55 UTC

9 points

6 comments · 6 min read · LW link

Emotional issues often have an immediate payoff

Chipmonk · 10 Jun 2024 23:39 UTC

26 points

2 comments · 4 min read · LW link

(chrislakin.blog)

DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking

tailcalled · 10 Jun 2024 21:20 UTC

29 points

13 comments · 2 min read · LW link

Plop! Goes the Concept

Jonathan Moregård · 10 Jun 2024 19:23 UTC

6 points

0 comments · 8 min read · LW link

(honestliving.substack.com)

What can we learn from orcas?

Jonasb · 10 Jun 2024 18:01 UTC

1 point

0 comments · 8 min read · LW link

(www.denominations.io)

How to build a data center, by Construction Physics

TheManxLoiner · 10 Jun 2024 17:38 UTC

2 points

0 comments · 1 min read · LW link

(www.construction-physics.com)

Observations for doing debate with models behind APIs

PoD123 · 10 Jun 2024 16:22 UTC

3 points

0 comments · 3 min read · LW link

My AI Model Delta Compared To Yudkowsky

johnswentworth · 10 Jun 2024 16:12 UTC

277 points

102 comments · 4 min read · LW link

[Question] Good ways to monetarily profit from the increasing demand for power?

Matt Goldenberg · 10 Jun 2024 15:29 UTC

12 points

5 comments · 1 min read · LW link

The Evolution towards the Blank Slate

Arturo Macias · 10 Jun 2024 15:20 UTC

−7 points

0 comments · 3 min read · LW link

10 Public “I was wrong” Admissions by Scientists and Intellectuals

Hashem ElAssad · 10 Jun 2024 14:19 UTC

0 points

3 comments · 1 min read · LW link

[Valence series] 4. Valence & Liking / Admiring

Steven Byrnes · 10 Jun 2024 14:19 UTC

46 points

12 comments · 14 min read · LW link

5. Open Corrigibility Questions

Max Harms · 10 Jun 2024 14:09 UTC

29 points

0 comments · 7 min read · LW link

4. Existing Writing on Corrigibility

Max Harms · 10 Jun 2024 14:08 UTC

47 points

15 comments · 106 min read · LW link

On Dwarkesh’s Podcast with Leopold Aschenbrenner

Zvi · 10 Jun 2024 12:40 UTC

101 points

7 comments · 59 min read · LW link

(thezvi.wordpress.com)

Summary of Situational Awareness—The Decade Ahead

Oscar · 10 Jun 2024 8:44 UTC

6 points

2 comments · 1 min read · LW link

(forum.effectivealtruism.org)

Why I don’t believe in the placebo effect

transhumanist_atom_understander · 10 Jun 2024 2:37 UTC

131 points

22 comments · 9 min read · LW link

Soviet comedy film recommendations

Nina Panickssery · 9 Jun 2024 23:40 UTC

42 points

11 comments · 2 min read · LW link

(open.substack.com)

The Data Wall is Important

JustisMills · 9 Jun 2024 22:54 UTC

40 points

20 comments · 2 min read · LW link

(justismills.substack.com)

Two Family Dance Flyers

jefftk · 9 Jun 2024 20:50 UTC

13 points

0 comments · 1 min read · LW link

(www.jefftk.com)

[Question] What happens to existing life sentences under LEV?

O O · 9 Jun 2024 17:49 UTC

5 points

7 comments · 1 min read · LW link

3b. Formal (Faux) Corrigibility

Max Harms · 9 Jun 2024 17:18 UTC

21 points

13 comments · 17 min read · LW link

3a. Towards Formal Corrigibility

Max Harms · 9 Jun 2024 16:53 UTC

22 points

2 comments · 19 min read · LW link

Introducing SARA: a new activation steering technique

Alejandro Tlaie · 9 Jun 2024 15:33 UTC

17 points

7 comments · 6 min read · LW link

“What the hell is a representation, anyway?” | Clarifying AI interpretability with tools from philosophy of cognitive science | Part 1: Vehicles vs. contents

IwanWilliams · 9 Jun 2024 14:19 UTC

9 points

1 comment · 4 min read · LW link

Exploring Llama-3-8B MLP Neurons

ntt123 · 9 Jun 2024 14:19 UTC

10 points

0 comments · 4 min read · LW link

(neuralblog.github.io)

Demystifying “Alignment” through a Comic

milanrosko · 9 Jun 2024 8:24 UTC

106 points

19 comments · 1 min read · LW link

Dumbing down

Martin Sustrik · 9 Jun 2024 6:50 UTC

70 points

0 comments · 4 min read · LW link

What if a tech company forced you to move to NYC?

KatjaGrace · 9 Jun 2024 6:30 UTC

56 points

22 comments · 1 min read · LW link

(worldspiritsockpuppet.com)

[Question] What should I do? (long term plan about starting an AI lab)

not_a_cat · 9 Jun 2024 0:45 UTC

2 points

1 comment · 2 min read · LW link

Searching for the Root of the Tree of Evil

Ivan Vendrov · 8 Jun 2024 17:05 UTC

36 points

14 comments · 5 min read · LW link

(nothinghuman.substack.com)

2. Corrigibility Intuition

Max Harms · 8 Jun 2024 15:52 UTC

65 points

10 comments · 33 min read · LW link

Two easy things that maybe Just Work to improve AI discourse

jacobjacob · 8 Jun 2024 15:51 UTC

190 points

35 comments · 2 min read · LW link

I made an AI safety fellowship. What I wish I knew.

Ruben Castaing · 8 Jun 2024 15:23 UTC

12 points

0 comments · 2 min read · LW link

Alignment Gaps

kcyras · 8 Jun 2024 15:23 UTC

10 points

3 comments · 8 min read · LW link

The Slack Double Crux, or how to negotiate with yourself

Thac0 · 8 Jun 2024 15:22 UTC

6 points

2 comments · 4 min read · LW link

The Perils of Popularity: A Critical Examination of LessWrong’s Rational Discourse

BubbaJoeLouis · 8 Jun 2024 15:22 UTC

−24 points

3 comments · 2 min read · LW link