25 Sep 2024 22:35 UTC

40 points

2 comments · 1 min read · LW link

Mira Murati leaves OpenAI/ OpenAI to remove non-profit control

Sodium

25 Sep 2024 21:15 UTC

58 points

4 comments · 2 min read · LW link

Comparing Forecasting Track Records for AI Benchmarking and Beyond

ChristianWilliams

25 Sep 2024 21:01 UTC

11 points

0 comments · 1 min read · LW link

(www.metaculus.com)

Extending the Off-Switch Game: Toward a Robust Framework for AI Corrigibility

OwenChen

25 Sep 2024 20:38 UTC

1 point

0 comments · 4 min read · LW link

Evaluating Synthetic Activations composed of SAE Latents in GPT-2

Giorgi Giglemiani, nlpet, Chatrik, Jett Janiak and StefanHex

25 Sep 2024 20:37 UTC

27 points

0 comments · 3 min read · LW link

(arxiv.org)

Climate Change And Global Warming

Zero Contradictions

25 Sep 2024 19:13 UTC

−7 points

0 comments · 1 min read · LW link

(zerocontradictions.net)

How to prevent collusion when using untrusted models to monitor each other

Buck

25 Sep 2024 18:58 UTC

81 points

6 comments · 22 min read · LW link

Alignment by default: the simulation hypothesis

gb

25 Sep 2024 16:26 UTC

21 points

39 comments · 1 min read · LW link

A Dialogue on Deceptive Alignment Risks

Rauno Arike

25 Sep 2024 16:10 UTC

11 points

0 comments · 18 min read · LW link

[Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs

Yohan Mathew, joanv, robert mccarthy, ollie, Nandi and Dylan Cope

25 Sep 2024 14:52 UTC

30 points

2 comments · 4 min read · LW link

(arxiv.org)

AIS Hungary Operations Officer role, Deadline: 2024 October 6th

gergogaspar

25 Sep 2024 13:54 UTC

1 point

0 comments · 1 min read · LW link

[Intuitive self-models] 2. Conscious Awareness

Steven Byrnes

25 Sep 2024 13:29 UTC

79 points

48 comments · 16 min read · LW link

Book Review: On the Edge: The Business

Zvi

25 Sep 2024 12:20 UTC

38 points

0 comments · 36 min read · LW link

(thezvi.wordpress.com)

Join the $10K AutoHack 2024 Tournament

Paul Bricman

25 Sep 2024 11:54 UTC

5 points

0 comments · 1 min read · LW link

(noemaresearch.com)

[Paper] A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders

chanind, TomasD, hrdkbhatnagar and Joseph Bloom

25 Sep 2024 9:31 UTC

69 points

15 comments · 3 min read · LW link

(arxiv.org)

[Question] Non-human centric view of existence

ZY

25 Sep 2024 5:47 UTC

−3 points

14 comments · 1 min read · LW link

How to Live Well: My Philosophy of Life

Philosofer123

25 Sep 2024 1:13 UTC

−8 points

0 comments · 1 min read · LW link

(philosofer123.wordpress.com)

An open response to Wittkotter and Yampolskiy

Donald Hobson

24 Sep 2024 22:27 UTC

8 points

0 comments · 4 min read · LW link

A Path out of Insufficient Views

Unreal

24 Sep 2024 20:00 UTC

55 points

46 comments · 9 min read · LW link

Why the 2024 election matters, the AI risk case for Harris, & what you can do to help

Alex Lintz

24 Sep 2024 19:32 UTC

23 points

7 comments · 20 min read · LW link

How to give effectively to US Dems

Hauke Hillebrandt

24 Sep 2024 14:38 UTC

2 points

0 comments · 1 min read · LW link

(www.slowboring.com)

[Question] How do you follow AI (safety) news?

PeterH

24 Sep 2024 13:58 UTC

4 points

2 comments · 1 min read · LW link

Instruction Following without Instruction Tuning

Bogdan Ionut Cirstea

24 Sep 2024 13:49 UTC

17 points

0 comments · 1 min read · LW link

(arxiv.org)

Book Review: On the Edge: The Gamblers

Zvi

24 Sep 2024 11:50 UTC

35 points

1 comment · 89 min read · LW link

(thezvi.wordpress.com)

Editing at the Take Level

jefftk

24 Sep 2024 11:30 UTC

12 points

1 comment · 1 min read · LW link

(www.jefftk.com)

Using LLM’s for AI Foundation research and the Simple Solution assumption

Donald Hobson

24 Sep 2024 11:00 UTC

5 points

0 comments · 2 min read · LW link

When to join a respectability cascade

B Jacobs

24 Sep 2024 7:54 UTC

10 points

1 comment · 2 min read · LW link

(bobjacobs.substack.com)

Sampling Effects on Strategic Behavior in Supervised Learning Models

Phil Bland

24 Sep 2024 7:44 UTC

1 point

0 comments · 6 min read · LW link

In Praise of the Beatitudes

robotelvis

24 Sep 2024 5:08 UTC

9 points

7 comments · 3 min read · LW link

(messyprogress.substack.com)

[Question] What are the best arguments for/against AIs being “slightly ‘nice’”?

Raemon

24 Sep 2024 2:00 UTC

93 points

54 comments · 31 min read · LW link

Struggling like a Shadowmoth

Raemon

24 Sep 2024 0:47 UTC

175 points

38 comments · 7 min read · LW link

Bounty for Evidence on Some of Palisade Research’s Beliefs

benwr and Jeffrey Ladish

23 Sep 2024 20:01 UTC

46 points

4 comments · 2 min read · LW link

Predicting Influenza Abundance in Wastewater Metagenomic Sequencing Data

jefftk

23 Sep 2024 17:25 UTC

27 points

0 comments · 4 min read · LW link

(naobservatory.org)

A primer on ML in antibody engineering

Abhishaike Mahajan

23 Sep 2024 17:03 UTC

11 points

0 comments · 25 min read · LW link

(www.owlposting.com)

[Question] On the subject of in-house large language models versus implementing frontier models

Annapurna

23 Sep 2024 15:00 UTC

7 points

1 comment · 1 min read · LW link

A basic systems architecture for AI agents that do autonomous research

Buck

23 Sep 2024 13:58 UTC

187 points

15 comments · 8 min read · LW link

Book Review: On the Edge: The Fundamentals

Zvi

23 Sep 2024 13:40 UTC

64 points

3 comments · 31 min read · LW link

(thezvi.wordpress.com)

Switching to a 4GB SD

jefftk

23 Sep 2024 11:20 UTC

11 points

1 comment · 1 min read · LW link

(www.jefftk.com)

Model evals for dangerous capabilities

Zach Stein-Perlman

23 Sep 2024 11:00 UTC

51 points

9 comments · 3 min read · LW link

Foundations—Why Britain has stagnated [crosspost]

Nathan Young

23 Sep 2024 10:43 UTC

23 points

1 comment · 57 min read · LW link

(ukfoundations.co)

Boons and banes

dkl9

23 Sep 2024 6:18 UTC

7 points

0 comments · 2 min read · LW link

(dkl9.net)

The Sun is big, but superintelligences will not spare Earth a little sunlight

Eliezer Yudkowsky

23 Sep 2024 3:39 UTC

203 points

141 comments · 13 min read · LW link

GPT4o is still sensitive to user-induced bias when writing code

Reed and LorenzoPacchiardi

22 Sep 2024 21:04 UTC

6 points

0 comments · 4 min read · LW link

My 10-year retrospective on trying SSRIs

Kaj_Sotala

22 Sep 2024 20:30 UTC

76 points

10 comments · 2 min read · LW link

(kajsotala.fi)

Making Eggs Without Ovaries

Niko_McCarty and Metacelsus

22 Sep 2024 17:44 UTC

56 points

3 comments · 16 min read · LW link

(www.asimov.press)

Becket First

jefftk

22 Sep 2024 17:10 UTC

9 points

0 comments · 2 min read · LW link

(www.jefftk.com)

On the Role of Proto-Languages

adamShimi

22 Sep 2024 16:50 UTC

54 points

1 comment · 4 min read · LW link

(epistemologicalfascinations.substack.com)

I’m creating a deep dive podcast episode about the original Leverage Research—would you like to take part?

spencerg

22 Sep 2024 14:03 UTC

37 points

2 comments · 1 min read · LW link

Who Feels More Alone?

marvinscheffold

22 Sep 2024 11:54 UTC

−8 points

2 comments · 39 min read · LW link

Another argument against maximizer-centric alignment paradigms

Fiora from Rosebloom

22 Sep 2024 7:28 UTC

63 points

39 comments · 8 min read · LW link