Not all biases are equal—a study of sycophancy and bias in fine-tuned LLMs

jakub_krys 11 Nov 2024 23:11 UTC

8 points

0 comments · 7 min read · LW link

AI Craftsmanship

abramdemski 11 Nov 2024 22:17 UTC

65 points

7 comments · 4 min read · LW link

Electric Grid Cyberattack: An AI-Informed Threat Model

moonlightmaze 11 Nov 2024 21:34 UTC

22 points

0 comments · 29 min read · LW link

o1 is a bad idea

abramdemski 11 Nov 2024 21:20 UTC

160 points

39 comments · 2 min read · LW link

Inferential Game: The Foraging (Ex-)Bandit

abstractapplic 11 Nov 2024 16:59 UTC

27 points

4 comments · 1 min read · LW link

The Evals Gap

Marius Hobbhahn 11 Nov 2024 16:42 UTC

55 points

7 comments · 7 min read · LW link

(www.apolloresearch.ai)

Summary: “Imagining and building wise machines: The centrality of AI metacognition” by Johnson, Karimi, Bengio, et al.

Chris_Leong 11 Nov 2024 16:13 UTC

25 points

6 comments · 5 min read · LW link

(arxiv.org)

The Online Sports Gambling Experiment Has Failed

Zvi 11 Nov 2024 14:30 UTC

284 points

59 comments · 11 min read · LW link

(thezvi.wordpress.com)

How I Learned That You Should Push Children Into Ponds

omnizoid 11 Nov 2024 14:20 UTC

−3 points

3 comments · 4 min read · LW link

The new ruling philosophy regarding AI

Mitchell_Porter 11 Nov 2024 13:28 UTC

29 points

0 comments · 5 min read · LW link

What Ketamine Therapy Is Like

Sable 11 Nov 2024 11:09 UTC

47 points

8 comments · 6 min read · LW link

(affablyevil.substack.com)

Spherical cow

dkl9 11 Nov 2024 3:10 UTC

7 points

0 comments · 1 min read · LW link

(dkl9.net)

[Question] how to truly feel my beliefs?

KvmanThinking 11 Nov 2024 0:04 UTC

6 points

6 comments · 1 min read · LW link

Bay Winter Solstice 2024: song leading auditions

tcheasdfjkl 10 Nov 2024 23:59 UTC

27 points

0 comments · 1 min read · LW link

[Question] A Coordination Cookbook?

azergante 10 Nov 2024 23:20 UTC

2 points

0 comments · 1 min read · LW link

Towards a Clever Hans Test: Unmasking Sentience Biases in Chatbot Interactions

glykokalyx 10 Nov 2024 22:34 UTC

4 points

0 comments · 1 min read · LW link

Urbit New England Meetup

Conquerer Cohen 10 Nov 2024 17:56 UTC

−4 points

0 comments · 1 min read · LW link

Personal AI Planning

jefftk 10 Nov 2024 14:00 UTC

68 points

11 comments · 2 min read · LW link

(www.jefftk.com)

AI alignment via civilizational cognitive updates

AtillaYasar 10 Nov 2024 9:33 UTC

1 point

10 comments · 6 min read · LW link

[Question] How should vegans think about Methionine needs?

ChristianKl 10 Nov 2024 9:28 UTC

32 points

3 comments · 1 min read · LW link

Is P(Doom) Meaningful? Bayesian vs. Popperian Epistemology Debate

Liron 9 Nov 2024 23:39 UTC

5 points

0 comments · 124 min read · LW link

(www.youtube.com)

Bellevue Library Meetup—Nov 23

Cedar 9 Nov 2024 23:05 UTC

5 points

3 comments · 1 min read · LW link

LifeKeeper Diaries: Exploring Misaligned AI Through Interactive Fiction

Tristan Tran, stijn and Mose Wintner

9 Nov 2024 20:58 UTC

15 points

5 comments · 2 min read · LW link

[Question] Poll: what’s your impression of altruism?

David Gross 9 Nov 2024 20:28 UTC

2 points

4 comments · 1 min read · LW link

Chaos Theory in Ecology

Elizabeth 9 Nov 2024 17:50 UTC

15 points

4 comments · 20 min read · LW link

(acesounderglass.com)

Some Comments on Recent AI Safety Developments

testingthewaters 9 Nov 2024 16:44 UTC

4 points

0 comments · 8 min read · LW link

Formalize the Hashiness Model of AGI Uncontainability

Remmelt 9 Nov 2024 16:10 UTC

3 points

0 comments · 1 min read · LW link

(docs.google.com)

Agenda Manipulation

Pazzaz 9 Nov 2024 14:13 UTC

2 points

0 comments · 3 min read · LW link

Force Sequential Output with SCP?

jefftk 9 Nov 2024 12:40 UTC

9 points

4 comments · 1 min read · LW link

(www.jefftk.com)

Anthropic teams up with Palantir and AWS to sell AI to defense customers

Matrice Jacobine 9 Nov 2024 11:50 UTC

9 points

0 comments · 2 min read · LW link

(techcrunch.com)

GPT-4o Can In Some Cases Solve Moderately Complicated Captchas

dirk 9 Nov 2024 4:04 UTC

12 points

2 comments · 1 min read · LW link

Stone Age Herbalist’s notes on ant warfare and slavery

trevor 9 Nov 2024 2:40 UTC

32 points

0 comments · 3 min read · LW link

(x.com)

LLMs Look Increasingly Like General Reasoners

eggsyntax 8 Nov 2024 23:47 UTC

92 points

45 comments · 3 min read · LW link

overengineered air filter shelving

bhauth 8 Nov 2024 22:04 UTC

26 points

2 comments · 5 min read · LW link

(bhauth.com)

Bigger Livers?

sarahconstantin 8 Nov 2024 21:50 UTC

98 points

13 comments · 6 min read · LW link

(sarahconstantin.substack.com)

New UChicago Rationality Group

Noah Birnbaum 8 Nov 2024 21:20 UTC

9 points

0 comments · 1 min read · LW link

Active Recall and Spaced Repetition are Different Things

Saul Munn 8 Nov 2024 20:14 UTC

48 points

2 comments · 3 min read · LW link

(www.brasstacks.blog)

What AI safety researchers can learn from Mahatma Gandhi

Lysandre Terrisse 8 Nov 2024 19:49 UTC

−6 points

0 comments · 3 min read · LW link

The King and the Golem—The Animation

Writer 8 Nov 2024 18:23 UTC

70 points

0 comments · 1 min read · LW link

Boring & straightforward trauma explanation

lemonhope 8 Nov 2024 9:45 UTC

24 points

7 comments · 2 min read · LW link

Curriculum of Ascension

andrew sauer 7 Nov 2024 23:54 UTC

13 points

0 comments · 18 min read · LW link

Analyzing how SAE features evolve across a forward pass

bensenberner, danibalcells, Michael Oesterle, Ediz Ucar and StefanHex

7 Nov 2024 22:07 UTC

47 points

0 comments · 1 min read · LW link

(arxiv.org)

Markets Are Information—Beating the Sportsbooks at Their Own Game

JJXW 7 Nov 2024 20:58 UTC

9 points

1 comment · 2 min read · LW link

(thehobbyist.substack.com)

Signaling with Small Orange Diamonds

jefftk 7 Nov 2024 20:20 UTC

39 points

1 comment · 1 min read · LW link

(www.jefftk.com)

Fundamental Uncertainty: Chapter 9 - How do we live with uncertainty?

Gordon Seidoh Worley 7 Nov 2024 18:15 UTC

11 points

2 comments · 15 min read · LW link

AI #89: Trump Card

Zvi 7 Nov 2024 16:30 UTC

42 points

12 comments · 42 min read · LW link

(thezvi.wordpress.com)

Quantum Immortality: A Perspective if AI Doomers are Probably Right

avturchin and James_Miller

7 Nov 2024 16:06 UTC

10 points

55 comments · 14 min read · LW link

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback

Marcus Williams, micahcarroll, Adhyyan Narang, Constantin Weisser and Brendan Murphy

7 Nov 2024 15:39 UTC

50 points

7 comments · 11 min read · LW link

In the Name of All That Needs Saving

pleiotroth 7 Nov 2024 15:26 UTC

18 points

2 comments · 22 min read · LW link

Agency overhang as a proxy for Sharp left turn

Eris and Iuliia Levin

7 Nov 2024 12:14 UTC

6 points

0 comments · 5 min read · LW link