17 Dec 2024 23:58 UTC

108 points

1 comment

2 min read

LW link

bending light

Recurrented

17 Dec 2024 22:40 UTC

1 point

5 comments

3 min read

LW link

(futuring.substack.com)

Careless thinking: A theory of bad thinking

Nathan Young

17 Dec 2024 18:23 UTC

44 points

17 comments

9 min read

LW link

(nathanpmyoung.substack.com)

The Second Gemini

Zvi

17 Dec 2024 15:50 UTC

23 points

0 comments

11 min read

LW link

(thezvi.wordpress.com)

AIS Hungary is hiring a part-time Technical Lead! (Deadline: Dec 31st)

gergogaspar

17 Dec 2024 14:12 UTC

1 point

0 comments

2 min read

LW link

Everything you care about is in the map

Tahp

17 Dec 2024 14:05 UTC

17 points

27 comments

3 min read

LW link

Reality is Fractal-Shaped

silentbob

17 Dec 2024 13:52 UTC

17 points

1 comment

8 min read

LW link

Trying to translate when people talk past each other

Kaj_Sotala

17 Dec 2024 9:40 UTC

41 points

12 comments

6 min read

LW link

(kajsotala.fi)

What is “wireheading”?

Vishakha

17 Dec 2024 7:49 UTC

10 points

0 comments

1 min read

LW link

(aisafety.info)

3 What If We Could Map Our Motivation as Channels of Flow?

P. João

17 Dec 2024 7:47 UTC

3 points

0 comments

6 min read

LW link

2 What if Life Comes with a Natural Calibration to Estimate you?

P. João

17 Dec 2024 7:47 UTC

1 point

0 comments

10 min read

LW link

1 What If We Rebuild Motivation with the Fermi ESTIMATion?

P. João

17 Dec 2024 7:46 UTC

5 points

0 comments

3 min read

LW link

Where do you put your ideas?

CstineSublime

17 Dec 2024 7:26 UTC

8 points

20 comments

1 min read

LW link

Elevating Air Purifiers

jefftk

17 Dec 2024 1:40 UTC

25 points

0 comments

1 min read

LW link

(www.jefftk.com)

0 Motivation Mapping through Information Theory

P. João

16 Dec 2024 23:17 UTC

8 points

0 comments

28 min read

LW link

A dataset of questions on decision-theoretic reasoning in Newcomb-like problems

Caspar Oesterheld, Ethan Perez and Chi Nguyen

16 Dec 2024 22:42 UTC

47 points

1 comment

2 min read

LW link

(arxiv.org)

A practical guide to tiling the universe with hedonium

Vittu Perkele

16 Dec 2024 21:25 UTC

−9 points

1 comment

1 min read

LW link

(perkeleperusing.substack.com)

AI Safety Seed Funding Network—Join as a Donor or Investor

Alexandra Bos

16 Dec 2024 19:30 UTC

30 points

0 comments

1 min read

LW link

Is this a better way to do matchmaking?

Chipmonk

16 Dec 2024 19:06 UTC

9 points

4 comments

1 min read

LW link

I read every major AI lab’s safety plan so you don’t have to

sarahhw

16 Dec 2024 18:51 UTC

20 points

0 comments

12 min read

LW link

(longerramblings.substack.com)

Grokking revisited: reverse engineering grokking modulo addition in LSTM

Nikita Khomich and Danik

16 Dec 2024 18:48 UTC

4 points

0 comments

6 min read

LW link

[Question] Do infinite alternatives make AI alignment impossible?

Dakara

16 Dec 2024 18:11 UTC

11 points

2 comments

1 min read

LW link

Progress links and short notes, 2024-12-16

jasoncrawford

16 Dec 2024 17:24 UTC

7 points

0 comments

2 min read

LW link

(newsletter.rootsofprogress.org)

Effective Altruism FAQ

omnizoid

16 Dec 2024 16:27 UTC

0 points

7 comments

12 min read

LW link

Variably compressibly studies are fun

dkl9

16 Dec 2024 16:00 UTC

0 points

0 comments

2 min read

LW link

(dkl9.net)

AIs Will Increasingly Attempt Shenanigans

Zvi

16 Dec 2024 15:20 UTC

108 points

2 comments

26 min read

LW link

(thezvi.wordpress.com)

Testing which LLM architectures can do hidden serial reasoning

Filip Sondej

16 Dec 2024 13:48 UTC

80 points

9 comments

4 min read

LW link

NeuroAI for AI safety: A Differential Path

nz and Patrick Mineault

16 Dec 2024 13:17 UTC

14 points

0 comments

7 min read

LW link

(arxiv.org)

Circling as practice for “just be yourself”

Kaj_Sotala

16 Dec 2024 7:40 UTC

86 points

5 comments

4 min read

LW link

(kajsotala.fi)

Reanalyzing the 2023 Expert Survey on Progress in AI

AI Impacts

16 Dec 2024 6:10 UTC

8 points

0 comments

1 min read

LW link

(blog.aiimpacts.org)

Ideas for benchmarking LLM creativity

gwern

16 Dec 2024 5:18 UTC

53 points

11 comments

1 min read

LW link

(gwern.net)

Comparing the AirFanta 3Pro to the Coway AP-1512

jefftk

16 Dec 2024 1:40 UTC

13 points

0 comments

1 min read

LW link

(www.jefftk.com)

[Question] are IQ tests a good measure of intelligence?

KvmanThinking

15 Dec 2024 23:06 UTC

0 points

5 comments

1 min read

LW link

Madison Secular Solstice

svfritz

15 Dec 2024 21:52 UTC

1 point

0 comments

1 min read

LW link

[Question] Is AI alignment a purely functional property?

Roko

15 Dec 2024 21:42 UTC

13 points

7 comments

1 min read

LW link

[Question] How counterfactual are logical counterfactuals?

Donald Hobson

15 Dec 2024 21:16 UTC

11 points

10 comments

1 min read

LW link

Debunking the myth of safe AI

henophilia

15 Dec 2024 17:44 UTC

−11 points

7 comments

1 min read

LW link

(henophilia.substack.com)

Introducing Avatarism: A Rational Framework for Building actual Heaven

ratiba ro

15 Dec 2024 17:17 UTC

2 points

2 comments

2 min read

LW link

A Public Choice Take on Effective Altruism

vaishnav92

15 Dec 2024 16:58 UTC

9 points

4 comments

3 min read

LW link

(www.optimaloutliers.com)

World Models I’m Currently Building

temporary

15 Dec 2024 16:29 UTC

5 points

1 comment

1 min read

LW link

(samuelshadrach.com)

Dress Up For Secular Solstice

Gordon H.S.

15 Dec 2024 16:28 UTC

33 points

13 comments

7 min read

LW link

Remap your caps lock key

bilalchughtai

15 Dec 2024 14:03 UTC

82 points

17 comments

1 min read

LW link

Effective Evil’s AI Misalignment Plan

lsusr

15 Dec 2024 7:39 UTC

78 points

9 comments

3 min read

LW link

Write Good Enough Code, Quickly

Oliver Daniels

15 Dec 2024 4:45 UTC

19 points

10 comments

8 min read

LW link

How to Edit an Essay into a Solstice Speech?

Czynski

15 Dec 2024 4:30 UTC

5 points

1 comment

1 min read

LW link

(thepdv.wordpress.com)

How Your Physiology Affects the Mind’s Projection Fallacy

YanLyutnev

14 Dec 2024 21:10 UTC

2 points

0 comments

6 min read

LW link

Introducing the Evidence Color Wheel

Larry Lee

14 Dec 2024 16:08 UTC

6 points

0 comments

3 min read

LW link

An Illustrated Summary of “Robust Agents Learn Causal World Model”

Dalcy

14 Dec 2024 15:02 UTC

57 points

2 comments

10 min read

LW link

Best-of-N Jailbreaking

John Hughes, saraprice, Aengus Lynch, Rylan Schaeffer, Fazl, Henry Sleight, Ethan Perez and mrinank_sharma

14 Dec 2024 4:58 UTC

78 points

6 comments

2 min read

LW link

(arxiv.org)

D&D.Sci Dungeonbuilding: the Dungeon Tournament

aphyer

14 Dec 2024 4:30 UTC

47 points

14 comments

3 min read

LW link