Hiring decisions are not suitable for prediction markets

SimonM · 8 Jan 2024 21:11 UTC
12 points
6 comments · 1 min read · LW link

Better Anomia

jefftk · 8 Jan 2024 18:40 UTC
8 points
0 comments · 1 min read · LW link
(www.jefftk.com)

A starter guide for evals

8 Jan 2024 18:24 UTC
51 points
2 comments · 12 min read · LW link
(www.apolloresearch.ai)

Is it justifiable for non-experts to have strong opinions about Gaza?

8 Jan 2024 17:31 UTC
23 points
12 comments · 30 min read · LW link

Project ideas: Backup plans & Cooperative AI

Lukas Finnveden · 8 Jan 2024 17:19 UTC
18 points
0 comments · 1 min read · LW link
(lukasfinnveden.substack.com)

Hackathon and Staying Up-to-Date in AI

jacobhaimes · 8 Jan 2024 17:10 UTC
11 points
0 comments · 1 min read · LW link
(into-ai-safety.github.io)

When “yang” goes wrong

Joe Carlsmith · 8 Jan 2024 16:35 UTC
72 points
6 comments · 13 min read · LW link

Task vectors & analogy making in LLMs

Sergii · 8 Jan 2024 15:17 UTC
9 points
1 comment · 4 min read · LW link
(grgv.xyz)

[Question] How to find translations of a book?

Viliam · 8 Jan 2024 14:57 UTC
9 points
8 comments · 1 min read · LW link

[Question] Why aren’t Yudkowsky & Bostrom getting more attention now?

JoshuaFox · 8 Jan 2024 14:42 UTC
14 points
8 comments · 1 min read · LW link

2023 Prediction Evaluations

Zvi · 8 Jan 2024 14:40 UTC
47 points
0 comments · 28 min read · LW link
(thezvi.wordpress.com)

There is no sharp boundary between deontology and consequentialism

quetzal_rainbow · 8 Jan 2024 11:01 UTC
8 points
2 comments · 1 min read · LW link

Reflections on my first year of AI safety research

Jay Bailey · 8 Jan 2024 7:49 UTC
53 points
3 comments · 1 min read · LW link

Why There Is Hope For An Alignment Solution

Darklight · 8 Jan 2024 6:58 UTC
10 points
0 comments · 12 min read · LW link

Sledding Among Hazards

jefftk · 8 Jan 2024 3:30 UTC
19 points
5 comments · 1 min read · LW link
(www.jefftk.com)

Utility is relative

CrimsonChin · 8 Jan 2024 2:31 UTC
2 points
4 comments · 2 min read · LW link

A model of research skill

L Rudolf L · 8 Jan 2024 0:13 UTC
60 points
6 comments · 12 min read · LW link
(www.strataoftheworld.com)

We shouldn’t fear superintelligence because it already exists

Spencer Chubb · 7 Jan 2024 17:59 UTC
−22 points
14 comments · 1 min read · LW link

(Partial) failure in replicating deceptive alignment experiment

claudia.biancotti · 7 Jan 2024 17:56 UTC
1 point
0 comments · 1 min read · LW link

Project ideas: Sentience and rights of digital minds

Lukas Finnveden · 7 Jan 2024 17:34 UTC
20 points
0 comments · 1 min read · LW link
(lukasfinnveden.substack.com)

Deceptive AI ≠ Deceptively-aligned AI

Steven Byrnes · 7 Jan 2024 16:55 UTC
96 points
19 comments · 6 min read · LW link

Bayesians Commit the Gambler’s Fallacy

Kevin Dorst · 7 Jan 2024 12:54 UTC
48 points
30 comments · 8 min read · LW link
(kevindorst.substack.com)

Towards AI Safety Infrastructure: Talk & Outline

Paul Bricman · 7 Jan 2024 9:31 UTC
11 points
0 comments · 2 min read · LW link
(www.youtube.com)

Defending against hypothetical moon life during Apollo 11

eukaryote · 7 Jan 2024 4:49 UTC
57 points
9 comments · 32 min read · LW link
(eukaryotewritesblog.com)

The Sequences on YouTube

Neil · 7 Jan 2024 1:44 UTC
26 points
9 comments · 2 min read · LW link

AI Risk and the US Presidential Candidates

Zane · 6 Jan 2024 20:18 UTC
41 points
22 comments · 6 min read · LW link

A Challenge to Effective Altruism’s Premises

False Name · 6 Jan 2024 18:46 UTC
−26 points
3 comments · 3 min read · LW link

Lack of Spider-Man is evidence against the simulation hypothesis

RamblinDash · 6 Jan 2024 18:17 UTC
7 points
22 comments · 1 min read · LW link

A Land Tax For Britain

A.H. · 6 Jan 2024 15:52 UTC
6 points
9 comments · 4 min read · LW link

Book review: Trick or treatment (2008)

Fleece Minutia · 6 Jan 2024 15:40 UTC
1 point
0 comments · 2 min read · LW link

Are we inside a black hole?

Jay · 6 Jan 2024 13:30 UTC
2 points
5 comments · 1 min read · LW link

Survey of 2,778 AI authors: six parts in pictures

KatjaGrace · 6 Jan 2024 4:43 UTC
80 points
1 comment · 2 min read · LW link

Project ideas: Epistemics

Lukas Finnveden · 5 Jan 2024 23:41 UTC
43 points
4 comments · 1 min read · LW link
(lukasfinnveden.substack.com)

Almost everyone I’ve met would be well-served thinking more about what to focus on

Henrik Karlsson · 5 Jan 2024 21:01 UTC
96 points
8 comments · 11 min read · LW link
(www.henrikkarlsson.xyz)

The Next ChatGPT Moment: AI Avatars

5 Jan 2024 20:14 UTC
43 points
10 comments · 1 min read · LW link

AI Impacts 2023 Expert Survey on Progress in AI

habryka · 5 Jan 2024 19:42 UTC
28 points
2 comments · 7 min read · LW link
(wiki.aiimpacts.org)

Technology path dependence and evaluating expertise

5 Jan 2024 19:21 UTC
24 points
2 comments · 15 min read · LW link

The Hippie Rabbit Hole - Nuggets of Gold in Rivers of Bullshit

Jonathan Moregård · 5 Jan 2024 18:27 UTC
38 points
20 comments · 8 min read · LW link
(honestliving.substack.com)

[Question] What technical topics could help with boundaries/membranes?

Chipmonk · 5 Jan 2024 18:14 UTC
15 points
25 comments · 1 min read · LW link

Catching AIs red-handed

5 Jan 2024 17:43 UTC
110 points
27 comments · 17 min read · LW link

AI Impacts Survey: December 2023 Edition

Zvi · 5 Jan 2024 14:40 UTC
34 points
6 comments · 10 min read · LW link
(thezvi.wordpress.com)

Forecast your 2024 with Fatebook

Sage Future · 5 Jan 2024 14:07 UTC
19 points
0 comments · 1 min read · LW link
(fatebook.io)

Predictive model agents are sort of corrigible

Raymond D · 5 Jan 2024 14:05 UTC
35 points
6 comments · 3 min read · LW link

Striking Implications for Learning Theory, Interpretability — and Safety?

RogerDearnaley · 5 Jan 2024 8:46 UTC
37 points
4 comments · 2 min read · LW link

If I ran the zoo

Optimization Process · 5 Jan 2024 5:14 UTC
18 points
0 comments · 2 min read · LW link

Does AI care about reality or just its own perception?

RedFishBlueFish · 5 Jan 2024 4:05 UTC
−6 points
8 comments · 1 min read · LW link

MIRI 2024 Mission and Strategy Update

Malo · 5 Jan 2024 0:20 UTC
222 points
44 comments · 8 min read · LW link

Project ideas: Governance during explosive technological growth

Lukas Finnveden · 4 Jan 2024 23:51 UTC
14 points
0 comments · 1 min read · LW link
(lukasfinnveden.substack.com)

Hello

S Benfield · 4 Jan 2024 23:35 UTC
6 points
0 comments · 2 min read · LW link

Using Threats to Achieve Socially Optimal Outcomes

StrivingForLegibility · 4 Jan 2024 23:30 UTC
8 points
0 comments · 3 min read · LW link