align your latent spaces

bhauth

24 Dec 2023 16:30 UTC

27 points

8 comments, 2 min read

(www.bhauth.com)

Viral Guessing Game

jefftk

24 Dec 2023 13:10 UTC

19 points

0 comments, 1 min read

(www.jefftk.com)

The Sugar Alignment Problem

Adam Zerner

24 Dec 2023 1:35 UTC

5 points

3 comments, 7 min read

A Crisper Explanation of Simulacrum Levels

Thane Ruthenis

23 Dec 2023 22:13 UTC

89 points

13 comments, 13 min read

Hyperbolic Discounting and Pascal’s Mugging

Andrew Keenan Richardson

23 Dec 2023 21:55 UTC

9 points

0 comments, 7 min read

AISN #28: Center for AI Safety 2023 Year in Review

aogara and Dan H

23 Dec 2023 21:31 UTC

30 points

1 comment, 5 min read

(newsletter.safe.ai)

“Inftoxicity” and other new words to describe malicious information and communication thereof

Jáchym Fibír

23 Dec 2023 18:15 UTC

−1 points

6 comments, 3 min read

AI’s impact on biology research: Part I, today

octopocta

23 Dec 2023 16:29 UTC

31 points

6 comments, 2 min read

AI Girlfriends Won’t Matter Much

Maxwell Tabarrok

23 Dec 2023 15:58 UTC

42 points

22 comments, 2 min read

(maximumprogress.substack.com)

The Next Right Token

jefftk

23 Dec 2023 3:20 UTC

14 points

0 comments, 1 min read

(www.jefftk.com)

Fact Finding: Do Early Layers Specialise in Local Processing? (Post 5)

Neel Nanda, Senthooran Rajamanoharan, János Kramár and Rohin Shah

23 Dec 2023 2:46 UTC

18 points

0 comments, 4 min read

Fact Finding: How to Think About Interpreting Memorisation (Post 4)

Senthooran Rajamanoharan, Neel Nanda, János Kramár and Rohin Shah

23 Dec 2023 2:46 UTC

22 points

0 comments, 9 min read

Fact Finding: Trying to Mechanistically Understanding Early MLPs (Post 3)

Neel Nanda, Senthooran Rajamanoharan, János Kramár and Rohin Shah

23 Dec 2023 2:46 UTC

10 points

0 comments, 16 min read

Fact Finding: Simplifying the Circuit (Post 2)

Senthooran Rajamanoharan, Neel Nanda, János Kramár and Rohin Shah

23 Dec 2023 2:45 UTC

25 points

3 comments, 14 min read

Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1)

Neel Nanda, Senthooran Rajamanoharan, János Kramár and Rohin Shah

23 Dec 2023 2:44 UTC

108 points

9 comments, 22 min read, 1 review

Measurement tampering detection as a special case of weak-to-strong generalization

ryan_greenblatt, Fabien Roger and Buck

23 Dec 2023 0:05 UTC

57 points

10 comments, 4 min read

How does a toy 2 digit subtraction transformer predict the difference?

Evan Anders

22 Dec 2023 21:17 UTC

12 points

0 comments, 10 min read

(evanhanders.blog)

Thoughts on Max Tegmark’s AI verification

Johannes C. Mayer

22 Dec 2023 20:38 UTC

10 points

0 comments, 3 min read

Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations)

Thane Ruthenis

22 Dec 2023 20:19 UTC

74 points

14 comments, 6 min read

AI safety advocates should consider providing gentle pushback following the events at OpenAI

civilsociety

22 Dec 2023 18:55 UTC

16 points

5 comments, 3 min read

“Destroy humanity” as an immediate subgoal

Seth Ahrenbach

22 Dec 2023 18:52 UTC

3 points

13 comments, 3 min read

Synthetic Restrictions

nano_brasca

22 Dec 2023 18:50 UTC

10 points

0 comments, 4 min read

Review Report of Davidson on Takeoff Speeds (2023)

Trent Kannegieter

22 Dec 2023 18:48 UTC

37 points

11 comments, 38 min read

The problems with the concept of an infohazard as used by the LW community [Linkpost]

Noosphere89

22 Dec 2023 16:13 UTC

75 points

43 comments, 3 min read

(www.beren.io)

Employee Incentives Make AGI Lab Pauses More Costly

Nikola Jurkovic

22 Dec 2023 5:04 UTC

28 points

12 comments, 3 min read

The LessWrong 2022 Review: Review Phase

RobertM

22 Dec 2023 3:23 UTC

58 points

7 comments, 2 min read

The absence of self-rejection is self-acceptance

Chipmonk

21 Dec 2023 21:54 UTC

24 points

1 comment, 1 min read

(chipmonk.substack.com)

A Decision Theory Can Be Rational or Computable, but Not Both

StrivingForLegibility

21 Dec 2023 21:02 UTC

9 points

4 comments, 1 min read

Most People Don’t Realize We Have No Idea How Our AIs Work

Thane Ruthenis

21 Dec 2023 20:02 UTC

159 points

42 comments, 1 min read

Pseudonymity and Accusations

jefftk

21 Dec 2023 19:20 UTC

52 points

20 comments, 3 min read

(www.jefftk.com)

Attention on AI X-Risk Likely Hasn’t Distracted from Current Harms from AI

Erich_Grunewald

21 Dec 2023 17:24 UTC

26 points

2 comments, 17 min read

(www.erichgrunewald.com)

“Alignment” is one of six words of the year in the Harvard Gazette

Nikola Jurkovic

21 Dec 2023 15:54 UTC

14 points

1 comment, 1 min read

(news.harvard.edu)

AI #43: Functional Discoveries

Zvi

21 Dec 2023 15:50 UTC

52 points

26 comments, 49 min read

(thezvi.wordpress.com)

Rating my AI Predictions

Robert_AIZI

21 Dec 2023 14:07 UTC

22 points

5 comments, 2 min read

(aizi.substack.com)

AI Safety Chatbot

markov and Robert Miles

21 Dec 2023 14:06 UTC

61 points

11 comments, 4 min read

On OpenAI’s Preparedness Framework

Zvi

21 Dec 2023 14:00 UTC

51 points

4 comments, 21 min read

(thezvi.wordpress.com)

Prediction Markets aren’t Magic

SimonM

21 Dec 2023 12:54 UTC

90 points

29 comments, 3 min read

[Question] Why is capnometry biofeedback not more widely known?

riceissa

21 Dec 2023 2:42 UTC

20 points

22 comments, 4 min read

My best guess at the important tricks for training 1L SAEs

Arthur Conmy

21 Dec 2023 1:59 UTC

37 points

4 comments, 3 min read

Seattle Winter Solstice

a7x

20 Dec 2023 20:30 UTC

6 points

1 comment, 1 min read

How Would an Utopia-Maximizer Look Like?

Thane Ruthenis

20 Dec 2023 20:01 UTC

31 points

23 comments, 10 min read

Succession

Richard_Ngo

20 Dec 2023 19:25 UTC

159 points

48 comments, 11 min read

(www.narrativeark.xyz)

Metaculus Introduces Multiple Choice Questions

ChristianWilliams

20 Dec 2023 19:00 UTC

4 points

0 comments, 1 min read

(www.metaculus.com)

Brighter Than Today Versions

jefftk

20 Dec 2023 18:20 UTC

16 points

2 comments, 2 min read

(www.jefftk.com)

Gaia Network: a practical, incremental pathway to Open Agency Architecture

Roman Leventov and Rafael Kaufmann Nedal

20 Dec 2023 17:11 UTC

22 points

8 comments, 16 min read

On the future of language models

owencb

20 Dec 2023 16:58 UTC

105 points

17 comments, 1 min read

[Valence series] Appendix A: Hedonic tone / (dis)pleasure / (dis)liking

Steven Byrnes

20 Dec 2023 15:54 UTC

18 points

0 comments, 13 min read

Matrix completion prize results

paulfchristiano

20 Dec 2023 15:40 UTC

41 points

0 comments, 2 min read

(www.alignment.org)

[Question] What’s the minimal additive constant for Kolmogorov Complexity that a programming language can achieve?

Noosphere89

20 Dec 2023 15:36 UTC

11 points

15 comments, 1 min read

Legalize butanol?

bhauth

20 Dec 2023 14:24 UTC

39 points

20 comments, 5 min read

(www.bhauth.com)