How does a toy 2 digit subtraction transformer predict the difference?

Evan Anders

22 Dec 2023 21:17 UTC

12 points

0 comments · 10 min read · LW link

(evanhanders.blog)

Thoughts on Max Tegmark’s AI verification

Johannes C. Mayer

22 Dec 2023 20:38 UTC

10 points

0 comments · 3 min read · LW link

Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations)

Thane Ruthenis

22 Dec 2023 20:19 UTC

74 points

14 comments · 6 min read · LW link

AI safety advocates should consider providing gentle pushback following the events at OpenAI

civilsociety

22 Dec 2023 18:55 UTC

16 points

5 comments · 3 min read · LW link

“Destroy humanity” as an immediate subgoal

Seth Ahrenbach

22 Dec 2023 18:52 UTC

3 points

13 comments · 3 min read · LW link

Synthetic Restrictions

nano_brasca

22 Dec 2023 18:50 UTC

10 points

0 comments · 4 min read · LW link

Review Report of Davidson on Takeoff Speeds (2023)

Trent Kannegieter

22 Dec 2023 18:48 UTC

37 points

11 comments · 38 min read · LW link

Open positions: Research Analyst at the AI Standards Lab

Koen.Holtman, Jonathan_H and Ariel G.

22 Dec 2023 16:31 UTC

17 points

0 comments · 1 min read · LW link

The problems with the concept of an infohazard as used by the LW community [Linkpost]

Noosphere89

22 Dec 2023 16:13 UTC

75 points

43 comments · 3 min read · LW link

(www.beren.io)

Employee Incentives Make AGI Lab Pauses More Costly

nikola

22 Dec 2023 5:04 UTC

28 points

12 comments · 3 min read · LW link

The LessWrong 2022 Review: Review Phase

RobertM

22 Dec 2023 3:23 UTC

58 points

7 comments · 2 min read · LW link

The absence of self-rejection is self-acceptance

Chipmonk

21 Dec 2023 21:54 UTC

24 points

1 comment · 1 min read · LW link

(chipmonk.substack.com)

A Decision Theory Can Be Rational or Computable, but Not Both

StrivingForLegibility

21 Dec 2023 21:02 UTC

9 points

4 comments · 1 min read · LW link

Most People Don’t Realize We Have No Idea How Our AIs Work

Thane Ruthenis

21 Dec 2023 20:02 UTC

158 points

42 comments · 1 min read · LW link

Pseudonymity and Accusations

jefftk

21 Dec 2023 19:20 UTC

52 points

20 comments · 3 min read · LW link

(www.jefftk.com)

Attention on AI X-Risk Likely Hasn’t Distracted from Current Harms from AI

Erich_Grunewald

21 Dec 2023 17:24 UTC

26 points

2 comments · 17 min read · LW link

(www.erichgrunewald.com)

“Alignment” is one of six words of the year in the Harvard Gazette

nikola

21 Dec 2023 15:54 UTC

14 points

1 comment · 1 min read · LW link

(news.harvard.edu)

AI #43: Functional Discoveries

Zvi

21 Dec 2023 15:50 UTC

52 points

26 comments · 49 min read · LW link

(thezvi.wordpress.com)

Rating my AI Predictions

Robert_AIZI

21 Dec 2023 14:07 UTC

22 points

5 comments · 2 min read · LW link

(aizi.substack.com)

AI Safety Chatbot

markov and Robert Miles

21 Dec 2023 14:06 UTC

61 points

11 comments · 4 min read · LW link

On OpenAI’s Preparedness Framework

Zvi

21 Dec 2023 14:00 UTC

51 points

4 comments · 21 min read · LW link

(thezvi.wordpress.com)

Prediction Markets aren’t Magic

SimonM

21 Dec 2023 12:54 UTC

90 points

29 comments · 3 min read · LW link

[Question] Why is capnometry biofeedback not more widely known?

riceissa

21 Dec 2023 2:42 UTC

20 points

22 comments · 4 min read · LW link

My best guess at the important tricks for training 1L SAEs

Arthur Conmy

21 Dec 2023 1:59 UTC

37 points

4 comments · 3 min read · LW link

Seattle Winter Solstice

a7x

20 Dec 2023 20:30 UTC

6 points

1 comment · 1 min read · LW link

How Would an Utopia-Maximizer Look Like?

Thane Ruthenis

20 Dec 2023 20:01 UTC

31 points

23 comments · 10 min read · LW link

Succession

Richard_Ngo

20 Dec 2023 19:25 UTC

158 points

48 comments · 11 min read · LW link

(www.narrativeark.xyz)

Metaculus Introduces Multiple Choice Questions

ChristianWilliams

20 Dec 2023 19:00 UTC

4 points

0 comments · 1 min read · LW link

(www.metaculus.com)

Brighter Than Today Versions

jefftk

20 Dec 2023 18:20 UTC

16 points

2 comments · 2 min read · LW link

(www.jefftk.com)

Gaia Network: a practical, incremental pathway to Open Agency Architecture

Roman Leventov and Rafael Kaufmann Nedal

20 Dec 2023 17:11 UTC

22 points

8 comments · 16 min read · LW link

On the future of language models

owencb

20 Dec 2023 16:58 UTC

105 points

17 comments · 1 min read · LW link

[Valence series] Appendix A: Hedonic tone / (dis)pleasure / (dis)liking

Steven Byrnes

20 Dec 2023 15:54 UTC

18 points

0 comments · 13 min read · LW link

Matrix completion prize results

paulfchristiano

20 Dec 2023 15:40 UTC

41 points

0 comments · 2 min read · LW link

(www.alignment.org)

[Question] What’s the minimal additive constant for Kolmogorov Complexity that a programming language can achieve?

Noosphere89

20 Dec 2023 15:36 UTC

11 points

15 comments · 1 min read · LW link

Legalize butanol?

bhauth

20 Dec 2023 14:24 UTC

39 points

20 comments · 5 min read · LW link

(www.bhauth.com)

A short dialogue on comparability of values

cousin_it

20 Dec 2023 14:08 UTC

27 points

7 comments · 1 min read · LW link

Inside View, Outside View… And Opposing View

chaosmage

20 Dec 2023 12:35 UTC

21 points

1 comment · 5 min read · LW link

Heuristics for preventing major life mistakes

SK2

20 Dec 2023 8:01 UTC

28 points

2 comments · 3 min read · LW link

What should be reified?

herschel

20 Dec 2023 4:52 UTC

4 points

2 comments · 2 min read · LW link

(brothernin.substack.com)

(In)appropriate (De)reification

herschel

20 Dec 2023 4:51 UTC

10 points

1 comment · 4 min read · LW link

(brothernin.substack.com)

Escaping Skeuomorphism

Stuart Johnson

20 Dec 2023 3:51 UTC

28 points

0 comments · 8 min read · LW link

Ronny and Nate discuss what sorts of minds humanity is likely to find by Machine Learning

So8res and Ronny Fernandez

19 Dec 2023 23:39 UTC

40 points

30 comments · 25 min read · LW link

[Question] What are the best Siderea posts?

mike_hawke

19 Dec 2023 23:07 UTC

17 points

2 comments · 1 min read · LW link

Meaning & Agency

abramdemski

19 Dec 2023 22:27 UTC

91 points

17 comments · 14 min read · LW link

s/acc: Safe Accelerationism Manifesto

lorepieri

19 Dec 2023 22:19 UTC

−4 points

5 comments · 2 min read · LW link

(lorenzopieri.com)

Don’t Share Information Exfohazardous on Others’ AI-Risk Models

Thane Ruthenis

19 Dec 2023 20:09 UTC

67 points

11 comments · 1 min read · LW link

Paper: Tell, Don’t Show: Declarative facts influence how LLMs generalize

Owain_Evans and AlexMeinke

19 Dec 2023 19:14 UTC

45 points

4 comments · 6 min read · LW link

(arxiv.org)

Interview: Applications w/ Alice Rigg

jacobhaimes

19 Dec 2023 19:03 UTC

12 points

0 comments · 1 min read · LW link

(into-ai-safety.github.io)

How does a toy 2 digit subtraction transformer predict the sign of the output?

Evan Anders

19 Dec 2023 18:56 UTC

14 points

0 comments · 8 min read · LW link

(evanhanders.blog)

Incremental AI Risks from Proxy-Simulations

kmenou

19 Dec 2023 18:56 UTC

2 points

0 comments · 1 min read · LW link

(individual.utoronto.ca)