19 Dec 2023 23:39 UTC

40 points

30 comments, 25 min read

[Question] What are the best Siderea posts?

mike_hawke

19 Dec 2023 23:07 UTC

17 points

2 comments, 1 min read

Meaning & Agency

abramdemski

19 Dec 2023 22:27 UTC

91 points

17 comments, 14 min read

s/acc: Safe Accelerationism Manifesto

lorepieri

19 Dec 2023 22:19 UTC

−4 points

5 comments, 2 min read

(lorenzopieri.com)

Don’t Share Information Exfohazardous on Others’ AI-Risk Models

Thane Ruthenis

19 Dec 2023 20:09 UTC

67 points

11 comments, 1 min read

Paper: Tell, Don’t Show- Declarative facts influence how LLMs generalize

Owain_Evans and AlexMeinke

19 Dec 2023 19:14 UTC

45 points

4 comments, 6 min read

(arxiv.org)

Interview: Applications w/ Alice Rigg

jacobhaimes

19 Dec 2023 19:03 UTC

12 points

0 comments, 1 min read

(into-ai-safety.github.io)

How does a toy 2 digit subtraction transformer predict the sign of the output?

Evan Anders

19 Dec 2023 18:56 UTC

14 points

0 comments, 8 min read

(evanhanders.blog)

Incremental AI Risks from Proxy-Simulations

kmenou

19 Dec 2023 18:56 UTC

2 points

0 comments, 1 min read

(individual.utoronto.ca)

A proposition for the modification of our epistemology

JacobBowden

19 Dec 2023 18:55 UTC

−4 points

2 comments, 4 min read

Goal-Completeness is like Turing-Completeness for AGI

Liron

19 Dec 2023 18:12 UTC

50 points

26 comments, 3 min read

SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research

Roman Leventov

19 Dec 2023 16:49 UTC

17 points

5 comments, 3 min read

Chording “The Next Right Thing”

jefftk

19 Dec 2023 15:40 UTC

11 points

0 comments, 2 min read

(www.jefftk.com)

Monthly Roundup #13: December 2023

Zvi

19 Dec 2023 15:10 UTC

32 points

5 comments, 26 min read

(thezvi.wordpress.com)

Effective Aspersions: How the Nonlinear Investigation Went Wrong

TracingWoodgrains

19 Dec 2023 12:00 UTC

175 points

170 comments, 1 min read

A Universal Emergent Decomposition of Retrieval Tasks in Language Models

Alexandre Variengien and Eric Winsor

19 Dec 2023 11:52 UTC

84 points

3 comments, 10 min read

(arxiv.org)

Assessment of AI safety agendas: think about the downside risk

Roman Leventov

19 Dec 2023 9:00 UTC

13 points

1 comment, 1 min read

Constellations are Younger than Continents

Jeffrey Heninger

19 Dec 2023 6:12 UTC

260 points

22 comments, 2 min read

The Dark Arts

lsusr and Lyrongolem

19 Dec 2023 4:41 UTC

135 points

49 comments, 9 min read

When scientists consider whether their research will end the world

Harlan

19 Dec 2023 3:47 UTC

30 points

4 comments, 11 min read

(blog.aiimpacts.org)

Is the far future inevitably zero sum?

Srdjan Miletic

19 Dec 2023 1:45 UTC

8 points

2 comments, 2 min read

(dissent.blog)

The ‘Neglected Approaches’ Approach: AE Studio’s Alignment Agenda

Cameron Berg, Judd Rosenblatt, AE Studio and Marc Carauleanu

18 Dec 2023 20:35 UTC

166 points

21 comments, 12 min read

The Shortest Path Between Scylla and Charybdis

Thane Ruthenis

18 Dec 2023 20:08 UTC

50 points

8 comments, 5 min read

OpenAI: Preparedness framework

Zach Stein-Perlman

18 Dec 2023 18:30 UTC

70 points

23 comments, 4 min read

(openai.com)

[Valence series] 5. “Valence Disorders” in Mental Health & Personality

Steven Byrnes

18 Dec 2023 15:26 UTC

42 points

12 comments, 13 min read

Discussion: Challenges with Unsupervised LLM Knowledge Discovery

Seb Farquhar, Vikrant Varma, zac_kenton, gasteigerjo, Vlad Mikulik and Rohin Shah

18 Dec 2023 11:58 UTC

147 points

21 comments, 10 min read

Interpreting the Learning of Deceit

RogerDearnaley

18 Dec 2023 8:12 UTC

30 points

14 comments, 9 min read

Talk: “AI Would Be A Lot Less Alarming If We Understood Agents”

johnswentworth

17 Dec 2023 23:46 UTC

58 points

3 comments, 1 min read

(www.youtube.com)

∀: a story

Richard_Ngo

17 Dec 2023 22:42 UTC

37 points

1 comment, 8 min read

(www.narrativeark.xyz)

Reviving a 2015 MacBook

jefftk

17 Dec 2023 21:00 UTC

11 points

0 comments, 1 min read

(www.jefftk.com)

A Common-Sense Case For Mutually-Misaligned AGIs Allying Against Humans

Thane Ruthenis

17 Dec 2023 20:28 UTC

29 points

7 comments, 11 min read

OpenAI, DeepMind, Anthropic, etc. should shut down.

Tamsin Leake

17 Dec 2023 20:01 UTC

32 points

48 comments, 3 min read

(carado.moe)

The Limits of Artificial Consciousness: A Biology-Based Critique of Chalmers’ Fading Qualia Argument

Štěpán Los

17 Dec 2023 19:11 UTC

−6 points

9 comments, 17 min read

What makes teaching math special

Viliam

17 Dec 2023 14:15 UTC

41 points

27 comments, 11 min read

The predictive power of dissipative adaptation

dr_s

17 Dec 2023 14:01 UTC

46 points

14 comments, 19 min read

Linkpost: Francesca v Harvard

Linch

17 Dec 2023 6:18 UTC

5 points

5 comments, 2 min read

(www.francesca-v-harvard.org)

Lessons from massaging myself, others, dogs, and cats

Chipmonk

17 Dec 2023 4:28 UTC

2 points

27 comments, 5 min read

(chipmonk.blog)

The Serendipity of Density

jefftk

17 Dec 2023 3:50 UTC

40 points

4 comments, 1 min read

(www.jefftk.com)

Bounty: Diverse hard tasks for LLM agents

Beth Barnes and Megan Kinniment

17 Dec 2023 1:04 UTC

49 points

31 comments, 16 min read

2022 (and All Time) Posts by Pingback Count

Raemon

16 Dec 2023 21:17 UTC

53 points

14 comments, 6 min read

“Humanity vs. AGI” Will Never Look Like “Humanity vs. AGI” to Humanity

Thane Ruthenis

16 Dec 2023 20:08 UTC

180 points

34 comments, 5 min read

Alignment work in anomalous worlds

Tamsin Leake

16 Dec 2023 19:34 UTC

24 points

4 comments, 3 min read

(carado.moe)

A visual analogy for text generation by LLMs?

Bill Benzon

16 Dec 2023 17:58 UTC

3 points

0 comments, 1 min read

Upgrading the AI Safety Community

trevor and Nicholas / Heather Kross

16 Dec 2023 15:34 UTC

42 points

9 comments, 42 min read

cold aluminum for medicine

bhauth

16 Dec 2023 14:38 UTC

42 points

4 comments, 4 min read

(www.bhauth.com)

Scalable Oversight and Weak-to-Strong Generalization: Compatible approaches to the same problem

Ansh Radhakrishnan, Buck, ryan_greenblatt and Fabien Roger

16 Dec 2023 5:49 UTC

73 points

3 comments, 6 min read

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

leogao

16 Dec 2023 5:39 UTC

54 points

5 comments, 1 min read

Pope Francis shares thoughts on responsible AI development

corruptedCatapillar

16 Dec 2023 3:49 UTC

15 points

4 comments, 1 min read

(www.vatican.va)

Current AIs Provide Nearly No Data Relevant to AGI Alignment

Thane Ruthenis

15 Dec 2023 20:16 UTC

114 points

155 comments, 8 min read

Agglomeration of ‘Ought’

DavidAndresBloom

15 Dec 2023 19:07 UTC

1 point

1 comment, 11 min read