Deeply Cover Car Crashes?

jefftk

10 Dec 2023 22:20 UTC

36 points

32 comments 1 min read LW link

(www.jefftk.com)

Principles For Product Liability (With Application To AI)

johnswentworth

10 Dec 2023 21:27 UTC

37 points

55 comments 10 min read LW link

[Question] What do you do to remember and reference the LessWrong posts that were most personally significant to you, in terms of intellectual development or general usefulness?

lillybaeum

10 Dec 2023 17:52 UTC

5 points

7 comments 1 min read LW link

[Question] Do websites and apps actually generally get worse after updates, or is it just an effect of the fear of change?

lillybaeum

10 Dec 2023 17:26 UTC

36 points

35 comments 2 min read LW link 1 review

How LDT helps reduce the AI arms race

Tamsin Leake

10 Dec 2023 16:21 UTC

65 points

13 comments 4 min read LW link

(carado.moe)

Understanding Subjective Probabilities

Isaac King

10 Dec 2023 6:03 UTC

30 points

16 comments 10 min read LW link

Send us example gnarly bugs

Beth Barnes, Megan Kinniment and Tao Lin

10 Dec 2023 5:23 UTC

77 points

10 comments 2 min read LW link

Conceptual coherence for concrete categories in humans and LLMs

Bill Benzon

9 Dec 2023 23:49 UTC

13 points

1 comment 2 min read LW link

2d ai-partners as a comprehensive motivation tool

AiresJL

9 Dec 2023 21:59 UTC

3 points

0 comments 1 min read LW link

Without—MicroFiction 250 words

Carissa Cassiel

9 Dec 2023 21:49 UTC

20 points

1 comment 1 min read LW link

Some negative steganography results

Fabien Roger

9 Dec 2023 20:22 UTC

59 points

5 comments 2 min read LW link

Summing up “Scheming AIs” (Section 5)

Joe Carlsmith

9 Dec 2023 15:48 UTC

2 points

1 comment 11 min read LW link

The Offense-Defense Balance Rarely Changes

Maxwell Tabarrok

9 Dec 2023 15:21 UTC

75 points

23 comments 3 min read LW link

(maximumprogress.substack.com)

A Philosophical Tautology

Nox ML

9 Dec 2023 14:06 UTC

−2 points

45 comments 2 min read LW link

Unpicking Extinction

ukc10014

9 Dec 2023 9:15 UTC

35 points

10 comments 10 min read LW link

Finding Sparse Linear Connections between Features in LLMs

Logan Riggs, Sam Mitchell and Adam Kaufman

9 Dec 2023 2:27 UTC

69 points

5 comments 10 min read LW link

[Question] Option Space Nomenclature

SilverFlame

8 Dec 2023 23:14 UTC

1 point

0 comments 1 min read LW link

“Model UN Solutions”

Arjun Panickssery

8 Dec 2023 23:06 UTC

36 points

5 comments 1 min read LW link

(open.substack.com)

Speed arguments against scheming (Section 4.4-4.7 of “Scheming AIs”)

Joe Carlsmith

8 Dec 2023 21:09 UTC

9 points

0 comments 15 min read LW link

Modeling incentives at scale using LLMs

Bruno Marnette, pzahn and cmck

8 Dec 2023 18:46 UTC

7 points

3 comments 13 min read LW link

Refusal mechanisms: initial experiments with Llama-2-7b-chat

Andy Arditi and Oscar Obeso

8 Dec 2023 17:08 UTC

81 points

7 comments 7 min read LW link

Colour versus Shape Goal Misgeneralization in Reinforcement Learning: A Case Study

Karolis Jucys

8 Dec 2023 13:18 UTC

13 points

1 comment 4 min read LW link

(arxiv.org)

What I Would Do If I Were Working On AI Governance

johnswentworth

8 Dec 2023 6:43 UTC

110 points

32 comments 10 min read LW link

Whither Prison Abolition?

MadHatter

8 Dec 2023 5:27 UTC

−7 points

0 comments 16 min read LW link

(bittertruths.substack.com)

Class consciousness for those against the class system

TekhneMakre

8 Dec 2023 1:02 UTC

11 points

9 comments 1 min read LW link

Building selfless agents to avoid instrumental self-preservation.

blallo

7 Dec 2023 18:59 UTC

14 points

2 comments 6 min read LW link

Does Chat-GPT display ‘Scope Insensitivity’?

callum

7 Dec 2023 18:58 UTC

11 points

0 comments 3 min read LW link

LLM keys—A Proposal of a Solution to Prompt Injection Attacks

Peter Hroššo

7 Dec 2023 17:36 UTC

1 point

2 comments 1 min read LW link

Meetup Tip: Heartbeat Messages

Screwtape

7 Dec 2023 17:18 UTC

68 points

4 comments 3 min read LW link

[Valence series] 2. Valence & Normativity

Steven Byrnes

7 Dec 2023 16:43 UTC

88 points

6 comments 28 min read LW link

AISN #27: Defensive Accelerationism, A Retrospective On The OpenAI Board Saga, And A New AI Bill From Senators Thune And Klobuchar

aogara, Dan H, Corin Katzke and allison huang

7 Dec 2023 15:59 UTC

13 points

0 comments 6 min read LW link

(newsletter.safe.ai)

AI #41: Bring in the Other Gemini

Zvi

7 Dec 2023 15:10 UTC

46 points

16 comments 52 min read LW link

(thezvi.wordpress.com)

Simplicity arguments for scheming (Section 4.3 of “Scheming AIs”)

Joe Carlsmith

7 Dec 2023 15:05 UTC

10 points

1 comment 19 min read LW link

Results from the Turing Seminar hackathon

Charbel-Raphaël, jeanne_ and WCargo

7 Dec 2023 14:50 UTC

29 points

1 comment 6 min read LW link

Gemini 1.0

Zvi

7 Dec 2023 14:40 UTC

50 points

7 comments 9 min read LW link

(thezvi.wordpress.com)

Random Musings on Theory of Impact for Activation Vectors

Chris_Leong

7 Dec 2023 13:07 UTC

8 points

0 comments 1 min read LW link

[Question] Is AlphaGo actually a consequentialist utility maximizer?

faul_sname

7 Dec 2023 12:41 UTC

36 points

8 comments 3 min read LW link

(Report) Evaluating Taiwan’s Tactics to Safeguard its Semiconductor Assets Against a Chinese Invasion

Gauraventh

7 Dec 2023 11:50 UTC

15 points

5 comments 22 min read LW link

(bristolaisafety.org)

Would AIs trapped in the Metaverse pine to enter the real world and would the ramifications cause trouble?

ProfessorFalken

7 Dec 2023 10:17 UTC

−2 points

1 comment 1 min read LW link

The GiveWiki’s Top Picks in AI Safety for the Giving Season of 2023

Dawn Drescher

7 Dec 2023 9:23 UTC

4 points

10 comments 1 min read LW link

(impactmarkets.substack.com)

Language Model Memorization, Copyright Law, and Conditional Pretraining Alignment

RogerDearnaley

7 Dec 2023 6:14 UTC

9 points

0 comments 11 min read LW link

Reflective consistency, randomized decisions, and the dangers of unrealistic thought experiments

Radford Neal

7 Dec 2023 3:33 UTC

34 points

25 comments 6 min read LW link

[Question] For fun: How long can you hold your breath?

exanova

6 Dec 2023 23:36 UTC

1 point

7 comments 1 min read LW link

Mathematics As Physics

Nox ML

6 Dec 2023 22:27 UTC

−2 points

10 comments 5 min read LW link

The counting argument for scheming (Sections 4.1 and 4.2 of “Scheming AIs”)

Joe Carlsmith

6 Dec 2023 19:28 UTC

10 points

0 comments 10 min read LW link

On Trust

johnswentworth

6 Dec 2023 19:19 UTC

44 points

26 comments 4 min read LW link

Originality vs. Correctness

alkjash and habryka

6 Dec 2023 18:51 UTC

60 points

17 comments 25 min read LW link

Proposal for improving the global online discourse through personalised comment ordering on all websites

Roman Leventov

6 Dec 2023 18:51 UTC

35 points

21 comments 6 min read LW link

Google Gemini Announced

Jacob G-W

6 Dec 2023 16:14 UTC

54 points

22 comments 1 min read LW link

(blog.google)

Based Beff Jezos and the Accelerationists

Zvi

6 Dec 2023 16:00 UTC

89 points

29 comments 12 min read LW link

(thezvi.wordpress.com)