Goals selected from learned knowledge: an alternative to RL alignment

Seth Herd

15 Jan 2024 21:52 UTC

42 points

18 comments · 7 min read · LW link

Introducing REBUS: A Robust Evaluation Benchmark of Understanding Symbols

Arjun Panickssery and agg

15 Jan 2024 21:21 UTC

33 points

0 comments · 1 min read · LW link

Live Sound: Big-O Improvements

jefftk

15 Jan 2024 19:50 UTC

8 points

0 comments · 1 min read · LW link

(www.jefftk.com)

Investigating Bias Representations in LLMs via Activation Steering

DawnLu

15 Jan 2024 19:39 UTC

29 points

4 comments · 5 min read · LW link

Sparse MLP Distillation

slavachalnev

15 Jan 2024 19:39 UTC

30 points

3 comments · 6 min read · LW link

Review of Alignment Plan Critiques- December AI-Plans Critique-a-Thon Results

Iknownothing

15 Jan 2024 19:37 UTC

24 points

0 comments · 25 min read · LW link

(aiplans.substack.com)

[Question] What does it look like for AI to significantly improve human coordination, before superintelligence?

jacobjacob

15 Jan 2024 19:22 UTC

22 points

2 comments · 1 min read · LW link

Now Accepting Player Applications for Band of Blades

Joe Rogero

15 Jan 2024 17:58 UTC

2 points

0 comments · 3 min read · LW link

Three Types of Constraints in the Space of Agents

Nora_Ammann and Mateusz Bagiński

15 Jan 2024 17:27 UTC

26 points

3 comments · 17 min read · LW link

The case for training frontier AIs on Sumerian-only corpus

Alexandre Variengien, Charbel-Raphaël and Jonathan Claybrough

15 Jan 2024 16:40 UTC

130 points

15 comments · 3 min read · LW link

How to Promote More Productive Dialogue Outside of LessWrong

sweenesm

15 Jan 2024 14:16 UTC

16 points

4 comments · 2 min read · LW link

[Question] Come and daydream with me about science reform

TeaTieAndHat

15 Jan 2024 11:09 UTC

9 points

1 comment · 1 min read · LW link

AI doing philosophy = AI generating hands?

Wei Dai

15 Jan 2024 9:04 UTC

46 points

22 comments · 1 min read · LW link

Even if we lose, we win

Morphism

15 Jan 2024 2:15 UTC

24 points

17 comments · 4 min read · LW link

Detachment vs attachment [AI risk and mental health]

Neil

15 Jan 2024 0:41 UTC

14 points

4 comments · 3 min read · LW link

Making up statistics to establish priority on Land Value Tax vs Earned Income Tax Credit vs Social Media Dynamic Regulation

Canucklug

14 Jan 2024 23:57 UTC

−5 points

2 comments · 7 min read · LW link

Is the universe all there is? ‘Evidence’ for objects outside the universe...

JonathanHall

14 Jan 2024 23:56 UTC

−4 points

27 comments · 11 min read · LW link

[Question] What is the minimum amount of time travel and resources needed to secure the future?

Perhaps

14 Jan 2024 22:01 UTC

−3 points

5 comments · 1 min read · LW link

Gothenburg LW / ACX meetup

Stefan

14 Jan 2024 21:21 UTC

1 point

0 comments · 1 min read · LW link

Gothenburg LW / ACX meetup

Stefan

14 Jan 2024 21:20 UTC

1 point

1 comment · 1 min read · LW link

D&D.Sci Hypersphere Analysis Part 2: Nonlinear Effects & Interactions

aphyer

14 Jan 2024 19:59 UTC

24 points

0 comments · 7 min read · LW link

Gender Exploration

sapphire

14 Jan 2024 18:57 UTC

113 points

25 comments · 5 min read · LW link

(open.substack.com)

List of projects that seem impactful for AI Governance

JaimeRV and Teun van der Weij

14 Jan 2024 16:53 UTC

14 points

0 comments · 13 min read · LW link

The Leeroy Jenkins principle: How faulty AI could guarantee “warning shots”

titotal

14 Jan 2024 15:03 UTC

46 points

6 comments · 1 min read · LW link

(titotal.substack.com)

Notice When People Are Directionally Correct

Chris_Leong

14 Jan 2024 14:12 UTC

129 points

8 comments · 2 min read · LW link

Corrosive Mnemonics

Epirito

14 Jan 2024 12:44 UTC

7 points

0 comments · 2 min read · LW link

Against most, but not all, AI risk analogies

Matthew Barnett

14 Jan 2024 3:36 UTC

63 points

41 comments · 7 min read · LW link

Vote With Your Face

jefftk

14 Jan 2024 3:30 UTC

11 points

0 comments · 1 min read · LW link

(www.jefftk.com)

Case Studies in Reverse-Engineering Sparse Autoencoder Features by Using MLP Linearization

Jacob Dunefsky, Philippe Chlenski, Senthooran Rajamanoharan and Neel Nanda

14 Jan 2024 2:06 UTC

23 points

0 comments · 42 min read · LW link

D&D.Sci Hypersphere Analysis Part 1: Datafields & Preliminary Analysis

aphyer

13 Jan 2024 20:16 UTC

29 points

1 comment · 5 min read · LW link

Some additional SAE thoughts

Hoagy

13 Jan 2024 19:31 UTC

30 points

4 comments · 13 min read · LW link

(4 min read) An intuitive explanation of the AI influence situation

trevor

13 Jan 2024 17:34 UTC

12 points

26 comments · 4 min read · LW link

AI #47: Meet the New Year

Zvi

13 Jan 2024 16:20 UTC

36 points

7 comments · 57 min read · LW link

(thezvi.wordpress.com)

Takeaways from the NeurIPS 2023 Trojan Detection Competition

mikes

13 Jan 2024 12:35 UTC

20 points

2 comments · 1 min read · LW link

(confirmlabs.org)

[Question] Why do so many think deception in AI is important?

Prometheus

13 Jan 2024 8:14 UTC

23 points

12 comments · 1 min read · LW link

Eliminating Cookie Banners is Hard

jefftk

13 Jan 2024 3:00 UTC

23 points

15 comments · 3 min read · LW link

(www.jefftk.com)

Introducing Alignment Stress-Testing at Anthropic

evhub

12 Jan 2024 23:51 UTC

182 points

23 comments · 2 min read · LW link

D&D.Sci(-fi): Colonizing the SuperHyperSphere

abstractapplic

12 Jan 2024 23:36 UTC

48 points

23 comments · 2 min read · LW link

Commonwealth Fusion Systems is the Same Scale as OpenAI

Jeffrey Heninger

12 Jan 2024 21:43 UTC

22 points

13 comments · 2 min read · LW link

Throughput vs. Latency

alkjash and Ruby

12 Jan 2024 21:37 UTC

29 points

2 comments · 13 min read · LW link

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

evhub, Carson Denison, Meg, Monte M, David Duvenaud, Nicholas Schiefer and Ethan Perez

12 Jan 2024 19:51 UTC

305 points

95 comments · 3 min read · LW link

(arxiv.org)

METAPHILOSOPHY—A Philosophizing through logical consequences

Seremonia

12 Jan 2024 18:47 UTC

−7 points

7 comments · 1 min read · LW link

Idealism, Realistic & Pragmatic

Seremonia

12 Jan 2024 18:16 UTC

−7 points

3 comments · 1 min read · LW link

The existential threat of humans.

Spiritus Dei

12 Jan 2024 17:50 UTC

−24 points

0 comments · 3 min read · LW link

[Question] Concrete examples of doing agentic things?

Jacob G-W

12 Jan 2024 15:59 UTC

13 points

10 comments · 1 min read · LW link

Land Reclamation is in the 9th Circle of Stagnation Hell

Maxwell Tabarrok

12 Jan 2024 13:36 UTC

54 points

6 comments · 2 min read · LW link

(maximumprogress.substack.com)

What good is G-factor if you’re dumped in the woods? A field report from a camp counselor.

Hastings

12 Jan 2024 13:17 UTC

137 points

22 comments · 1 min read · LW link

A Chinese Room Containing a Stack of Stochastic Parrots

RogerDearnaley

12 Jan 2024 6:29 UTC

20 points

3 comments · 5 min read · LW link

Decent plan prize announcement (1 paragraph, $1k)

lemonhope

12 Jan 2024 6:27 UTC

25 points

19 comments · 1 min read · LW link

introduction to solid oxide electrolytes

bhauth

12 Jan 2024 5:35 UTC

17 points

0 comments · 4 min read · LW link

(www.bhauth.com)