Quick takes on “AI is easy to control”

So8res 2 Dec 2023 22:31 UTC

26 points

49 comments 4 min read LW link

The goal-guarding hypothesis (Section 2.3.1.1 of “Scheming AIs”)

Joe Carlsmith 2 Dec 2023 15:20 UTC

8 points

1 comment 15 min read LW link

The Method of Loci: With some brief remarks, including transformers and evaluating AIs

Bill Benzon 2 Dec 2023 14:36 UTC

6 points

0 comments 3 min read LW link

Taking Into Account Sentient Non-Humans in AI Ambitious Value Learning: Sentientist Coherent Extrapolated Volition

Adrià Moret 2 Dec 2023 14:07 UTC

26 points

31 comments 42 min read LW link

Out-of-distribution Bioattacks

jefftk 2 Dec 2023 12:20 UTC

66 points

15 comments 2 min read LW link

(www.jefftk.com)

After Alignment — Dialogue between RogerDearnaley and Seth Herd

RogerDearnaley and Seth Herd

2 Dec 2023 6:03 UTC

15 points

2 comments 25 min read LW link

List of strategies for mitigating deceptive alignment

joshc 2 Dec 2023 5:56 UTC

36 points

2 comments 6 min read LW link

[Question] What is known about invariants in self-modifying systems?

mishka 2 Dec 2023 5:04 UTC

9 points

2 comments 1 min read LW link

2023 Unofficial LessWrong Census/Survey

Screwtape 2 Dec 2023 4:41 UTC

169 points

81 comments 1 min read LW link

Protecting against sudden capability jumps during training

Nikola Jurkovic 2 Dec 2023 4:22 UTC

15 points

2 comments 2 min read LW link

South Bay Pre-Holiday Gathering

IS 2 Dec 2023 3:21 UTC

10 points

2 comments 1 min read LW link

MATS Summer 2023 Retrospective

utilistrutil, Juan Gil, Ryan Kidd, Christian Smith, McKennaFitzgerald and LauraVaughan

1 Dec 2023 23:29 UTC

77 points

34 comments 26 min read LW link

Complex systems research as a field (and its relevance to AI Alignment)

Nora_Ammann and habryka

1 Dec 2023 22:10 UTC

64 points

11 comments 19 min read LW link

[Question] Could there be “natural impact regularization” or “impact regularization by default”?

tailcalled 1 Dec 2023 22:01 UTC

24 points

6 comments 1 min read LW link

Benchmarking Bowtie2 Threading

jefftk 1 Dec 2023 20:20 UTC

9 points

0 comments 1 min read LW link

(www.jefftk.com)

Please Bet On My Quantified Self Decision Markets

niplav 1 Dec 2023 20:07 UTC

36 points

6 comments 6 min read LW link

Specification Gaming: How AI Can Turn Your Wishes Against You [RA Video]

Writer 1 Dec 2023 19:30 UTC

19 points

0 comments 5 min read LW link

(youtu.be)

Carving up problems at their joints

Jakub Smékal 1 Dec 2023 18:48 UTC

1 point

0 comments 2 min read LW link

(jakubsmekal.com)

Queuing theory: Benefits of operating at 60% capacity

ampdot 1 Dec 2023 18:48 UTC

40 points

4 comments 1 min read LW link

(less.works)

Researchers and writers can apply for proxy access to the GPT-3.5 base model (code-davinci-002)

ampdot 1 Dec 2023 18:48 UTC

14 points

0 comments 1 min read LW link

(airtable.com)

Kolmogorov Complexity Lays Bare the Soul

jakej 1 Dec 2023 18:29 UTC

5 points

8 comments 2 min read LW link

Thoughts on “AI is easy to control” by Pope & Belrose

Steven Byrnes 1 Dec 2023 17:30 UTC

197 points

64 comments 14 min read LW link 1 review

Why Did NEPA Peak in 2016?

Maxwell Tabarrok 1 Dec 2023 16:18 UTC

10 points

0 comments 3 min read LW link

(maximumprogress.substack.com)

Worlds where I wouldn’t worry about AI risk

adekcz 1 Dec 2023 16:06 UTC

2 points

0 comments 4 min read LW link

How useful for alignment-relevant work are AIs with short-term goals? (Section 2.2.4.3 of “Scheming AIs”)

Joe Carlsmith 1 Dec 2023 14:51 UTC

10 points

1 comment 7 min read LW link

Reality is whatever you can get away with.

sometimesperson 1 Dec 2023 7:50 UTC

−5 points

0 comments 1 min read LW link

Reinforcement Learning using Layered Morphology (RLLM)

MiguelDev 1 Dec 2023 5:18 UTC

7 points

0 comments 29 min read LW link

[Question] Is OpenAI losing money on each request?

thenoviceoof 1 Dec 2023 3:27 UTC

8 points

8 comments 5 min read LW link

How useful is mechanistic interpretability?

ryan_greenblatt, Neel Nanda, Buck and habryka

1 Dec 2023 2:54 UTC

165 points

54 comments 25 min read LW link

FixDT

abramdemski 30 Nov 2023 21:57 UTC

59 points

15 comments 14 min read LW link 1 review

Generalization, from thermodynamics to statistical physics

Jesse Hoogland 30 Nov 2023 21:28 UTC

63 points

9 comments 28 min read LW link

What’s next for the field of Agent Foundations?

Nora_Ammann, Alexander Gietelink Oldenziel and mattmacdermott

30 Nov 2023 17:55 UTC

59 points

23 comments 10 min read LW link

A Proposed Cure for Alzheimer’s Disease???

MadHatter 30 Nov 2023 17:37 UTC

4 points

30 comments 2 min read LW link

AI #40: A Vision from Vitalik

Zvi 30 Nov 2023 17:30 UTC

53 points

12 comments 42 min read LW link

(thezvi.wordpress.com)

Is scheming more likely in models trained to have long-term goals? (Sections 2.2.4.1-2.2.4.2 of “Scheming AIs”)

Joe Carlsmith 30 Nov 2023 16:43 UTC

8 points

0 comments 6 min read LW link

A Formula for Violence (and Its Antidote)

MadHatter 30 Nov 2023 16:04 UTC

−22 points

6 comments 1 min read LW link

(blog.simpleheart.org)

Enkrateia: a safe model-based reinforcement learning algorithm

MadHatter 30 Nov 2023 15:51 UTC

−15 points

4 comments 2 min read LW link

(github.com)

Normative Ethics vs Utilitarianism

Logan Zoellner 30 Nov 2023 15:36 UTC

6 points

0 comments 2 min read LW link

(midwitalignment.substack.com)

Information-Theoretic Boxing of Superintelligences

JustinShovelain and Elliot Mckernon

30 Nov 2023 14:31 UTC

30 points

0 comments 7 min read LW link

OpenAI: Altman Returns

Zvi 30 Nov 2023 14:10 UTC

66 points

12 comments 11 min read LW link

(thezvi.wordpress.com)

[Linkpost] Remarks on the Convergence in Distribution of Random Neural Networks to Gaussian Processes in the Infinite Width Limit

carboniferous_umbraculum 30 Nov 2023 14:01 UTC

9 points

0 comments 1 min read LW link

(drive.google.com)

[Question] Buy Nothing Day is a great idea with a terrible app— why has nobody built a killer app for crowdsourced ‘effective communism’ yet?

lillybaeum 30 Nov 2023 13:47 UTC

8 points

17 comments 1 min read LW link

[Question] Comprehensible Input is the only way people learn languages—is it the only way people learn?

lillybaeum 30 Nov 2023 13:31 UTC

8 points

2 comments 3 min read LW link

Some Intuitions for the Ethicophysics

MadHatter and mishka

30 Nov 2023 6:47 UTC

2 points

4 comments 8 min read LW link

The Alignment Agenda THEY Don’t Want You to Know About

MadHatter 30 Nov 2023 4:29 UTC

−18 points

16 comments 1 min read LW link

Cis fragility

[deactivated] 30 Nov 2023 4:14 UTC

−51 points

9 comments 3 min read LW link

Homework Answer: Glicko Ratings for War

MadHatter 30 Nov 2023 4:08 UTC

−43 points

1 comment 77 min read LW link

(gist.github.com)

[Question] Feature Request for LessWrong

MadHatter 30 Nov 2023 3:19 UTC

11 points

8 comments 1 min read LW link

My Alignment Research Agenda (“the Ethicophysics”)

MadHatter 30 Nov 2023 2:57 UTC

−13 points

0 comments 1 min read LW link

[Question] Stupid Question: Why am I getting consistently downvoted?

MadHatter 30 Nov 2023 0:21 UTC

28 points

132 comments 1 min read LW link