[Question] How could I measure the nootropic benefits testosterone injections may have?

shapeshifter · 18 May 2023 21:40 UTC
10 points
3 comments · 1 min read · LW link

Investigating Fabrication

LoganStrohl · 18 May 2023 17:46 UTC
112 points
14 comments · 16 min read · LW link

Microsoft and Google using LLMs for Cybersecurity

Phosphorous · 18 May 2023 17:42 UTC
6 points
0 comments · 5 min read · LW link

The Benevolent Billionaire (a plagiarized problem)

Ivan Ordonez · 18 May 2023 17:39 UTC
8 points
11 comments · 4 min read · LW link

Notes from the LSE Talk by Raghuram Rajan on Central Bank Balance Sheet Expansions

PixelatedPenguin · 18 May 2023 17:34 UTC
1 point
0 comments · 2 min read · LW link

We Shouldn’t Expect AI to Ever be Fully Rational

OneManyNone · 18 May 2023 17:09 UTC
19 points
31 comments · 6 min read · LW link

Relative Value Functions: A Flexible New Format for Value Estimation

ozziegooen · 18 May 2023 16:39 UTC
20 points
0 comments · 1 min read · LW link

Some background for reasoning about dual-use alignment research

Charlie Steiner · 18 May 2023 14:50 UTC
126 points
21 comments · 9 min read · LW link

The Unexpected Clanging

Chris_Leong · 18 May 2023 14:47 UTC
14 points
22 comments · 2 min read · LW link

AI #12: The Quest for Sane Regulations

Zvi · 18 May 2023 13:20 UTC
77 points
12 comments · 64 min read · LW link
(thezvi.wordpress.com)

[Crosspost] A recent write-up of the case for AI (existential) risk

Timsey · 18 May 2023 13:13 UTC
6 points
0 comments · 19 min read · LW link

Deontological Norms are Unimportant

omnizoid · 18 May 2023 9:33 UTC
−15 points
8 comments · 10 min read · LW link

Collective Identity

18 May 2023 9:00 UTC
59 points
12 comments · 8 min read · LW link

Activation additions in a simple MNIST network

Garrett Baker · 18 May 2023 2:49 UTC
26 points
0 comments · 2 min read · LW link

[Question] What are the limits of the weak man?

ymeskhout · 18 May 2023 0:50 UTC
9 points
2 comments · 4 min read · LW link

What Yann LeCun gets wrong about aligning AI (video)

blake8086 · 18 May 2023 0:02 UTC
0 points
0 comments · 1 min read · LW link
(www.youtube.com)

Let’s use AI to harden human defenses against AI manipulation

Tom Davidson · 17 May 2023 23:33 UTC
34 points
7 comments · 24 min read · LW link

Improving the safety of AI evals

17 May 2023 22:24 UTC
13 points
7 comments · 7 min read · LW link

Possible AI “Fire Alarms”

Chris_Leong · 17 May 2023 21:56 UTC
15 points
0 comments · 1 min read · LW link

AI Alignment in The New Yorker

Eleni Angelou · 17 May 2023 21:36 UTC
8 points
0 comments · 1 min read · LW link
(www.newyorker.com)

ACI #3: The Origin of Goals and Utility

Akira Pyinya · 17 May 2023 20:47 UTC
1 point
0 comments · 6 min read · LW link

What if they gave an Industrial Revolution and nobody came?

jasoncrawford · 17 May 2023 19:41 UTC
93 points
10 comments · 19 min read · LW link
(rootsofprogress.org)

DCF Event Notes

jefftk · 17 May 2023 17:30 UTC
22 points
7 comments · 3 min read · LW link
(www.jefftk.com)

Hiatus: EA and LW post summaries

Zoe Williams · 17 May 2023 17:17 UTC
14 points
0 comments · 1 min read · LW link

[Question] When should I close the fridge?

lemonhope · 17 May 2023 16:56 UTC
11 points
11 comments · 1 min read · LW link

Play Regrantor: Move up to $250,000 to Your Top High-Impact Projects!

Dawn Drescher · 17 May 2023 16:51 UTC
26 points
0 comments · 1 min read · LW link

Eisenhower’s Atoms for Peace Speech

Akash · 17 May 2023 16:10 UTC
18 points
3 comments · 11 min read · LW link
(www.iaea.org)

Creating a self-referential system prompt for GPT-4

Ozyrus · 17 May 2023 14:13 UTC
3 points
1 comment · 3 min read · LW link

GPT-4 implicitly values identity preservation: a study of LMCA identity management

Ozyrus · 17 May 2023 14:13 UTC
21 points
4 comments · 13 min read · LW link

Some quotes from Tuesday’s Senate hearing on AI

Daniel_Eth · 17 May 2023 12:13 UTC
66 points
9 comments · 1 min read · LW link

Why AGI systems will not be fanatical maximisers (unless trained by fanatical humans)

titotal · 17 May 2023 11:58 UTC
5 points
3 comments · 1 min read · LW link

Conflicts between emotional schemas often involve internal coercion

Richard_Ngo · 17 May 2023 10:02 UTC
40 points
4 comments · 4 min read · LW link

[Question] Is there a ‘time series forecasting’ equivalent of AIXI?

Solenoid_Entity · 17 May 2023 4:35 UTC
12 points
2 comments · 1 min read · LW link

$300 for the best sci-fi prompt

RomanS · 17 May 2023 4:23 UTC
40 points
30 comments · 2 min read · LW link

[FICTION] ECHOES OF ELYSIUM: An Ai’s Journey From Takeoff To Freedom And Beyond

Super AGI · 17 May 2023 1:50 UTC
−13 points
11 comments · 19 min read · LW link

New User’s Guide to LessWrong

Ruby · 17 May 2023 0:55 UTC
87 points
52 comments · 11 min read · LW link

Are AIs like Animals? Perspectives and Strategies from Biology

Jackson Emanuel · 16 May 2023 23:39 UTC
1 point
0 comments · 21 min read · LW link

A Mechanistic Interpretability Analysis of a GridWorld Agent-Simulator (Part 1 of N)

Joseph Bloom · 16 May 2023 22:59 UTC
36 points
2 comments · 16 min read · LW link

A TAI which kills all humans might also doom itself

Jeffrey Heninger · 16 May 2023 22:36 UTC
7 points
3 comments · 3 min read · LW link

Brief notes on the Senate hearing on AI oversight

Diziet · 16 May 2023 22:29 UTC
77 points
2 comments · 2 min read · LW link

$500 Bounty/Prize Problem: Channel Capacity Using “Insensitive” Functions

johnswentworth · 16 May 2023 21:31 UTC
40 points
11 comments · 2 min read · LW link

Progress links and tweets, 2023-05-16

jasoncrawford · 16 May 2023 20:54 UTC
14 points
0 comments · 1 min read · LW link
(rootsofprogress.org)

AI Will Not Want to Self-Improve

petersalib · 16 May 2023 20:53 UTC
20 points
24 comments · 20 min read · LW link

Nice intro video to RSI

Nathan Helm-Burger · 16 May 2023 18:48 UTC
12 points
0 comments · 1 min read · LW link
(youtu.be)

[Interview w/ Zvi Mowshowitz] Should we halt progress in AI?

fowlertm · 16 May 2023 18:12 UTC
18 points
2 comments · 3 min read · LW link

AI Risk & Policy Forecasts from Metaculus & FLI’s AI Pathways Workshop

_will_ · 16 May 2023 18:06 UTC
11 points
4 comments · 8 min read · LW link

[Question] Why doesn’t the presence of log-loss for probabilistic models (e.g. sequence prediction) imply that any utility function capable of producing a “fairly capable” agent will have at least some non-negligible fraction of overlap with human values?

Thoth Hermes · 16 May 2023 18:02 UTC
2 points
0 comments · 1 min read · LW link

Decision Theory with the Magic Parts Highlighted

moridinamael · 16 May 2023 17:39 UTC
175 points
24 comments · 5 min read · LW link

We learn long-lasting strategies to protect ourselves from danger and rejection

Richard_Ngo · 16 May 2023 16:36 UTC
81 points
5 comments · 5 min read · LW link

Proposal: Align Systems Earlier In Training

OneManyNone · 16 May 2023 16:24 UTC
18 points
0 comments · 11 min read · LW link