Are AIs like Animals? Perspectives and Strategies from Biology

Jackson Emanuel

16 May 2023 23:39 UTC

1 point

0 comments

21 min read

A Mechanistic Interpretability Analysis of a GridWorld Agent-Simulator (Part 1 of N)

Joseph Bloom

16 May 2023 22:59 UTC

36 points

2 comments

16 min read

A TAI which kills all humans might also doom itself

Jeffrey Heninger

16 May 2023 22:36 UTC

7 points

3 comments

3 min read

Brief notes on the Senate hearing on AI oversight

Diziet

16 May 2023 22:29 UTC

77 points

2 comments

2 min read

$500 Bounty/Prize Problem: Channel Capacity Using “Insensitive” Functions

johnswentworth

16 May 2023 21:31 UTC

40 points

11 comments

2 min read

Progress links and tweets, 2023-05-16

jasoncrawford

16 May 2023 20:54 UTC

14 points

0 comments

1 min read

(rootsofprogress.org)

AI Will Not Want to Self-Improve

petersalib

16 May 2023 20:53 UTC

20 points

24 comments

20 min read

Nice intro video to RSI

Nathan Helm-Burger

16 May 2023 18:48 UTC

12 points

0 comments

1 min read

(youtu.be)

[Interview w/ Zvi Mowshowitz] Should we halt progress in AI?

fowlertm

16 May 2023 18:12 UTC

18 points

2 comments

3 min read

AI Risk & Policy Forecasts from Metaculus & FLI’s AI Pathways Workshop

_will_

16 May 2023 18:06 UTC

11 points

4 comments

8 min read

[Question] Why doesn’t the presence of log-loss for probabilistic models (e.g. sequence prediction) imply that any utility function capable of producing a “fairly capable” agent will have at least some non-negligible fraction of overlap with human values?

Thoth Hermes

16 May 2023 18:02 UTC

2 points

0 comments

1 min read

Decision Theory with the Magic Parts Highlighted

moridinamael

16 May 2023 17:39 UTC

175 points

24 comments

5 min read

We learn long-lasting strategies to protect ourselves from danger and rejection

Richard_Ngo

16 May 2023 16:36 UTC

84 points

5 comments

5 min read

Proposal: Align Systems Earlier In Training

OneManyNone

16 May 2023 16:24 UTC

18 points

0 comments

11 min read

Procedural Executive Function, Part 2

DaystarEld

16 May 2023 16:22 UTC

23 points

0 comments

18 min read

(daystareld.com)

My current workflow to study the internal mechanisms of LLM

Yulu Pi

16 May 2023 15:27 UTC

4 points

0 comments

1 min read

Proposal: we should start referring to the risk from unaligned AI as a type of accident risk

Christopher King

16 May 2023 15:18 UTC

22 points

6 comments

2 min read

AI Safety Newsletter #6: Examples of AI safety progress, Yoshua Bengio proposes a ban on AI agents, and lessons from nuclear arms control

Dan H, Akash and aogara

16 May 2023 15:14 UTC

31 points

0 comments

6 min read

(newsletter.safe.ai)

Lazy Baked Mac and Cheese

jefftk

16 May 2023 14:40 UTC

18 points

2 comments

1 min read

(www.jefftk.com)

Tyler Cowen’s challenge to develop an ‘actual mathematical model’ for AI X-Risk

Joe Brenton

16 May 2023 11:57 UTC

6 points

4 comments

1 min read

Evaluating Language Model Behaviours for Shutdown Avoidance in Textual Scenarios

Simon Lermen, Teun van der Weij and Leon Lang

16 May 2023 10:53 UTC

26 points

0 comments

13 min read

[Review] Two People Smoking Behind the Supermarket

lsusr

16 May 2023 7:25 UTC

32 points

1 comment

1 min read

Superposition and Dropout

Edoardo Pona

16 May 2023 7:24 UTC

21 points

5 comments

6 min read

[Question] What is the literature on long term water fasts?

lc

16 May 2023 3:23 UTC

16 points

4 comments

1 min read

Lessons learned from offering in-office nutritional testing

Elizabeth

15 May 2023 23:20 UTC

86 points

11 comments

14 min read

(acesounderglass.com)

Judgments often smuggle in implicit standards

Richard_Ngo

15 May 2023 18:50 UTC

91 points

4 comments

3 min read

Rational retirement plans

Ik

15 May 2023 17:49 UTC

5 points

17 comments

1 min read

[Question] (Crosspost) Asking for online calls on AI s-risks discussions

jackchang110

15 May 2023 17:42 UTC

1 point

0 comments

1 min read

(forum.effectivealtruism.org)

Simple experiments with deceptive alignment

Andreas_Moe

15 May 2023 17:41 UTC

7 points

0 comments

4 min read

Some Summaries of Agent Foundations Work

mattmacdermott

15 May 2023 16:09 UTC

62 points

1 comment

13 min read

Facebook Increased Visibility

jefftk

15 May 2023 15:40 UTC

15 points

1 comment

1 min read

(www.jefftk.com)

Un-unpluggability—can’t we just unplug it?

Oliver Sourbut

15 May 2023 13:23 UTC

26 points

10 comments

12 min read

(www.oliversourbut.net)

[Question] Can we learn much by studying the behaviour of RL policies?

AidanGoth

15 May 2023 12:56 UTC

1 point

0 comments

1 min read

How I apply (so-called) Non-Violent Communication

Kaj_Sotala

15 May 2023 9:56 UTC

84 points

25 comments

3 min read

Let’s build a fire alarm for AGI

chaosmage

15 May 2023 9:16 UTC

−1 points

0 comments

2 min read

From fear to excitement

Richard_Ngo

15 May 2023 6:23 UTC

117 points

9 comments

3 min read

Reward is the optimization target (of capabilities researchers)

Max H

15 May 2023 3:22 UTC

32 points

4 comments

5 min read

The Lightcone Theorem: A Better Foundation For Natural Abstraction?

johnswentworth

15 May 2023 2:24 UTC

69 points

25 comments

6 min read

GovAI: Towards best practices in AGI safety and governance: A survey of expert opinion

Zach Stein-Perlman

15 May 2023 1:42 UTC

28 points

11 comments

1 min read

(arxiv.org)

[Question] Why don’t quantilizers also cut off the upper end of the distribution?

Alex_Altair

15 May 2023 1:40 UTC

25 points

2 comments

1 min read

Support Structures for Naturalist Study

LoganStrohl

15 May 2023 0:25 UTC

47 points

6 comments

10 min read

Catastrophic Regressional Goodhart: Appendix

Thomas Kwa and Drake Thomas

15 May 2023 0:10 UTC

25 points

1 comment

9 min read

Helping your Senator Prepare for the Upcoming Sam Altman Hearing

Tiago de Vassal

14 May 2023 22:45 UTC

69 points

2 comments

1 min read

(aisafetytour.com)

Difficulties in making powerful aligned AI

DanielFilan

14 May 2023 20:50 UTC

41 points

1 comment

10 min read

(danielfilan.com)

How much do markets value Open AI?

Xodarap

14 May 2023 19:28 UTC

21 points

5 comments

1 min read

Misaligned AGI Death Match

Nate Reinar Windwood

14 May 2023 18:00 UTC

1 point

0 comments

1 min read

[Question] What new technology, for what institutions?

bhauth

14 May 2023 17:33 UTC

29 points

6 comments

3 min read

A strong mind continues its trajectory of creativity

TsviBT

14 May 2023 17:24 UTC

22 points

8 comments

6 min read

Ontologies Should Be Backwards-Compatible

Thoth Hermes

14 May 2023 17:21 UTC

3 points

3 comments

4 min read

(thothhermes.substack.com)

Jaan Tallinn’s 2022 Philanthropy Overview

jaan

14 May 2023 15:35 UTC

64 points

2 comments

1 min read

(jaan.online)