[Interim research report] Taking features out of superposition with sparse autoencoders

Lee Sharkey, Dan Braun and beren

Dec 13, 2022, 3:41 PM

150 points

23 comments, 22 min read, LW link, 2 reviews

[Question] Is the ChatGPT-simulated Linux virtual machine real?

Kenoubi

Dec 13, 2022, 3:41 PM

18 points

7 comments, 1 min read, LW link

Existential AI Safety is NOT separate from near-term applications

scasper

Dec 13, 2022, 2:47 PM

37 points

17 comments, 3 min read, LW link

What is the correlation between upvoting and benefit to readers of LW?

banev

Dec 13, 2022, 2:26 PM

7 points

15 comments, 1 min read, LW link

Limits of Superintelligence

Aleksei Petrenko

Dec 13, 2022, 12:19 PM

1 point

5 comments, 1 min read, LW link

Bay 2022 Solstice

Raemon

Dec 13, 2022, 8:58 AM

17 points

0 comments, 1 min read, LW link

Last day to nominate things for the Review. Also, 2019 books still exist.

Raemon

Dec 13, 2022, 8:53 AM

15 points

0 comments, 1 min read, LW link

AI alignment is distinct from its near-term applications

paulfchristiano

Dec 13, 2022, 7:10 AM

255 points

21 comments, 2 min read, LW link

(ai-alignment.com)

Take 10: Fine-tuning with RLHF is aesthetically unsatisfying.

Charlie Steiner

Dec 13, 2022, 7:04 AM

37 points

3 comments, 2 min read, LW link

[Question] Are lawsuits against AGI companies extending AGI timelines?

SlowingAGI

Dec 13, 2022, 6:00 AM

1 point

1 comment, 1 min read, LW link

EA & LW Forums Weekly Summary (5th Dec - 11th Dec '22)

Zoe Williams

Dec 13, 2022, 2:53 AM

7 points

0 comments, LW link

Alignment with argument-networks and assessment-predictions

Tor Økland Barstad

Dec 13, 2022, 2:17 AM

10 points

5 comments, 45 min read, LW link

Revisiting algorithmic progress

Tamay and Ege Erdil

Dec 13, 2022, 1:39 AM

95 points

15 comments, 2 min read, LW link, 1 review

(arxiv.org)

An exploration of GPT-2's embedding weights

Adam Scherlis

Dec 13, 2022, 12:46 AM

44 points

4 comments, 10 min read, LW link

12 career-related questions that may (or may not) be helpful for people interested in alignment research

Orpheus16

Dec 12, 2022, 10:36 PM

20 points

0 comments, 2 min read, LW link

Concept extrapolation for hypothesis generation

Stuart_Armstrong, Patrick Leask and rgorman

Dec 12, 2022, 10:09 PM

20 points

2 comments, 3 min read, LW link

Let’s go meta: Grammatical knowledge and self-referential sentences [ChatGPT]

Bill Benzon

Dec 12, 2022, 9:50 PM

5 points

0 comments, 9 min read, LW link

D&D.Sci December 2022 Evaluation and Ruleset

abstractapplic

Dec 12, 2022, 9:21 PM

17 points

8 comments, 2 min read, LW link

Log-odds are better than Probabilities

Robert_AIZI

Dec 12, 2022, 8:10 PM

22 points

4 comments, 4 min read, LW link

(aizi.substack.com)

Bengaluru LW/ACX Social Meetup—December 2022

faiz

Dec 12, 2022, 7:30 PM

4 points

0 comments, 1 min read, LW link

Psychological Disorders and Problems

adamShimi and Gabriel Alfour

Dec 12, 2022, 6:15 PM

39 points

6 comments, 1 min read, LW link

Confusing the goal and the path

adamShimi

Dec 12, 2022, 4:42 PM

44 points

7 comments, 1 min read, LW link

(epistemologicalvigilance.substack.com)

Meaningful things are those the universe possesses a semantics for

Abhimanyu Pallavi Sudhir

Dec 12, 2022, 4:03 PM

16 points

14 comments, 14 min read, LW link

Tradeoffs in complexity, abstraction, and generality

remember and Gabriel Alfour

Dec 12, 2022, 3:55 PM

32 points

0 comments, 2 min read, LW link

Green Line Extension Opening Dates

jefftk

Dec 12, 2022, 2:40 PM

12 points

0 comments, 1 min read, LW link

(www.jefftk.com)

Join the AI Testing Hackathon this Friday

Esben Kran

Dec 12, 2022, 2:24 PM

10 points

0 comments, LW link

Side-channels: input versus output

davidad

Dec 12, 2022, 12:32 PM

44 points

16 comments, 2 min read, LW link

Take 9: No, RLHF/IDA/debate doesn’t solve outer alignment.

Charlie Steiner

Dec 12, 2022, 11:51 AM

33 points

13 comments, 2 min read, LW link

Creating a database for base rates

nikos

Dec 12, 2022, 10:09 AM

2 points

1 comment, 3 min read, LW link

(forum.effectivealtruism.org)

Trivial GPT-3.5 limitation workaround

Dave Lindbergh

Dec 12, 2022, 8:42 AM

5 points

4 comments, 1 min read, LW link

Ponzi schemes can be highly profitable if your timing is good

GeneSmith

Dec 12, 2022, 6:42 AM

10 points

18 comments, 5 min read, LW link

Prodding ChatGPT to solve a basic algebra problem

Shmi

Dec 12, 2022, 4:09 AM

14 points

6 comments, 1 min read, LW link

(twitter.com)

Wider Default Audio Player in Chrome?

jefftk

Dec 12, 2022, 3:30 AM

11 points

2 comments, 1 min read, LW link

(www.jefftk.com)

A brainteaser for language models

Adam Scherlis

Dec 12, 2022, 2:43 AM

47 points

3 comments, 2 min read, LW link

Benchmarks for Comparing Human and AI Intelligence

MrThink

Dec 11, 2022, 10:06 PM

9 points

4 comments, 2 min read, LW link

Reflections on the PIBBSS Fellowship 2022

Nora_Ammann and particlemania

Dec 11, 2022, 9:53 PM

32 points

0 comments, 18 min read, LW link

A crisis for online communication: bots and bot users will overrun the Internet?

Mitchell_Porter

Dec 11, 2022, 9:11 PM

15 points

11 comments, 1 min read, LW link

Finite Factored Sets in Pictures

Magdalena Wache

Dec 11, 2022, 6:49 PM

174 points

35 comments, 12 min read, LW link

Formalization as suspension of intuition

adamShimi

Dec 11, 2022, 3:16 PM

54 points

18 comments, 1 min read, LW link

(epistemologicalvigilance.substack.com)

An argument on animal consciousness (soliciting criticism)

SciHamster

Dec 11, 2022, 3:12 PM

1 point

2 comments, 1 min read, LW link

ChatGPT’s new novel rationality technique of fact checking

ChristianKl

Dec 11, 2022, 1:54 PM

−14 points

7 comments, 1 min read, LW link

Reframing inner alignment

davidad

Dec 11, 2022, 1:53 PM

53 points

13 comments, 4 min read, LW link

A poem about applied rationality by ChatGPT

ChristianKl

Dec 11, 2022, 1:43 PM

4 points

0 comments, 1 min read, LW link

ChatGPT goes through a wormhole hole in our Shandyesque universe [virtual wacky weed]

Bill Benzon

Dec 11, 2022, 11:59 AM

−1 points

2 comments, 3 min read, LW link

Using Obsidian if you’re used to using Roam

Solenoid_Entity

Dec 11, 2022, 8:59 AM

19 points

4 comments, 2 min read, LW link

[fiction] Our Final Hour

Mati_Roy

Dec 11, 2022, 5:49 AM

23 points

5 comments, 3 min read, LW link

Consider using reversible automata for alignment research

Alex_Altair

Dec 11, 2022, 1:00 AM

88 points

30 comments, 2 min read, LW link

High level discourse structure in ChatGPT: Part 2 [Quasi-symbolic?]

Bill Benzon

Dec 10, 2022, 10:26 PM

7 points

0 comments, 6 min read, LW link

Poll Results on AGI

Niclas Kupper

Dec 10, 2022, 9:25 PM

18 points

0 comments, 2 min read, LW link

Reflecting on the 2022 Guild of the Rose Workshops

moridinamael

Dec 10, 2022, 9:21 PM

26 points

7 comments, 8 min read, LW link