Ten Levels of AI Alignment Difficulty

Sammy Martin · 3 Jul 2023 20:20 UTC
123 points
16 comments · 12 min read · LW link · 1 review

Security, Cryptography AI Workshop in SF

Allison Duettmann · 3 Jul 2023 19:01 UTC
7 points
0 comments · 1 min read · LW link

[Question] What in your opinion is the biggest open problem in AI alignment?

tailcalled · 3 Jul 2023 16:34 UTC
39 points
35 comments · 1 min read · LW link

A Subtle Selection Effect in Overconfidence Studies

Kevin Dorst · 3 Jul 2023 14:43 UTC
24 points
0 comments · 6 min read · LW link
(kevindorst.substack.com)

Monthly Roundup #8: July 2023

Zvi · 3 Jul 2023 13:20 UTC
40 points
4 comments · 46 min read · LW link
(thezvi.wordpress.com)

Complex Signs Bad

Evenstar · 3 Jul 2023 13:09 UTC
5 points
2 comments · 3 min read · LW link

6/23

Celer · 3 Jul 2023 6:30 UTC
8 points
0 comments · 10 min read · LW link
(keller.substack.com)

Marginal charity

Pat Myron · 3 Jul 2023 2:13 UTC
3 points
1 comment · 1 min read · LW link

My Central Alignment Priority (2 July 2023)

Nicholas / Heather Kross · 3 Jul 2023 1:46 UTC
12 points
1 comment · 3 min read · LW link

My Alignment Timeline

Nicholas / Heather Kross · 3 Jul 2023 1:04 UTC
22 points
0 comments · 2 min read · LW link

Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)?

gwern · 3 Jul 2023 0:48 UTC
425 points
54 comments · 7 min read · LW link
(www.youtube.com)

Frames in context

Richard_Ngo · 3 Jul 2023 0:38 UTC
39 points
9 comments · 6 min read · LW link

Meta-rationality and frames

Richard_Ngo · 3 Jul 2023 0:33 UTC
64 points
2 comments · 5 min read · LW link

VC Theory Overview

Joar Skalse · 2 Jul 2023 22:45 UTC
11 points
2 comments · 11 min read · LW link

Sources of evidence in Alignment

Martín Soto · 2 Jul 2023 20:38 UTC
20 points
0 comments · 11 min read · LW link

Quantitative cruxes in Alignment

Martín Soto · 2 Jul 2023 20:38 UTC
19 points
0 comments · 23 min read · LW link

Going Crazy and Getting Better Again

Evenstar · 2 Jul 2023 18:55 UTC
139 points
13 comments · 7 min read · LW link · 1 review

Shall We Throw A Huge Party Before AGI Bids Us Adieu?

GeorgeMan · 2 Jul 2023 17:56 UTC
−1 points
6 comments · 1 min read · LW link

Why it’s so hard to talk about Consciousness

Rafael Harth · 2 Jul 2023 15:56 UTC
131 points
159 comments · 9 min read · LW link · 1 review

How Smart Are Humans?

Joar Skalse · 2 Jul 2023 15:46 UTC
9 points
19 comments · 2 min read · LW link

Through a panel, darkly: a case study in internet BS detection

jchan · 2 Jul 2023 13:40 UTC
22 points
7 comments · 3 min read · LW link

LLMs, Batches, and Emergent Episodic Memory

Lao Mein · 2 Jul 2023 7:55 UTC
5 points
4 comments · 1 min read · LW link

Negativity enhances positivity

Adam Zerner · 2 Jul 2023 2:47 UTC
12 points
7 comments · 2 min read · LW link

faster latent diffusion

bhauth · 2 Jul 2023 1:30 UTC
10 points
8 comments · 2 min read · LW link
(www.bhauth.com)

Using (Uninterpretable) LLMs to Generate Interpretable AI Code

Joar Skalse · 2 Jul 2023 1:01 UTC
13 points
12 comments · 3 min read · LW link

Grant applications and grand narratives

Elizabeth · 2 Jul 2023 0:16 UTC
191 points
22 comments · 6 min read · LW link

An Introduction, an Overview of my personal resources, and how one might make use of them

ProofBySonnet · 1 Jul 2023 21:00 UTC
4 points
6 comments · 3 min read · LW link

My “2.9 trauma limit”

Raemon · 1 Jul 2023 19:32 UTC
193 points
31 comments · 7 min read · LW link

Alpha

Erich_Grunewald · 1 Jul 2023 16:05 UTC
65 points
2 comments · 14 min read · LW link
(www.erichgrunewald.com)

Forum Karma: view stats and find highly-rated comments for any LW user

Max H · 1 Jul 2023 15:36 UTC
60 points
16 comments · 2 min read · LW link
(forumkarma.com)

[ASoT] GPT2 Steering & The Tuned Lens

Ulisse Mini · 1 Jul 2023 14:12 UTC
23 points
0 comments · 2 min read · LW link

[Linkpost] A shared linguistic space for transmitting our thoughts from brain to brain in natural conversations

Bogdan Ionut Cirstea · 1 Jul 2023 13:57 UTC
17 points
2 comments · 1 min read · LW link

Elements of Computational Philosophy, Vol. I: Truth

1 Jul 2023 11:44 UTC
12 points
6 comments · 1 min read · LW link
(compphil.github.io)

Micro Habits that Improve One’s Day

silentbob · 1 Jul 2023 10:53 UTC
62 points
9 comments · 5 min read · LW link

Ateliers: But what is an Atelier?

Stephen Fowler · 1 Jul 2023 5:57 UTC
4 points
2 comments · 10 min read · LW link

Predicting: Quick Start

duck_master · 1 Jul 2023 3:43 UTC
9 points
3 comments · 14 min read · LW link

EA/LW/SSC Argentina Group!

daviddelauba · 1 Jul 2023 2:47 UTC
1 point
0 comments · 1 min read · LW link

Despedida a Pablo Stafforini

daviddelauba · 1 Jul 2023 2:44 UTC
1 point
0 comments · 1 min read · LW link

Horizontal and Vertical Integration

Jeffrey Heninger · 1 Jul 2023 1:15 UTC
17 points
1 comment · 2 min read · LW link

Inflection AI announces $1.3 billion of funding led by current investors, Microsoft, and NVIDIA

SandXbox · 30 Jun 2023 21:32 UTC
7 points
0 comments · 1 min read · LW link
(inflection.ai)

Introduction

30 Jun 2023 20:45 UTC
7 points
0 comments · 2 min read · LW link

Inherently Interpretable Architectures

30 Jun 2023 20:43 UTC
4 points
0 comments · 7 min read · LW link

Positive Attractors

30 Jun 2023 20:43 UTC
6 points
0 comments · 13 min read · LW link

Agency from a causal perspective

30 Jun 2023 17:37 UTC
39 points
5 comments · 6 min read · LW link

Little attention seems to be on discouraging hardware progress

RussellThor · 30 Jun 2023 10:14 UTC
5 points
3 comments · 1 min read · LW link

Introducing EffiSciences’ AI Safety Unit

30 Jun 2023 7:44 UTC
68 points
0 comments · 12 min read · LW link

Contra Anton 🏴‍☠️ on Kolmogorov complexity and recursive self improvement

DaemonicSigil · 30 Jun 2023 5:15 UTC
25 points
12 comments · 2 min read · LW link

Foom Liability

PeterMcCluskey · 30 Jun 2023 3:55 UTC
21 points
10 comments · 6 min read · LW link
(bayesianinvestor.com)

I Think Eliezer Should Go on Glenn Beck

Lao Mein · 30 Jun 2023 3:12 UTC
29 points
21 comments · 1 min read · LW link

Bengio’s FAQ on Catastrophic AI Risks

Vaniver · 29 Jun 2023 23:04 UTC
39 points
0 comments · 1 min read · LW link
(yoshuabengio.org)