Helping your Senator Prepare for the Upcoming Sam Altman Hearing

Tiago de Vassal 14 May 2023 22:45 UTC
69 points
2 comments 1 min read LW link
(aisafetytour.com)

Difficulties in making powerful aligned AI

DanielFilan 14 May 2023 20:50 UTC
41 points
1 comment 10 min read LW link
(danielfilan.com)

How much do markets value Open AI?

Xodarap 14 May 2023 19:28 UTC
21 points
5 comments 1 min read LW link

Misaligned AGI Death Match

Nate Reinar Windwood 14 May 2023 18:00 UTC
1 point
0 comments 1 min read LW link

[Question] What new technology, for what institutions?

bhauth 14 May 2023 17:33 UTC
29 points
6 comments 3 min read LW link

A strong mind continues its trajectory of creativity

TsviBT 14 May 2023 17:24 UTC
22 points
8 comments 6 min read LW link

Ontologies Should Be Backwards-Compatible

Thoth Hermes 14 May 2023 17:21 UTC
3 points
3 comments 4 min read LW link
(thothhermes.substack.com)

Jaan Tallinn’s 2022 Philanthropy Overview

jaan 14 May 2023 15:35 UTC
64 points
2 comments 1 min read LW link
(jaan.online)

Effective Altruism and Rationality Groups on Snipd

David Bravo 14 May 2023 14:54 UTC
2 points
0 comments 2 min read LW link

Character alignment II

p.b. 14 May 2023 14:17 UTC
5 points
0 comments 2 min read LW link

Coordination by common knowledge to prevent uncontrollable AI

Karl von Wendt 14 May 2023 13:37 UTC
10 points
2 comments 9 min read LW link

Bayesian Networks Aren’t Necessarily Causal

Zack_M_Davis 14 May 2023 1:42 UTC
95 points
37 comments 8 min read LW link

Simpler explanations of AGI risk

Seth Herd 14 May 2023 1:29 UTC
8 points
9 comments 3 min read LW link

A Study of AI Science Models

13 May 2023 23:25 UTC
20 points
0 comments 24 min read LW link

LLM Guardrails Should Have Better Customer Service Tuning

Jiao Bu 13 May 2023 22:54 UTC
2 points
0 comments 2 min read LW link

PCAST Working Group on Generative AI Invites Public Input

Christopher King 13 May 2023 22:49 UTC
7 points
0 comments 1 min read LW link
(terrytao.wordpress.com)

«Boundaries» for formalizing an MVP morality

Chipmonk 13 May 2023 19:10 UTC
20 points
7 comments 4 min read LW link

Steering GPT-2-XL by adding an activation vector

13 May 2023 18:42 UTC
436 points
97 comments 50 min read LW link

On the possibility of impossibility of AGI Long-Term Safety

Roman Yen 13 May 2023 18:38 UTC
6 points
3 comments 9 min read LW link

Notes on Antelligence

Aurigena 13 May 2023 18:38 UTC
2 points
0 comments 9 min read LW link

Reality and reality-boxes

Jim Pivarski 13 May 2023 14:14 UTC
37 points
11 comments 21 min read LW link

An Analogy for Understanding Transformers

CallumMcDougall 13 May 2023 12:20 UTC
89 points
6 comments 9 min read LW link

ACX Meetup Munich

Erich 13 May 2023 7:58 UTC
2 points
1 comment 1 min read LW link

Machine-Readable Prevalence Estimates

jefftk 13 May 2023 0:40 UTC
9 points
2 comments 2 min read LW link
(www.jefftk.com)

Value drift threat models

Garrett Baker 12 May 2023 23:03 UTC
27 points
4 comments 5 min read LW link

Aggregating Utilities for Corrigible AI [Feedback Draft]

12 May 2023 20:57 UTC
28 points
7 comments 22 min read LW link

Turning off lights with model editing

Sam Marks 12 May 2023 20:25 UTC
67 points
5 comments 2 min read LW link
(arxiv.org)

Dark Forest Theories

Raemon 12 May 2023 20:21 UTC
137 points
49 comments 2 min read LW link

DELBERTing as an Adversarial Strategy

Matthew_Opitz 12 May 2023 20:09 UTC
8 points
3 comments 5 min read LW link

Microsoft/GitHub Copilot Chat’s confidential system Prompt: “You must refuse to discuss life, existence or sentience.”

Marvin von Hagen 12 May 2023 19:46 UTC
6 points
2 comments 1 min read LW link
(twitter.com)

Retrospective: Lessons from the Failed Alignment Startup AISafety.com

Søren Elverlin 12 May 2023 18:07 UTC
104 points
9 comments 3 min read LW link

The way AGI wins could look very stupid

Christopher King 12 May 2023 16:34 UTC
48 points
22 comments 1 min read LW link

Towards Measures of Optimisation

12 May 2023 15:29 UTC
53 points
37 comments 4 min read LW link

The Eden Project

rogersbacon 12 May 2023 14:58 UTC
−1 points
1 comment 2 min read LW link
(www.secretorum.life)

Another formalization attempt: Central Argument That AGI Presents a Global Catastrophic Risk

avturchin 12 May 2023 13:22 UTC
16 points
4 comments 2 min read LW link

Infinite-width MLPs as an “ensemble prior”

Vivek Hebbar 12 May 2023 11:45 UTC
46 points
0 comments 5 min read LW link

Input Swap Graphs: Discovering the role of neural network components at scale

Alexandre Variengien 12 May 2023 9:41 UTC
92 points
0 comments 33 min read LW link

Uploads are Impossible

PashaKamyshev 12 May 2023 8:03 UTC
−5 points
37 comments 8 min read LW link

Formulating the AI Doom Argument for Analytic Philosophers

JonathanErhardt 12 May 2023 7:54 UTC
13 points
0 comments 2 min read LW link

Three Iterative Processes

LoganStrohl 12 May 2023 2:50 UTC
44 points
0 comments 3 min read LW link

Zuzalu LW Sequences Discussion

veronica 12 May 2023 0:14 UTC
1 point
0 comments 1 min read LW link

[Question] Term/Category for AI with Neutral Impact?

isomic 11 May 2023 22:00 UTC
6 points
1 comment 1 min read LW link

Thoughts on LessWrong norms, the Art of Discourse, and moderator mandate

Ruby 11 May 2023 21:20 UTC
37 points
20 comments 5 min read LW link

Alignment, Goals, and The Gut-Head Gap: A Review of Ngo. et al.

Violet Hour 11 May 2023 18:06 UTC
20 points
2 comments 13 min read LW link

Sequence opener: Jordan Harbinger’s 6 minute networking

Severin T. Seehrich 11 May 2023 17:06 UTC
4 points
0 comments 1 min read LW link

Advice for newly busy people

Severin T. Seehrich 11 May 2023 16:46 UTC
147 points
3 comments 5 min read LW link

AI #11: In Search of a Moat

Zvi 11 May 2023 15:40 UTC
67 points
28 comments 81 min read LW link
(thezvi.wordpress.com)

[Question] Bayesian update from sensationalistic sources

houkime 11 May 2023 15:26 UTC
1 point
0 comments 1 min read LW link

I bet $500 on AI winning the IMO gold medal by 2026

azsantosk 11 May 2023 14:46 UTC
37 points
29 comments 1 min read LW link

Fatebook for Slack: Track your forecasts, right where your team works

11 May 2023 14:11 UTC
24 points
3 comments 1 min read LW link