What should we censor from training data?

wassname 22 Apr 2023 23:33 UTC
6 points
3 comments 1 min read LW link

Architecture-aware optimisation: train ImageNet and more without hyperparameters

Chris Mingard 22 Apr 2023 21:50 UTC
6 points
2 comments 2 min read LW link

OpenAI’s GPT-4 Safety Goals

PeterMcCluskey 22 Apr 2023 19:11 UTC
3 points
3 comments 4 min read LW link
(bayesianinvestor.com)

Introducing the Nuts and Bolts Of Naturalism

LoganStrohl 22 Apr 2023 18:31 UTC
76 points
2 comments 3 min read LW link

We Need To Know About Continual Learning

michael_mjd 22 Apr 2023 17:08 UTC
29 points
14 comments 4 min read LW link

The Security Mindset, S-Risk and Publishing Prosaic Alignment Research

lukemarks 22 Apr 2023 14:36 UTC
39 points
7 comments 5 min read LW link

[Question] How did LW update p(doom) after LLMs blew up?

FinalFormal2 22 Apr 2023 14:21 UTC
24 points
29 comments 1 min read LW link

The Cruel Trade-Off Between AI Misuse and AI X-risk Concerns

simeon_c 22 Apr 2023 13:49 UTC
24 points
1 comment 2 min read LW link

five ways to say “Almost Always” and actually mean it

Yudhister Kumar 22 Apr 2023 10:38 UTC
17 points
3 comments 2 min read LW link
(www.ykumar.org)

P(doom|superintelligence) or coin tosses and dice throws of human values (and other related Ps).

Muyyd 22 Apr 2023 10:06 UTC
−7 points
0 comments 4 min read LW link

[Question] Is it allowed to post job postings here? I am looking for a new PhD student to work on AI Interpretability. Can I advertise my position?

Tiberius 22 Apr 2023 1:22 UTC
5 points
4 comments 1 min read LW link

LessWrong moderation messaging container

Raemon 22 Apr 2023 1:19 UTC
21 points
13 comments 1 min read LW link

Neural network polytopes (Colab notebook)

Zach Furman 21 Apr 2023 22:42 UTC
11 points
0 comments 1 min read LW link
(colab.research.google.com)

Readability is mostly a waste of characters

vlad.proex 21 Apr 2023 22:05 UTC
21 points
7 comments 3 min read LW link

The Relationship between RLHF and AI Psychology: Debunking the Shoggoth Argument

FinalFormal2 21 Apr 2023 22:05 UTC
−12 points
8 comments 2 min read LW link

Thinking about maximization and corrigibility

James Payor 21 Apr 2023 21:22 UTC
63 points
4 comments 5 min read LW link

Would we even want AI to solve all our problems?

So8res 21 Apr 2023 18:04 UTC
97 points
15 comments 2 min read LW link

The Commission for Stopping Further Improvements: A letter of note from Isambard K. Brunel

jasoncrawford 21 Apr 2023 17:42 UTC
39 points
0 comments 4 min read LW link
(rootsofprogress.org)

Should we publish mechanistic interpretability research?

21 Apr 2023 16:19 UTC
105 points
40 comments 13 min read LW link

500 Million, But Not A Single One More—The Animation

Writer 21 Apr 2023 15:48 UTC
47 points
0 comments 1 min read LW link

Talking publicly about AI risk

Jan_Kulveit 21 Apr 2023 11:28 UTC
180 points
9 comments 6 min read LW link

Notes on “the hot mess theory of AI misalignment”

JakubK 21 Apr 2023 10:07 UTC
13 points
0 comments 5 min read LW link
(sohl-dickstein.github.io)

Requisite Variety

Stephen Fowler 21 Apr 2023 8:07 UTC
6 points
0 comments 5 min read LW link

The Agency Overhang

Jeffrey Ladish 21 Apr 2023 7:47 UTC
85 points
6 comments 6 min read LW link

[Question] What would “The Medical Model Is Wrong” look like?

Elo 21 Apr 2023 1:46 UTC
8 points
7 comments 2 min read LW link

Gas and Water

jefftk 21 Apr 2023 1:30 UTC
17 points
9 comments 1 min read LW link
(www.jefftk.com)

[Question] Did the fonts change?

the gears to ascension 21 Apr 2023 0:40 UTC
2 points
1 comment 1 min read LW link

[Question] Should we openly talk about explicit use cases for AutoGPT?

ChristianKl 20 Apr 2023 23:44 UTC
20 points
4 comments 1 min read LW link

United We Align: Harnessing Collective Human Intelligence for AI Alignment Progress

Shoshannah Tekofsky 20 Apr 2023 23:19 UTC
41 points
13 comments 25 min read LW link

[Question] Where to start with statistics if I want to measure things?

matto 20 Apr 2023 22:40 UTC
21 points
7 comments 1 min read LW link

Upskilling, bridge-building, research on security/cryptography and AI safety

Allison Duettmann 20 Apr 2023 22:32 UTC
14 points
0 comments 4 min read LW link

Behavioural statistics for a maze-solving agent

20 Apr 2023 22:26 UTC
46 points
11 comments 10 min read LW link

An introduction to language model interpretability

Alexandre Variengien 20 Apr 2023 22:22 UTC
14 points
0 comments 9 min read LW link

The Case for Brain-Only Preservation

Mati_Roy 20 Apr 2023 22:01 UTC
21 points
7 comments 1 min read LW link
(biostasis.substack.com)

[Question] Practical ways to actualize our beliefs into concrete bets over a longer time horizon?

M. Y. Zuo 20 Apr 2023 21:21 UTC
4 points
2 comments 1 min read LW link

LW moderation: my current thoughts and questions, 2023-04-12

Ruby 20 Apr 2023 21:02 UTC
53 points
30 comments 10 min read LW link

Proposal: Using Monte Carlo tree search instead of RLHF for alignment research

Christopher King 20 Apr 2023 19:57 UTC
2 points
7 comments 3 min read LW link

DeepMind and Google Brain are merging [Linkpost]

Akash 20 Apr 2023 18:47 UTC
55 points
5 comments 1 min read LW link
(www.deepmind.com)

Ideas for studies on AGI risk

dr_s 20 Apr 2023 18:17 UTC
5 points
1 comment 11 min read LW link

Study 1b: This One Weird Trick does NOT cause incorrectness cascades

Robert_AIZI 20 Apr 2023 18:10 UTC
5 points
0 comments 6 min read LW link
(aizi.substack.com)

An open letter to SERI MATS program organisers

Roman Leventov 20 Apr 2023 16:34 UTC
26 points
26 comments 4 min read LW link

Deception Strategies

Thoth Hermes 20 Apr 2023 15:59 UTC
−7 points
2 comments 5 min read LW link
(thothhermes.substack.com)

Paperclip Club (AI Safety Meetup)

LThorburn 20 Apr 2023 15:55 UTC
1 point
0 comments 1 min read LW link

AI #8: People Can Do Reasonable Things

Zvi 20 Apr 2023 15:50 UTC
100 points
16 comments 55 min read LW link
(thezvi.wordpress.com)

OpenAI could help X-risk by wagering itself

VojtaKovarik 20 Apr 2023 14:51 UTC
31 points
16 comments 1 min read LW link

Japan AI Alignment Conference Postmortem

20 Apr 2023 10:58 UTC
71 points
8 comments 8 min read LW link

Stability AI releases StableLM, an open-source ChatGPT counterpart

Ozyrus 20 Apr 2023 6:04 UTC
11 points
3 comments 1 min read LW link
(github.com)

The Quantum Wave Function is Related to a Philosophy Concept

Richard Aragon 20 Apr 2023 3:16 UTC
−11 points
3 comments 6 min read LW link

A poem written by a fancy autocomplete

Christopher King 20 Apr 2023 2:31 UTC
1 point
0 comments 1 min read LW link

List of commonly used benchmarks for LLMs

Diziet 20 Apr 2023 2:25 UTC
8 points
0 comments 1 min read LW link