Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought

Miles Turpin · 11 Mar 2024 23:46 UTC
16 points
0 comments · 1 min read · LW link
(arxiv.org)

AI Safety Action Plan—A report commissioned by the US State Department

agucova · 11 Mar 2024 22:14 UTC
22 points
1 comment · 1 min read · LW link
(www.gladstone.ai)

A discussion of AI risk and the cost/benefit calculation of stopping or pausing AI development

DuncanFowler · 11 Mar 2024 21:41 UTC
1 point
0 comments · 1 min read · LW link

Among the A.I. Doomsayers—The New Yorker

agucova · 11 Mar 2024 21:35 UTC
12 points
1 comment · 1 min read · LW link
(www.newyorker.com)

Be More Katja

Nathan Young · 11 Mar 2024 21:12 UTC
53 points
0 comments · 3 min read · LW link

AI Incident Reporting: A Regulatory Review

11 Mar 2024 21:03 UTC
16 points
0 comments · 6 min read · LW link

Results from an Adversarial Collaboration on AI Risk (FRI)

11 Mar 2024 20:00 UTC
60 points
3 comments · 9 min read · LW link
(forecastingresearch.org)

The Astronomical Sacrifice Dilemma

Matthew McRedmond · 11 Mar 2024 19:58 UTC
15 points
3 comments · 4 min read · LW link

Epiphenomenalism leads to eliminativism about qualia

Clément L · 11 Mar 2024 19:53 UTC
4 points
0 comments · 7 min read · LW link

The Best Essay (Paul Graham)

Chris_Leong · 11 Mar 2024 19:25 UTC
25 points
2 comments · 1 min read · LW link
(paulgraham.com)

Open Thread Spring 2024

habryka · 11 Mar 2024 19:17 UTC
22 points
160 comments · 1 min read · LW link

New social credit formalizations

KatjaGrace · 11 Mar 2024 19:00 UTC
23 points
3 comments · 2 min read · LW link
(worldspiritsockpuppet.com)

How disagreements about Evidential Correlations could be settled

Martín Soto · 11 Mar 2024 18:28 UTC
11 points
3 comments · 4 min read · LW link

“Artificial General Intelligence”: an extremely brief FAQ

Steven Byrnes · 11 Mar 2024 17:49 UTC
73 points
6 comments · 2 min read · LW link

Some (problematic) aesthetics of what constitutes good work in academia

Steven Byrnes · 11 Mar 2024 17:47 UTC
147 points
12 comments · 12 min read · LW link

Storable Votes with a Pay as you win mechanism: a contribution for institutional design

Arturo Macias · 11 Mar 2024 15:58 UTC
17 points
19 comments · 2 min read · LW link

Tend to your clarity, not your confusion

Severin T. Seehrich · 11 Mar 2024 15:09 UTC
23 points
1 comment · 6 min read · LW link

[Question] What do we know about the AI knowledge and views, especially about existential risk, of the new OpenAI board members?

Zvi · 11 Mar 2024 14:55 UTC
60 points
2 comments · 2 min read · LW link

“How could I have thought that faster?”

mesaoptimizer · 11 Mar 2024 10:56 UTC
222 points
32 comments · 2 min read · LW link
(twitter.com)

Simple versus Short: Higher-order degeneracy and error-correction

Daniel Murfet · 11 Mar 2024 7:52 UTC
113 points
6 comments · 13 min read · LW link

Deconstructing Bostrom’s Classic Argument for AI Doom

Nora Belrose · 11 Mar 2024 5:58 UTC
16 points
14 comments · 1 min read · LW link
(www.youtube.com)

Advice Needed: Does Using a LLM Compomise My Personal Epistemic Security?

Naomi · 11 Mar 2024 5:57 UTC
17 points
7 comments · 2 min read · LW link

Some Thoughts on Concept Formation and Use in Agents

CatGoddess · 11 Mar 2024 5:03 UTC
12 points
0 comments · 8 min read · LW link

Steelmanning as an especially insidious form of strawmanning

Cornelius Dybdahl · 11 Mar 2024 2:25 UTC
10 points
13 comments · 5 min read · LW link

One-shot strategy games?

Raemon · 11 Mar 2024 0:19 UTC
41 points
42 comments · 1 min read · LW link

Understanding SAE Features with the Logit Lens

11 Mar 2024 0:16 UTC
66 points
0 comments · 14 min read · LW link

Replacing the Water Heater’s Anode

jefftk · 11 Mar 2024 0:00 UTC
22 points
0 comments · 2 min read · LW link
(www.jefftk.com)

Briefly Extending Differential Optimization to Distributions

J Bostock · 10 Mar 2024 20:41 UTC
4 points
0 comments · 2 min read · LW link

Evolution did a surprising good job at aligning humans...to social status

Eli Tyre · 10 Mar 2024 19:34 UTC
24 points
37 comments · 1 min read · LW link

Pausing AI is Positive Expected Value

Liron · 10 Mar 2024 17:10 UTC
8 points
2 comments · 3 min read · LW link
(twitter.com)

W2SG: Introduction

Maria Kapros · 10 Mar 2024 16:25 UTC
1 point
2 comments · 10 min read · LW link

An Optimistic Solution to the Fermi Paradox

Glenn Clayton · 10 Mar 2024 14:39 UTC
4 points
6 comments · 13 min read · LW link

Counterfactual Civilization Simulation Version −1.0 aka my application to Johannes Mayer’s SPAR project

Morphism · 10 Mar 2024 10:10 UTC
1 point
0 comments · 14 min read · LW link

Notes from a Prompt Factory

Richard_Ngo · 10 Mar 2024 5:13 UTC
101 points
19 comments · 9 min read · LW link
(www.narrativeark.xyz)

Investigating Basin Volume with XOR Networks

CatGoddess · 10 Mar 2024 1:35 UTC
10 points
0 comments · 5 min read · LW link

[Linkpost] MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data

Bogdan Ionut Cirstea · 10 Mar 2024 1:30 UTC
10 points
0 comments · 1 min read · LW link
(openreview.net)

0th Person and 1st Person Logic

Adele Lopez · 10 Mar 2024 0:56 UTC
60 points
28 comments · 6 min read · LW link

Completion Estimates

scarcegreengrass · 9 Mar 2024 22:56 UTC
7 points
2 comments · 3 min read · LW link

Semi-Simplicial Types, Part I: Motivation and History

astradiol · 9 Mar 2024 22:07 UTC
20 points
3 comments · 10 min read · LW link

Distinctions when Discussing Utility Functions

ozziegooen · 9 Mar 2024 20:14 UTC
24 points
7 comments · 1 min read · LW link

What is progress?

jasoncrawford · 9 Mar 2024 16:28 UTC
10 points
4 comments · 6 min read · LW link
(rootsofprogress.org)

Fifteen Lawsuits against OpenAI

Remmelt · 9 Mar 2024 12:22 UTC
27 points
4 comments · 1 min read · LW link

Cambridge ACX/SSC monthly meetup (location changed to Fort St George!)

hamishtodd1 · 9 Mar 2024 11:10 UTC
2 points
0 comments · 1 min read · LW link

MA E-ZPass Without a Car?

jefftk · 9 Mar 2024 2:40 UTC
15 points
1 comment · 1 min read · LW link
(www.jefftk.com)

Closeness To the Issue (Part 5 of “The Sense Of Physical Necessity”)

LoganStrohl · 9 Mar 2024 0:36 UTC
36 points
0 comments · 15 min read · LW link

Exploring the Evolution and Migration of Different Layer Embedding in LLMs

Ruixuan Huang · 8 Mar 2024 15:01 UTC
6 points
0 comments · 8 min read · LW link

[Question] When and why did ‘training’ become ‘pretraining’?

beren · 8 Mar 2024 14:29 UTC
16 points
6 comments · 1 min read · LW link

A T-o-M test: ‘popcorn’ or ‘chocolate’

MiguelDev · 8 Mar 2024 4:24 UTC
20 points
13 comments · 1 min read · LW link

Scenario Forecasting Workshop: Materials and Learnings

8 Mar 2024 2:30 UTC
50 points
3 comments · 2 min read · LW link

Forecasting future gains due to post-training enhancements

8 Mar 2024 2:11 UTC
31 points
2 comments · 1 min read · LW link
(docs.google.com)