Three configurable prettyprinters

philh 10 Aug 2023 23:10 UTC
9 points
0 comments · 22 min read · LW link
(reasonableapproximation.net)

Ilya Sutskever’s thoughts on AI safety (July 2023): a transcript with my comments

mishka 10 Aug 2023 19:07 UTC
21 points
3 comments · 5 min read · LW link

Seeking Input to AI Safety Book for non-technical audience

Darren McKee 10 Aug 2023 17:58 UTC
10 points
4 comments · 1 min read · LW link

Evaluating GPT-4 Theory of Mind Capabilities

10 Aug 2023 17:57 UTC
15 points
2 comments · 14 min read · LW link

Some alignment ideas

SelonNerias 10 Aug 2023 17:51 UTC
1 point
0 comments · 11 min read · LW link

Self Supervised Learning (SSL)

Varshul Gupta 10 Aug 2023 17:43 UTC
5 points
1 comment · 2 min read · LW link
(dubverseblack.substack.com)

Predicting Virus Relative Abundance in Wastewater

jefftk 10 Aug 2023 15:46 UTC
33 points
2 comments · 1 min read · LW link
(naobservatory.org)

AI #24: Week of the Podcast

Zvi 10 Aug 2023 15:00 UTC
49 points
5 comments · 44 min read · LW link
(thezvi.wordpress.com)

Could We Automate AI Alignment Research?

Stephen McAleese 10 Aug 2023 12:17 UTC
34 points
10 comments · 21 min read · LW link

The positional embedding matrix and previous-token heads: how do they actually work?

AdamYedidia 10 Aug 2023 1:58 UTC
26 points
4 comments · 13 min read · LW link

LLMs are (mostly) not helped by filler tokens

Kshitij Sachan 10 Aug 2023 0:48 UTC
66 points
35 comments · 6 min read · LW link

2023 ACX Meetups Everywhere—Newton, MA

duck_master 9 Aug 2023 22:47 UTC
6 points
2 comments · 1 min read · LW link

Progress links digest, 2023-08-09: US adds new nuclear, Katalin Karikó interview, and more

jasoncrawford 9 Aug 2023 19:22 UTC
18 points
0 comments · 3 min read · LW link
(rootsofprogress.org)

Mech Interp Challenge: August—Deciphering the First Unique Character Model

CallumMcDougall 9 Aug 2023 19:14 UTC
36 points
1 comment · 3 min read · LW link

Real Meaning of life has been found. Eliezer discovered it in 2000’s.

Jorterder 9 Aug 2023 18:13 UTC
−15 points
1 comment · 1 min read · LW link
(docs.google.com)

Marginal Revolution unofficial birthday party

Derek M. Jones 9 Aug 2023 14:35 UTC
4 points
0 comments · 1 min read · LW link

The Case for Convexity

Jesse Richardson 9 Aug 2023 14:09 UTC
19 points
3 comments · 1 min read · LW link

A content analysis of the SQ-R questionnaire and a proposal for testing EQ-SQ theory

tailcalled 9 Aug 2023 13:51 UTC
10 points
2 comments · 13 min read · LW link

[Question] Does LessWrong allow exempting posts from being scraped by GPTBot?

mic 9 Aug 2023 13:02 UTC
29 points
3 comments · 1 min read · LW link

If I Was An Eccentric Trillionaire

niplav 9 Aug 2023 7:56 UTC
9 points
8 comments · 26 min read · LW link

Modulating sycophancy in an RLHF model via activation steering

Nina Panickssery 9 Aug 2023 7:06 UTC
69 points
20 comments · 12 min read · LW link

Open Thread—August 2023

habryka 9 Aug 2023 3:52 UTC
18 points
49 comments · 1 min read · LW link

marine cloud brightening

bhauth 9 Aug 2023 2:50 UTC
40 points
14 comments · 3 min read · LW link
(www.bhauth.com)

Inflection.ai is a major AGI lab

Nikola Jurkovic 9 Aug 2023 1:05 UTC
137 points
13 comments · 2 min read · LW link

Acausal Now: We could totally acausally bargain with aliens at our current tech level if desired

Christopher King 9 Aug 2023 0:50 UTC
1 point
5 comments · 4 min read · LW link

Necromancy’s unintended consequences.

Christopher King 9 Aug 2023 0:08 UTC
−6 points
2 comments · 2 min read · LW link

What’s A “Market”?

johnswentworth 8 Aug 2023 23:29 UTC
94 points
16 comments · 10 min read · LW link

Podcast (+transcript): Nathan Barnard on how US financial regulation can inform AI governance

Aaron Bergman 8 Aug 2023 21:46 UTC
8 points
0 comments · 1 min read · LW link
(www.aaronbergman.net)

What are the flaws in this argument about p(Doom)?

William the Kiwi 8 Aug 2023 20:34 UTC
0 points
25 comments · 1 min read · LW link

A Simple Theory Of Consciousness

SherlockHolmes 8 Aug 2023 18:05 UTC
2 points
5 comments · 1 min read · LW link
(peterholmes.medium.com)

[Linkpost] Rationally awake

jpc 8 Aug 2023 17:59 UTC
−1 points
0 comments · 4 min read · LW link
(jpc.dev)

Yet more UFO Betting: Put Up or Shut Up

MoreRatsWrongReUAP 8 Aug 2023 17:50 UTC
10 points
18 comments · 1 min read · LW link

AISN #18: Challenges of Reinforcement Learning from Human Feedback, Microsoft’s Security Breach, and Conceptual Research on AI Safety

aogara 8 Aug 2023 15:52 UTC
13 points
0 comments · 1 min read · LW link
(newsletter.safe.ai)

[Question] Beginner’s question about RLHF

FTPickle 8 Aug 2023 15:48 UTC
1 point
3 comments · 1 min read · LW link

My Trial Period as an Independent Alignment Researcher

Bart Bussmann 8 Aug 2023 14:16 UTC
34 points
1 comment · 3 min read · LW link

4 types of AGI selection, and how to constrain them

Remmelt 8 Aug 2023 10:02 UTC
−4 points
3 comments · 3 min read · LW link

Notice your everything

metachirality 8 Aug 2023 2:38 UTC
15 points
1 comment · 2 min read · LW link

Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research

8 Aug 2023 1:30 UTC
312 points
29 comments · 18 min read · LW link · 1 review

Perpetually Declining Population?

jefftk 8 Aug 2023 1:30 UTC
48 points
29 comments · 3 min read · LW link
(www.jefftk.com)

[Question] How do I find all the items on LW that I’ve *favorited* or upvoted?

Alex K. Chen (parrot) 7 Aug 2023 23:51 UTC
14 points
3 comments · 1 min read · LW link

A plea for more funding shortfall transparency

porby 7 Aug 2023 21:33 UTC
73 points
4 comments · 2 min read · LW link

[Question] Tips for reducing thinking branching factor

Simon Berens 7 Aug 2023 20:21 UTC
4 points
6 comments · 1 min read · LW link

An interactive introduction to grokking and mechanistic interpretability

7 Aug 2023 19:09 UTC
23 points
3 comments · 1 min read · LW link
(pair.withgoogle.com)

Feedbackloop-first Rationality

Raemon 7 Aug 2023 17:58 UTC
195 points
65 comments · 8 min read · LW link

Growing Bonsai Networks with RNNs

ameo 7 Aug 2023 17:34 UTC
21 points
5 comments · 1 min read · LW link
(cprimozic.net)

[Question] Should I test myself for microplastics?

Augs 7 Aug 2023 17:31 UTC
9 points
2 comments · 1 min read · LW link

Optimisation Measures: Desiderata, Impossibility, Proposals

7 Aug 2023 15:52 UTC
36 points
9 comments · 1 min read · LW link

Announcing the Clearer Thinking micro-grants program for 2023

spencerg 7 Aug 2023 15:21 UTC
14 points
1 comment · 1 min read · LW link
(www.clearerthinking.org)

What I’ve been reading, July–August 2023

jasoncrawford 7 Aug 2023 14:22 UTC
23 points
0 comments · 13 min read · LW link
(rootsofprogress.org)

Monthly Roundup #9: August 2023

Zvi 7 Aug 2023 13:20 UTC
42 points
25 comments · 57 min read · LW link
(thezvi.wordpress.com)