How Often Does Taking Away Options Help?

niplav · 21 Sep 2024 21:52 UTC
20 points
6 comments · 2 min read · LW link

A Rational Company—Seeking Advisors

AlignmentOptimizer · 21 Sep 2024 19:51 UTC
0 points
1 comment · 1 min read · LW link

Seeking mentorship

Kevin Afachao · 21 Sep 2024 16:54 UTC
5 points
0 comments · 1 min read · LW link

Applications of Chaos: Saying No (with Hastings Greer)

Elizabeth · 21 Sep 2024 16:30 UTC
50 points
16 comments · 2 min read · LW link
(acesounderglass.com)

Investigating an insurance-for-AI startup

21 Sep 2024 15:29 UTC
69 points
0 comments · 16 min read · LW link
(www.strataoftheworld.com)

An Unmeasured Song of Measurement

jan Sijan · 21 Sep 2024 15:08 UTC
−3 points
0 comments · 4 min read · LW link

Should Sports Betting Be Banned?

Maxwell Tabarrok · 21 Sep 2024 14:13 UTC
18 points
2 comments · 4 min read · LW link
(www.maximum-progress.com)

Work with me on agent foundations: independent fellowship

Alex_Altair · 21 Sep 2024 13:59 UTC
49 points
5 comments · 3 min read · LW link

Electric Mandola

jefftk · 21 Sep 2024 13:40 UTC
9 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Glitch Token Catalog - (Almost) a Full Clear

Lao Mein · 21 Sep 2024 12:22 UTC
38 points
3 comments · 37 min read · LW link

The Other Existential Crisis

James Stephen Brown · 21 Sep 2024 1:16 UTC
9 points
24 comments · 2 min read · LW link

Apply to MATS 7.0!

21 Sep 2024 0:23 UTC
31 points
0 comments · 5 min read · LW link

Moscow – ACX Meetups Everywhere Fall 2024

red-hara · 20 Sep 2024 23:03 UTC
−1 points
0 comments · 1 min read · LW link

Validating / finding alignment-relevant concepts using neural data

Bogdan Ionut Cirstea · 20 Sep 2024 21:12 UTC
7 points
0 comments · 1 min read · LW link
(docs.google.com)

Augmenting Statistical Models with Natural Language Parameters

jsteinhardt · 20 Sep 2024 18:30 UTC
34 points
0 comments · 8 min read · LW link
(bounded-regret.ghost.io)

Fun With The Tabula Muris (Senis)

sarahconstantin · 20 Sep 2024 18:20 UTC
25 points
0 comments · 8 min read · LW link
(sarahconstantin.substack.com)

My Critique of Effective Altruism

Dylan Price · 20 Sep 2024 17:41 UTC
−10 points
7 comments · 4 min read · LW link

[Question] Why be moral if we can’t measure how moral we are? Is it even possible to measure morality?

OKlogic · 20 Sep 2024 17:40 UTC
−2 points
0 comments · 3 min read · LW link

On Measuring Intellectual Performance—personal experience and several thoughts

Alexander Gufan · 20 Sep 2024 17:21 UTC
3 points
2 comments · 8 min read · LW link

Introduction to Super Powers (for kids!)

Shoshannah Tekofsky · 20 Sep 2024 17:17 UTC
25 points
0 comments · 3 min read · LW link
(kidquest.substack.com)

Collapsing “Collapsing the Belief/Knowledge Distinction”

Jeremias · 20 Sep 2024 16:11 UTC
3 points
0 comments · 4 min read · LW link

A New Class of Glitch Tokens—BPE Subtoken Artifacts (BSA)

Lao Mein · 20 Sep 2024 13:13 UTC
37 points
7 comments · 5 min read · LW link

o1-preview is pretty good at doing ML on an unknown dataset

Håvard Tveit Ihle · 20 Sep 2024 8:39 UTC
67 points
1 comment · 2 min read · LW link

Moral Trade, Impact Distributions and Large Worlds

Larks · 20 Sep 2024 3:45 UTC
7 points
0 comments · 1 min read · LW link

Keyboard Gremlins

jefftk · 20 Sep 2024 2:30 UTC
10 points
0 comments · 2 min read · LW link
(www.jefftk.com)

The case for more Alignment Target Analysis (ATA)

20 Sep 2024 1:14 UTC
25 points
13 comments · 17 min read · LW link

Piling bounded arguments

momom2 · 19 Sep 2024 22:27 UTC
7 points
0 comments · 4 min read · LW link

We Don’t Know Our Own Values, but Reward Bridges The Is-Ought Gap

19 Sep 2024 22:22 UTC
47 points
47 comments · 5 min read · LW link

Interested in Cognitive Bootcamp?

Raemon · 19 Sep 2024 22:12 UTC
48 points
0 comments · 2 min read · LW link

Just How Good Are Modern Chess Computers?

nem · 19 Sep 2024 18:57 UTC
10 points
1 comment · 6 min read · LW link

RLHF is the worst possible thing done when facing the alignment problem

tailcalled · 19 Sep 2024 18:56 UTC
32 points
10 comments · 6 min read · LW link

AISafety.info: What are Inductive Biases?

Algon · 19 Sep 2024 17:26 UTC
11 points
4 comments · 2 min read · LW link
(aisafety.info)

Physics of Language models (part 2.1)

Nathan Helm-Burger · 19 Sep 2024 16:48 UTC
9 points
2 comments · 1 min read · LW link
(youtu.be)

Why good things often don’t lead to better outcomes

DMMF · 19 Sep 2024 16:37 UTC
16 points
1 comment · 4 min read · LW link
(danfrank.ca)

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Bogdan Ionut Cirstea · 19 Sep 2024 16:13 UTC
21 points
1 comment · 1 min read · LW link
(arxiv.org)

Laziness death spirals

PatrickDFarley · 19 Sep 2024 15:58 UTC
247 points
35 comments · 8 min read · LW link

[Intuitive self-models] 1. Preliminaries

Steven Byrnes · 19 Sep 2024 13:45 UTC
88 points
20 comments · 15 min read · LW link

AI #82: The Governor Ponders

Zvi · 19 Sep 2024 13:30 UTC
50 points
8 comments · 27 min read · LW link
(thezvi.wordpress.com)

Slave Morality: A place for every man and every man in his place

Martin Sustrik · 19 Sep 2024 4:20 UTC
16 points
7 comments · 2 min read · LW link
(250bpm.substack.com)

Which LessWrong/Alignment topics would you like to be tutored in? [Poll]

Ruby · 19 Sep 2024 1:35 UTC
43 points
12 comments · 1 min read · LW link

The Obliqueness Thesis

jessicata · 19 Sep 2024 0:26 UTC
77 points
17 comments · 17 min read · LW link

How to choose what to work on

jasoncrawford · 18 Sep 2024 20:39 UTC
22 points
6 comments · 4 min read · LW link
(blog.rootsofprogress.org)

Intention-to-Treat (Re: How harmful is music, really?)

kqr · 18 Sep 2024 18:44 UTC
11 points
0 comments · 5 min read · LW link
(entropicthoughts.com)

The case for a negative alignment tax

18 Sep 2024 18:33 UTC
74 points
20 comments · 7 min read · LW link

Endogenous Growth and Human Intelligence

Nicholas D. · 18 Sep 2024 14:05 UTC
3 points
0 comments · 2 min read · LW link

Inquisitive vs. adversarial rationality

gb · 18 Sep 2024 13:50 UTC
6 points
9 comments · 2 min read · LW link

Pronouns are Annoying

ymeskhout · 18 Sep 2024 13:30 UTC
15 points
21 comments · 4 min read · LW link
(www.ymeskhout.com)

Is “superhuman” AI forecasting BS? Some experiments on the “539” bot from the Centre for AI Safety

titotal · 18 Sep 2024 13:07 UTC
78 points
3 comments · 1 min read · LW link
(open.substack.com)

Knowledge’s practicability

Ted Nguyễn · 18 Sep 2024 2:31 UTC
−5 points
0 comments · 7 min read · LW link
(tednguyen.substack.com)

Skills from a year of Purposeful Rationality Practice

Raemon · 18 Sep 2024 2:05 UTC
185 points
18 comments · 7 min read · LW link