Understanding incomparability versus incommensurability in relation to RLHF

artemiocobb · 2 Nov 2024 22:57 UTC
1 point
1 comment · 2 min read · LW link

electric turbofans

bhauth · 2 Nov 2024 22:50 UTC
61 points
2 comments · 5 min read · LW link
(bhauth.com)

Reality as Category-Theoretic State Machines: A Mathematical Framework

Wenitte Apiou · 2 Nov 2024 21:04 UTC
−8 points
0 comments · 2 min read · LW link

The Median Researcher Problem

johnswentworth · 2 Nov 2024 20:16 UTC
164 points
69 comments · 1 min read · LW link

Testing “True” Language Understanding in LLMs: A Simple Proposal

MtryaSam · 2 Nov 2024 19:12 UTC
9 points
2 comments · 2 min read · LW link

Testing “True” Language Understanding in LLMs: A Simple Proposal

MtryaSam · 2 Nov 2024 19:12 UTC
−3 points
0 comments · 2 min read · LW link

[Question] Feedback request: what am I missing?

Nathan Helm-Burger · 2 Nov 2024 17:38 UTC
35 points
5 comments · 1 min read · LW link

Fragile, Robust, and Antifragile Preference Satisfaction

adamShimi · 2 Nov 2024 17:25 UTC
19 points
0 comments · 5 min read · LW link
(formethods.substack.com)

Higher Order Signs, Hallucination and Schizophrenia

Nicolas Villarreal · 2 Nov 2024 16:33 UTC
3 points
0 comments · 13 min read · LW link
(nicolasdvillarreal.substack.com)

[Question] Is OpenAI net negative for AI Safety?

Lysandre Terrisse · 2 Nov 2024 16:18 UTC
4 points
0 comments · 1 min read · LW link

Two arguments against longtermist thought experiments

momom2 · 2 Nov 2024 10:22 UTC
15 points
5 comments · 3 min read · LW link

Both-Sidesism—When Fair & Balanced Goes Wrong

James Stephen Brown · 2 Nov 2024 3:04 UTC
3 points
15 comments · 6 min read · LW link
(nonzerosum.games)

What can we learn from insecure domains?

Logan Zoellner · 1 Nov 2024 23:53 UTC
14 points
21 comments · 1 min read · LW link

Science advances one funeral at a time

1 Nov 2024 23:06 UTC
92 points
9 comments · 2 min read · LW link

The Cartesian Crisis

mindprison · 1 Nov 2024 23:02 UTC
−5 points
2 comments · 2 min read · LW link

Composition Circuits in Vision Transformers (Hypothesis)

phenomanon · 1 Nov 2024 22:16 UTC
1 point
0 comments · 3 min read · LW link

SAE Probing: What is it good for? Absolutely something!

1 Nov 2024 19:23 UTC
31 points
0 comments · 11 min read · LW link

[Question] Set Theory Multiverse vs Mathematical Truth—Philosophical Discussion

Wenitte Apiou · 1 Nov 2024 18:56 UTC
8 points
25 comments · 1 min read · LW link

Educational CAI: Aligning a Language Model with Pedagogical Theories

Bharath Puranam · 1 Nov 2024 18:55 UTC
5 points
1 comment · 13 min read · LW link

Prediction markets and Taxes

Edmund Nelson · 1 Nov 2024 17:39 UTC
10 points
7 comments · 1 min read · LW link

Dentistry, Oral Surgeons, and the Inefficiency of Small Markets

GeneSmith · 1 Nov 2024 17:26 UTC
76 points
16 comments · 5 min read · LW link

Live Machinery: An Interface Design Philosophy for Wholesome AI Futures

Sahil · 1 Nov 2024 17:24 UTC
45 points
3 comments · 35 min read · LW link

Seeking Collaborators

abramdemski · 1 Nov 2024 17:13 UTC
57 points
15 comments · 7 min read · LW link

Complete Feedback

abramdemski · 1 Nov 2024 16:58 UTC
23 points
7 comments · 3 min read · LW link

Levers for Biological Progress—A Response to “Machines of Loving Grace”

Niko_McCarty · 1 Nov 2024 16:35 UTC
15 points
0 comments · 20 min read · LW link
(www.asimov.press)

2024 Unofficial LW Community Census, Request for Comments

Screwtape · 1 Nov 2024 16:34 UTC
23 points
32 comments · 3 min read · LW link

[Question] When engaging with a large amount of resources during a literature review, how do you prevent yourself from becoming overwhelmed?

corruptedCatapillar · 1 Nov 2024 7:29 UTC
25 points
2 comments · 3 min read · LW link

(draft) Cyborg software should be open (?)

AtillaYasar · 1 Nov 2024 7:24 UTC
4 points
5 comments · 3 min read · LW link

Another UFO Bet

codyz · 1 Nov 2024 1:55 UTC
6 points
11 comments · 1 min read · LW link

Trading Candy

jefftk · 1 Nov 2024 1:10 UTC
28 points
4 comments · 1 min read · LW link
(www.jefftk.com)

JargonBot Beta Test

Raemon · 1 Nov 2024 1:05 UTC
84 points
55 comments · 6 min read · LW link

GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning

1 Nov 2024 0:10 UTC
18 points
0 comments · 6 min read · LW link
(far.ai)

The slingshot helps with learning

Wilson Wu · 31 Oct 2024 23:18 UTC
33 points
0 comments · 8 min read · LW link

Toward Safety Case Inspired Basic Research

31 Oct 2024 23:06 UTC
55 points
2 comments · 13 min read · LW link

Spooky Recommendation System Scaling

phdead · 31 Oct 2024 22:00 UTC
11 points
0 comments · 4 min read · LW link

‘Meta’, ‘mesa’, and mountains

Lorec · 31 Oct 2024 17:25 UTC
1 point
0 comments · 3 min read · LW link

Toward Safety Cases For AI Scheming

31 Oct 2024 17:20 UTC
60 points
1 comment · 2 min read · LW link

AI #88: Thanks for the Memos

Zvi · 31 Oct 2024 15:00 UTC
46 points
5 comments · 77 min read · LW link
(thezvi.wordpress.com)

The Compendium, A full argument about extinction risk from AGI

31 Oct 2024 12:01 UTC
192 points
52 comments · 2 min read · LW link
(www.thecompendium.ai)

Some Preliminary Notes on the Promise of a Wisdom Explosion

Chris_Leong · 31 Oct 2024 9:21 UTC
2 points
0 comments · 1 min read · LW link
(aiimpacts.org)

What TMS is like

Sable · 31 Oct 2024 0:44 UTC
207 points
23 comments · 6 min read · LW link
(affablyevil.substack.com)

AI Safety at the Frontier: Paper Highlights, October ’24

gasteigerjo · 31 Oct 2024 0:09 UTC
3 points
0 comments · 9 min read · LW link
(aisafetyfrontier.substack.com)

Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution

Kola Ayonrinde · 30 Oct 2024 22:50 UTC
27 points
0 comments · 12 min read · LW link

Generic advice caveats

Saul Munn · 30 Oct 2024 21:03 UTC
27 points
1 comment · 3 min read · LW link
(www.brasstacks.blog)

I turned decision theory problems into memes about trolleys

Tapatakt · 30 Oct 2024 20:13 UTC
103 points
20 comments · 1 min read · LW link

AI as a powerful meme, via CGP Grey

TheManxLoiner · 30 Oct 2024 18:31 UTC
46 points
8 comments · 4 min read · LW link

[Question] How might language influence how an AI “thinks”?

bodry · 30 Oct 2024 17:41 UTC
3 points
0 comments · 1 min read · LW link

Motivation control

Joe Carlsmith · 30 Oct 2024 17:15 UTC
45 points
7 comments · 52 min read · LW link

Updating the NAO Simulator

jefftk · 30 Oct 2024 13:50 UTC
11 points
0 comments · 2 min read · LW link
(www.jefftk.com)

Occupational Licensing Roundup #1

Zvi · 30 Oct 2024 11:00 UTC
65 points
11 comments · 11 min read · LW link
(thezvi.wordpress.com)