Frontier Models are Capable of In-context Scheming

5 Dec 2024 22:11 UTC
203 points
24 comments 7 min read LW link

Should you be worried about H5N1?

gw 5 Dec 2024 21:11 UTC
87 points
2 comments 5 min read LW link
(www.georgeyw.com)

Are SAE features from the Base Model still meaningful to LLaVA?

Shan23Chen 5 Dec 2024 20:21 UTC
7 points
1 comment 10 min read LW link
(www.lesswrong.com)

o1 tried to avoid being shut down

Raelifin 5 Dec 2024 19:52 UTC
10 points
5 comments 1 min read LW link
(www.transformernews.ai)

More Growth, Melancholy, and MindCraft @3QD [revised and updated]

Bill Benzon 5 Dec 2024 19:36 UTC
4 points
0 comments 4 min read LW link

Expevolu, a laissez-faire approach to country creation

Fernando 5 Dec 2024 19:29 UTC
4 points
3 comments 43 min read LW link
(expevolu.substack.com)

Are SAE features from the Base Model still meaningful to LLaVA?

Shan23Chen 5 Dec 2024 19:24 UTC
4 points
0 comments 10 min read LW link

OpenAI o1 + ChatGPT Pro release

anaguma 5 Dec 2024 19:13 UTC
5 points
0 comments 1 min read LW link
(openai.com)

Smart people should do biology

Haotian 5 Dec 2024 19:11 UTC
9 points
2 comments 3 min read LW link

Announcement: AI for Math Fund

sarahconstantin 5 Dec 2024 18:33 UTC
20 points
9 comments 2 min read LW link
(renaissancephilanthropy.org)

Detection of Asymptomatically Spreading Pathogens

jefftk 5 Dec 2024 18:20 UTC
45 points
7 comments 7 min read LW link
(www.jefftk.com)

Model Integrity: MAI on Value Alignment

Jonas Hallgren 5 Dec 2024 17:11 UTC
6 points
11 comments 1 min read LW link
(meaningalignment.substack.com)

Social Science in its epistemological context

Arturo Macias 5 Dec 2024 16:12 UTC
3 points
0 comments 1 min read LW link
(www.theseedsofscience.pub)

Higher and lower pleasures

Chris_Leong 5 Dec 2024 13:13 UTC
19 points
3 comments 1 min read LW link

Sam Harris’s Argument For Objective Morality

Zero Contradictions 5 Dec 2024 10:19 UTC
7 points
5 comments 1 min read LW link
(thewaywardaxolotl.blogspot.com)

Morality as Cooperation Part III: Failure Modes

DeLesley Hutchins 5 Dec 2024 9:39 UTC
4 points
0 comments 20 min read LW link

Morality as Cooperation Part II: Theory and Experiment

DeLesley Hutchins 5 Dec 2024 9:04 UTC
2 points
0 comments 17 min read LW link

Morality as Cooperation Part I: Humans

DeLesley Hutchins 5 Dec 2024 8:16 UTC
5 points
0 comments 19 min read LW link

I Finally Worked Through Bayes’ Theorem (Personal Achievement)

keltan 5 Dec 2024 2:04 UTC
51 points
6 comments 9 min read LW link

The Dream Machine

sarahconstantin 5 Dec 2024 0:00 UTC
117 points
6 comments 12 min read LW link
(sarahconstantin.substack.com)

Should you have children? A decision framework for a crucial life choice that affects yourself, your child and the world

Sherrinford 4 Dec 2024 23:14 UTC
0 points
1 comment 20 min read LW link

CCing Mailing Lists on External Communication

jefftk 4 Dec 2024 22:00 UTC
9 points
0 comments 1 min read LW link
(www.jefftk.com)

Picking favourites is hard

dkl9 4 Dec 2024 20:46 UTC
11 points
3 comments 1 min read LW link
(dkl9.net)

[Question] How can I convince my cryptobro friend that S&P500 is efficient?

AhmedNeedsATherapist 4 Dec 2024 20:04 UTC
−7 points
10 comments 1 min read LW link

The 2023 LessWrong Review: The Basic Ask

Raemon 4 Dec 2024 19:52 UTC
74 points
25 comments 9 min read LW link

Is the AI Doomsday Narrative the Product of a Big Tech Conspiracy?

garrison 4 Dec 2024 19:20 UTC
35 points
1 comment 1 min read LW link
(garrisonlovely.substack.com)

[Question] AI box question

KvmanThinking 4 Dec 2024 19:03 UTC
2 points
2 comments 1 min read LW link

The Polite Coup

Charlie Sanders 4 Dec 2024 14:03 UTC
3 points
0 comments 3 min read LW link
(www.dailymicrofiction.com)

Analysis of Global AI Governance Strategies

4 Dec 2024 10:45 UTC
38 points
10 comments 36 min read LW link

[Question] Cryonics considerations: how big of a problem is ischemia?

kman 4 Dec 2024 4:45 UTC
8 points
1 comment 1 min read LW link

AI #93: Happy Tuesday

Zvi 4 Dec 2024 0:30 UTC
26 points
2 comments 23 min read LW link
(thezvi.wordpress.com)

A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps

Linch 3 Dec 2024 21:57 UTC
64 points
2 comments 1 min read LW link

Deep Causal Transcoding: A Framework for Mechanistically Eliciting Latent Behaviors in Language Models

3 Dec 2024 21:19 UTC
83 points
7 comments 41 min read LW link

“Alignment at Large”: Bending the Arc of History Towards Life-Affirming Futures

welfvh 3 Dec 2024 21:17 UTC
5 points
0 comments 4 min read LW link

Roots of Progress is hiring an event manager

jasoncrawford 3 Dec 2024 20:46 UTC
10 points
0 comments 7 min read LW link
(rootsofprogress.notion.site)

Do simulacra dream of digital sheep?

EuanMcLean 3 Dec 2024 20:25 UTC
16 points
36 comments 10 min read LW link

Orca communication project—seeking feedback (and collaborators)

Towards_Keeperhood 3 Dec 2024 17:29 UTC
35 points
16 comments 2 min read LW link

Book a Time to Chat about Interp Research

Logan Riggs 3 Dec 2024 17:27 UTC
47 points
3 comments 1 min read LW link

Balsa Research 2024 Update

Zvi 3 Dec 2024 12:30 UTC
19 points
0 comments 5 min read LW link
(thezvi.wordpress.com)

First Solo Bus Ride

jefftk 3 Dec 2024 12:20 UTC
28 points
1 comment 1 min read LW link
(www.jefftk.com)

How to make evals for the AISI evals bounty

TheManxLoiner 3 Dec 2024 10:44 UTC
8 points
0 comments 5 min read LW link

Should there be just one western AGI project?

3 Dec 2024 10:11 UTC
78 points
72 comments 15 min read LW link

Cognitive Biases Contributing to AI X-risk — a deleted excerpt from my 2018 ARCHES draft

Andrew_Critch 3 Dec 2024 9:29 UTC
46 points
2 comments 5 min read LW link

[Question] What is your opinion of Dr. Angelo Dilullo (meditation)?

Suh_Prance_Alot 3 Dec 2024 5:54 UTC
0 points
0 comments 1 min read LW link

Chemical Turing Machines

Yudhister Kumar 3 Dec 2024 5:26 UTC
10 points
2 comments 4 min read LW link
(www.yudhister.me)

MIRI’s 2024 End-of-Year Update

Rob Bensinger 3 Dec 2024 4:33 UTC
98 points
2 comments 4 min read LW link

Linkpost: Rat Traps by Sheon Han in Asterisk Mag

Chris_Leong 3 Dec 2024 3:22 UTC
12 points
5 comments 1 min read LW link
(asteriskmag.com)

[Question] Who are the worthwhile non-European pre-Industrial thinkers?

Lorec 3 Dec 2024 1:45 UTC
12 points
4 comments 1 min read LW link

A Paradox of Simulated Suffering

arusarda 2 Dec 2024 23:44 UTC
−1 points
3 comments 1 min read LW link

Levels of Thought: from Points to Fields

HNX 2 Dec 2024 20:25 UTC
4 points
2 comments 23 min read LW link