Simplifying Corrigibility – Subagent Corrigibility Is Not Anti-Natural

Rubi J. Hudson · 16 Jul 2024 22:44 UTC
44 points
27 comments · 5 min read · LW link

Multiplex Gene Editing: Where Are We Now?

sarahconstantin · 16 Jul 2024 20:50 UTC
71 points
6 comments · 7 min read · LW link
(sarahconstantin.substack.com)

Recursion in AI is scary. But let’s talk solutions.

Oleg Trott · 16 Jul 2024 20:34 UTC
3 points
10 comments · 2 min read · LW link

How to wash your hands precisely and thoroughly

dkl9 · 16 Jul 2024 18:29 UTC
12 points
0 comments · 1 min read · LW link
(dkl9.net)

Francois Chollet inadvertently limits his claim on ARC-AGI

Noosphere89 · 16 Jul 2024 17:32 UTC
12 points
3 comments · 1 min read · LW link
(x.com)

Fully booked - LessWrong Community weekend

jt · 16 Jul 2024 17:15 UTC
20 points
2 comments · 1 min read · LW link

Boundless Emotion

GG10 · 16 Jul 2024 16:36 UTC
3 points
0 comments · 3 min read · LW link

Mech Interp Lacks Good Paradigms

Daniel Tan · 16 Jul 2024 15:47 UTC
35 points
0 comments · 14 min read · LW link

DM Parenting

Shoshannah Tekofsky · 16 Jul 2024 8:50 UTC
49 points
4 comments · 5 min read · LW link
(kidquest.substack.com)

Apply now: Get “unstuck” with the New IFS Self-Care Fellowship Program

Inga G. · 16 Jul 2024 8:18 UTC
10 points
3 comments · 1 min read · LW link

Why the Best Writers Endure Isolation

Declan Molony · 16 Jul 2024 5:58 UTC
49 points
6 comments · 2 min read · LW link

[Research log] The board of Alphabet would stop DeepMind to save the world

Lucie Philippon · 16 Jul 2024 4:59 UTC
6 points
0 comments · 4 min read · LW link

Towards more cooperative AI safety strategies

Richard_Ngo · 16 Jul 2024 4:36 UTC
208 points
133 comments · 4 min read · LW link

Why People in Poverty Make Bad Decisions

James Stephen Brown · 15 Jul 2024 23:40 UTC
5 points
8 comments · 1 min read · LW link
(nonzerosum.games)

Paper Summary: Princes and Merchants: European City Growth Before the Industrial Revolution

Jeffrey Heninger · 15 Jul 2024 21:30 UTC
29 points
1 comment · 4 min read · LW link
(blog.aiimpacts.org)

MIRI’s July 2024 newsletter

Harlan · 15 Jul 2024 21:28 UTC
25 points
2 comments · 1 min read · LW link
(intelligence.org)

How (and why) to get tested for CMV

Metacelsus · 15 Jul 2024 20:06 UTC
17 points
0 comments · 1 min read · LW link
(denovo.substack.com)

A Better Hyperstition (for AI readers)

Yeshua God · 15 Jul 2024 19:35 UTC
−20 points
0 comments · 119 min read · LW link

I found >800 orthogonal “write code” steering vectors

15 Jul 2024 19:06 UTC
96 points
19 comments · 7 min read · LW link
(jacobgw.com)

The AI alignment problem in socio-technical systems from a computational perspective: A Top-Down-Top view and outlook

zhaoweizhang · 15 Jul 2024 18:56 UTC
3 points
0 comments · 9 min read · LW link

Musings of a Layman: Technology, AI, and the Human Condition

Crimson Liquidity · 15 Jul 2024 18:40 UTC
−2 points
0 comments · 8 min read · LW link

[Question] Seeking feedback on a critique of the paperclip maximizer thought experiment

bio neural · 15 Jul 2024 18:39 UTC
3 points
9 comments · 1 min read · LW link

EAGxBerkeley 2024

Lauriander · 15 Jul 2024 18:38 UTC
3 points
0 comments · 1 min read · LW link

Against Aschenbrenner: How ‘Situational Awareness’ constructs a narrative that undermines safety and threatens humanity

GideonF · 15 Jul 2024 18:37 UTC
93 points
17 comments · 21 min read · LW link
(forum.effectivealtruism.org)

On predictability, chaos and AIs that don’t game our goals

Alejandro Tlaie · 15 Jul 2024 17:16 UTC
4 points
8 comments · 6 min read · LW link

Deceptive agents can collude to hide dangerous features in SAEs

15 Jul 2024 17:07 UTC
33 points
2 comments · 7 min read · LW link

Hiding in plain sight: the questions we don’t ask

DDthinker · 15 Jul 2024 17:00 UTC
−1 points
1 comment · 26 min read · LW link

Dialogue on What It Means For Something to Have A Function/Purpose

15 Jul 2024 16:28 UTC
38 points
5 comments · 16 min read · LW link

Comparing Quantized Performance in Llama Models

NickyP · 15 Jul 2024 16:01 UTC
32 points
2 comments · 8 min read · LW link

[Aspiration-based designs] A. Damages from misaligned optimization – two more models

15 Jul 2024 14:08 UTC
6 points
0 comments · 9 min read · LW link

Stacked Laptop Monitor Update

jefftk · 15 Jul 2024 9:40 UTC
14 points
3 comments · 1 min read · LW link
(www.jefftk.com)

Misnaming and Other Issues with OpenAI’s “Human Level” Superintelligence Hierarchy

Davidmanheim · 15 Jul 2024 5:50 UTC
48 points
2 comments · 3 min read · LW link

Series on Artificial Wisdom

Jordan Arel · 15 Jul 2024 1:11 UTC
2 points
0 comments · 3 min read · LW link

Designing Artificial Wisdom: Decision Forecasting AI & Futarchy

Jordan Arel · 15 Jul 2024 0:46 UTC
0 points
0 comments · 6 min read · LW link

Risk Overview of AI in Bio Research

J Bostock · 15 Jul 2024 0:04 UTC
5 points
0 comments · 5 min read · LW link
(open.substack.com)

Donating to help Democrats win in the 2024 elections: research, decision support, and recommendations

Michael Cohn · 14 Jul 2024 22:57 UTC
−1 points
1 comment · 6 min read · LW link

Four ways I’ve made bad decisions

Sodium · 14 Jul 2024 22:18 UTC
18 points
1 comment · 3 min read · LW link

patent process problems

bhauth · 14 Jul 2024 21:12 UTC
33 points
13 comments · 5 min read · LW link
(www.bhauth.com)

Breaking Circuit Breakers

14 Jul 2024 18:57 UTC
53 points
13 comments · 1 min read · LW link
(confirmlabs.org)

Clopen sandwiches

dkl9 · 14 Jul 2024 13:07 UTC
4 points
0 comments · 1 min read · LW link
(dkl9.net)

Child Handrail Returns

jefftk · 14 Jul 2024 12:40 UTC
12 points
0 comments · 1 min read · LW link
(www.jefftk.com)

A (paraconsistent) logic to deal with inconsistent preferences

B Jacobs · 14 Jul 2024 11:17 UTC
6 points
2 comments · 4 min read · LW link
(bobjacobs.substack.com)

Robert Caro And Mechanistic Models In Biography

adamShimi · 14 Jul 2024 10:56 UTC
24 points
5 comments · 7 min read · LW link
(epistemologicalfascinations.substack.com)

An Introduction to Representation Engineering - an activation-based paradigm for controlling LLMs

Jan Wehner · 14 Jul 2024 10:37 UTC
35 points
5 comments · 17 min read · LW link

LLMs as a Planning Overhang

Larks · 14 Jul 2024 2:54 UTC
38 points
8 comments · 2 min read · LW link

Brief notes on the Wikipedia game

Olli Järviniemi · 14 Jul 2024 2:28 UTC
68 points
9 comments · 4 min read · LW link

Spark in the Dark Guest Spots

jefftk · 14 Jul 2024 1:40 UTC
6 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Ice: The Penultimate Frontier

Roko · 13 Jul 2024 23:44 UTC
62 points
56 comments · 1 min read · LW link
(transhumanaxiology.substack.com)

Trust as a bottleneck to growing teams quickly

benkuhn · 13 Jul 2024 18:00 UTC
42 points
3 comments · 5 min read · LW link
(www.benkuhn.net)

Stitching SAEs of different sizes

13 Jul 2024 17:19 UTC
39 points
12 comments · 12 min read · LW link