GeoGuessr Genius

Screwtape · May 8, 2025, 8:40 PM
8 points
0 comments · 1 min read

Sabotaged Wikipedia

Screwtape · May 8, 2025, 8:24 PM
8 points
0 comments · 1 min read

Is there a Half-Life for the Success Rates of AI Agents?

Matrice Jacobine · May 8, 2025, 8:10 PM
9 points
0 comments · 1 min read
(www.tobyord.com)

Misalignment and Strategic Underperformance: An Analysis of Sandbagging and Exploration Hacking

May 8, 2025, 7:06 PM
55 points
0 comments · 15 min read

A Proposal for Recognizing Nonstandard Intelligences

Yates · May 8, 2025, 6:10 PM
1 point
4 comments · 6 min read

Behold the Pale Child (escaping Moloch’s Mad Maze)

rogersbacon · May 8, 2025, 4:36 PM
1 point
2 comments · 11 min read
(www.secretorum.life)

An alignment safety case sketch based on debate

May 8, 2025, 3:02 PM
55 points
12 comments · 25 min read
(arxiv.org)

Mechanistic Interpretability Via Learning Differential Equations: AI Safety Camp Project Intermediate Report.

May 8, 2025, 2:45 PM
4 points
0 comments · 7 min read

AI #115: The Evil Applications Division

Zvi · May 8, 2025, 1:40 PM
20 points
3 comments · 62 min read
(thezvi.wordpress.com)

The Steganographic Potentials of Language Models

May 8, 2025, 11:23 AM
8 points
0 comments · 1 min read

Our bet on whether the AI market will crash

May 8, 2025, 9:56 AM
21 points
0 comments · 1 min read

Concept-anchored representation engineering for alignment

Sandy Fraser · May 8, 2025, 8:59 AM
2 points
0 comments · 3 min read

Orthogonality Thesis in layman’s terms.

Michael (@lethal_ai) · May 8, 2025, 8:31 AM
1 point
0 comments · 2 min read

Arkose may be closing, but you can help

Victoria Brook · May 8, 2025, 7:28 AM
8 points
0 comments · 2 min read

Healing powers of meditation or the role of attention in humoral regulation.

Yaroslav Granowski · May 8, 2025, 6:48 AM
7 points
0 comments · 1 min read

Orienting Toward Wizard Power

johnswentworth · May 8, 2025, 5:23 AM
262 points
45 comments · 5 min read

Relational Alignment: Trust, Repair, and the Emotional Work of AI

Priyanka Bharadwaj · May 8, 2025, 2:44 AM
3 points
0 comments · 3 min read

There’s more low-hanging fruit in interdisciplinary work thanks to LLMs

ChristianKl · May 7, 2025, 7:48 PM
23 points
0 comments · 1 min read

OpenAI Claims Nonprofit Will Retain Nominal Control

Zvi · May 7, 2025, 7:40 PM
64 points
4 comments · 11 min read
(thezvi.wordpress.com)

Social status games might have “compute weight class” in the future

Raemon · May 7, 2025, 6:56 PM
28 points
4 comments · 2 min read