simeon_c

Karma: 1,334

@SaferAI

A Frontier AI Risk Management Framework: Bridging the Gap Between Current AI Practices and Established Risk Management

Mar 13, 2025, 6:29 PM
10 points
0 comments
1 min read
LW link
(arxiv.org)

Towards Quantitative AI Risk Management

Oct 16, 2024, 7:26 PM
28 points
1 comment
6 min read
LW link

simeon_c’s Shortform

simeon_c
Apr 4, 2024, 9:01 AM
5 points
73 comments
1 min read
LW link

Forecasting future gains due to post-training enhancements

Mar 8, 2024, 2:11 AM
31 points
2 comments
1 min read
LW link
(docs.google.com)

Davidad’s Provably Safe AI Architecture - ARIA’s Programme Thesis

simeon_c
Feb 1, 2024, 9:30 PM
69 points
17 comments
1 min read
LW link
(www.aria.org.uk)

A Brief Assessment of OpenAI’s Preparedness Framework & Some Suggestions for Improvement

simeon_c
Jan 22, 2024, 8:08 PM
14 points
0 comments
6 min read
LW link
(uploads-ssl.webflow.com)

Responsible Scaling Policies Are Risk Management Done Wrong

simeon_c
Oct 25, 2023, 11:46 PM
123 points
35 comments
22 min read
LW link
1 review
(www.navigatingrisks.ai)

[Question] Do LLMs Implement NLP Algorithms for Better Next Token Predictions?

simeon_c
Sep 19, 2023, 12:28 PM
5 points
1 comment
1 min read
LW link

[Question] In the Short-Term, Why Couldn’t You Just RLHF-out Instrumental Convergence?

simeon_c
Sep 16, 2023, 10:44 AM
21 points
6 comments
1 min read
LW link

AGI x Animal Welfare: A High-EV Outreach Opportunity?

simeon_c
Jun 28, 2023, 8:44 PM
29 points
0 comments
1 min read
LW link

The Cruel Trade-Off Between AI Misuse and AI X-risk Concerns

simeon_c
Apr 22, 2023, 1:49 PM
24 points
1 comment
2 min read
LW link

AI Takeover Scenario with Scaled LLMs

simeon_c
Apr 16, 2023, 11:28 PM
42 points
15 comments
8 min read
LW link

Navigating AI Risks (NAIR) #1: Slowing Down AI

simeon_c
Apr 14, 2023, 2:35 PM
11 points
3 comments
1 min read
LW link
(navigatingairisks.substack.com)

Request to AGI organizations: Share your views on pausing AI progress

Apr 11, 2023, 5:30 PM
141 points
11 comments
1 min read
LW link

[Question] Could Simulating an AGI Taking Over the World Actually Lead to a LLM Taking Over the World?

simeon_c
Jan 13, 2023, 6:33 AM
15 points
1 comment
1 min read
LW link

[Linkpost] DreamerV3: A General RL Architecture

simeon_c
Jan 12, 2023, 3:55 AM
23 points
3 comments
1 min read
LW link
(arxiv.org)

[Question] Are Mixture-of-Experts Transformers More Interpretable Than Dense Transformers?

simeon_c
Dec 31, 2022, 11:34 AM
8 points
5 comments
1 min read
LW link

AGI Timelines in Governance: Different Strategies for Different Timeframes

Dec 19, 2022, 9:31 PM
65 points
28 comments
10 min read
LW link

Extracting and Evaluating Causal Direction in LLMs’ Activations

Dec 14, 2022, 2:33 PM
29 points
5 comments
11 min read
LW link

Is GPT3 a Good Rationalist? - InstructGPT3 [2/2]

simeon_c
Apr 7, 2022, 1:46 PM
11 points
0 comments
7 min read
LW link