simeon_c

Karma: 1,307

@SaferAI

Towards Quantitative AI Risk Management

Henry Papadatos and simeon_c

16 Oct 2024 19:26 UTC

28 points

1 comment · 6 min read · LW link

simeon_c’s Shortform

simeon_c

4 Apr 2024 9:01 UTC

5 points

73 comments · 1 min read · LW link

Forecasting future gains due to post-training enhancements

elifland, Joel Becker and simeon_c

8 Mar 2024 2:11 UTC

26 points

2 comments · 1 min read · LW link

(docs.google.com)

Davidad’s Provably Safe AI Architecture—ARIA’s Programme Thesis

simeon_c

1 Feb 2024 21:30 UTC

69 points

17 comments · 1 min read · LW link

(www.aria.org.uk)

A Brief Assessment of OpenAI’s Preparedness Framework & Some Suggestions for Improvement

simeon_c

22 Jan 2024 20:08 UTC

14 points

0 comments · 6 min read · LW link

(uploads-ssl.webflow.com)

Responsible Scaling Policies Are Risk Management Done Wrong

simeon_c

25 Oct 2023 23:46 UTC

120 points

34 comments · 22 min read · LW link

(www.navigatingrisks.ai)

[Question] Do LLMs Implement NLP Algorithms for Better Next Token Predictions?

simeon_c

19 Sep 2023 12:28 UTC

5 points

1 comment · 1 min read · LW link

[Question] In the Short-Term, Why Couldn’t You Just RLHF-out Instrumental Convergence?

simeon_c

16 Sep 2023 10:44 UTC

21 points

6 comments · 1 min read · LW link

AGI x Animal Welfare: A High-EV Outreach Opportunity?

simeon_c

28 Jun 2023 20:44 UTC

29 points

0 comments · 1 min read · LW link

The Cruel Trade-Off Between AI Misuse and AI X-risk Concerns

simeon_c

22 Apr 2023 13:49 UTC

24 points

1 comment · 2 min read · LW link

AI Takeover Scenario with Scaled LLMs

simeon_c

16 Apr 2023 23:28 UTC

42 points

15 comments · 8 min read · LW link

Navigating AI Risks (NAIR) #1: Slowing Down AI

simeon_c

14 Apr 2023 14:35 UTC

11 points

3 comments · 1 min read · LW link

(navigatingairisks.substack.com)

Request to AGI organizations: Share your views on pausing AI progress

Akash and simeon_c

11 Apr 2023 17:30 UTC

141 points

11 comments · 1 min read · LW link

[Question] Could Simulating an AGI Taking Over the World Actually Lead to a LLM Taking Over the World?

simeon_c

13 Jan 2023 6:33 UTC

15 points

1 comment · 1 min read · LW link

[Linkpost] DreamerV3: A General RL Architecture

simeon_c

12 Jan 2023 3:55 UTC

23 points

3 comments · 1 min read · LW link

(arxiv.org)

[Question] Are Mixture-of-Experts Transformers More Interpretable Than Dense Transformers?

simeon_c

31 Dec 2022 11:34 UTC

7 points

5 comments · 1 min read · LW link

AGI Timelines in Governance: Different Strategies for Different Timeframes

simeon_c and AmberDawn

19 Dec 2022 21:31 UTC

65 points

28 comments · 10 min read · LW link

Extracting and Evaluating Causal Direction in LLMs’ Activations

Fabien Roger and simeon_c

14 Dec 2022 14:33 UTC

29 points

5 comments · 11 min read · LW link

Is GPT3 a Good Rationalist? - InstructGPT3 [2/2]

simeon_c

7 Apr 2022 13:46 UTC

11 points

0 comments · 7 min read · LW link

New GPT3 Impressive Capabilities—InstructGPT3 [1/2]

simeon_c

13 Mar 2022 10:58 UTC

72 points

10 comments · 7 min read · LW link