2 Feb 2024 23:17 UTC

98 points

0 comments, 9 min read, LW link

Survey for alignment researchers!

Cameron Berg, Judd Rosenblatt and AE Studio

2 Feb 2024 20:41 UTC

71 points

11 comments, 1 min read, LW link

Voting Results for the 2022 Review

Ben Pace

2 Feb 2024 20:34 UTC

57 points

3 comments, 73 min read, LW link

On Dwarkesh’s 3rd Podcast With Tyler Cowen

Zvi

2 Feb 2024 19:30 UTC

36 points

9 comments, 21 min read, LW link

(thezvi.wordpress.com)

Most experts believe COVID-19 was probably not a lab leak

DanielFilan

2 Feb 2024 19:28 UTC

66 points

89 comments, 2 min read, LW link

(gcrinstitute.org)

What Failure Looks Like is not an existential risk (and alignment is not the solution)

otto.barten

2 Feb 2024 18:59 UTC

13 points

12 comments, 9 min read, LW link

Solving alignment isn’t enough for a flourishing future

mic

2 Feb 2024 18:23 UTC

27 points

0 comments, 1 min read, LW link

(papers.ssrn.com)

Manifold Markets

PeterMcCluskey

2 Feb 2024 17:48 UTC

26 points

9 comments, 4 min read, LW link

(bayesianinvestor.com)

Types of subjective welfare

MichaelStJules

2 Feb 2024 9:56 UTC

10 points

3 comments, 1 min read, LW link

Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small

Joseph Bloom

2 Feb 2024 6:54 UTC

102 points

37 comments, 15 min read, LW link

Soft Prompts for Evaluation: Measuring Conditional Distance of Capabilities

porby

2 Feb 2024 5:49 UTC

47 points

1 comment, 4 min read, LW link

(1drv.ms)

Running a Prediction Market Mafia Game

Arjun Panickssery

1 Feb 2024 23:24 UTC

22 points

5 comments, 1 min read, LW link

(arjunpanickssery.substack.com)

Evaluating Stability of Unreflective Alignment

james.lucassen

1 Feb 2024 22:15 UTC

49 points

10 comments, 18 min read, LW link

(jlucassen.com)

Davidad’s Provably Safe AI Architecture—ARIA’s Programme Thesis

simeon_c

1 Feb 2024 21:30 UTC

69 points

17 comments, 1 min read, LW link

(www.aria.org.uk)

Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis

RogerDearnaley

1 Feb 2024 21:15 UTC

15 points

15 comments, 13 min read, LW link

Wrong answer bias

lemonhope

1 Feb 2024 20:05 UTC

49 points

24 comments, 1 min read, LW link

On Not Requiring Vaccination

jefftk

1 Feb 2024 19:20 UTC

31 points

21 comments, 1 min read, LW link

(www.jefftk.com)

The economy is mostly newbs (strat predictions)

lemonhope

1 Feb 2024 19:15 UTC

27 points

6 comments, 2 min read, LW link

Managing risks while trying to do good

Wei Dai

1 Feb 2024 18:08 UTC

61 points

26 comments, 1 min read, LW link

Putting multimodal LLMs to the Tetris test

Lovre and gabrielagc

1 Feb 2024 16:02 UTC

30 points

5 comments, 7 min read, LW link

AI #49: Bioweapon Testing Begins

Zvi

1 Feb 2024 15:30 UTC

37 points

11 comments, 42 min read, LW link

(thezvi.wordpress.com)

Some Notes on Ethics

Pareto Optimal

1 Feb 2024 10:18 UTC

−3 points

0 comments, 1 min read, LW link

(paretooptimal.substack.com)

Increasingly vague interpersonal welfare comparisons

MichaelStJules

1 Feb 2024 6:45 UTC

5 points

0 comments, 1 min read, LW link

PIBBSS Speaker events comings up in February

DusanDNesic, Nora_Ammann and Lucas Teixeira

1 Feb 2024 3:28 UTC

10 points

2 comments, 1 min read, LW link

Drone Wars Endgame

RussellThor

1 Feb 2024 2:30 UTC

36 points

71 comments, 8 min read, LW link

Sequencing Swabs

jefftk

1 Feb 2024 1:50 UTC

19 points

1 comment, 5 min read, LW link

(www.jefftk.com)

Leading The Parade

johnswentworth

31 Jan 2024 22:39 UTC

147 points

31 comments, 9 min read, LW link

Proposal for an AI Safety Prize

sweenesm

31 Jan 2024 18:35 UTC

3 points

0 comments, 2 min read, LW link

Literally Everything is Infinite

Spiral

31 Jan 2024 18:31 UTC

−9 points

8 comments, 5 min read, LW link

What fuels your ambition?

Cissy

31 Jan 2024 18:30 UTC

29 points

1 comment, 5 min read, LW link

(www.moremyself.xyz)

“Genlangs” and Zipf’s Law: Do languages generated by ChatGPT statistically look human?

Justin-Diamond

31 Jan 2024 18:30 UTC

2 points

2 comments, 1 min read, LW link

(arxiv.org)

AI, Intellectual Property, and the Techno-Optimist Revolution

Justin-Diamond

31 Jan 2024 18:30 UTC

1 point

0 comments, 1 min read, LW link

(www.researchgate.net)

A response to an attempted rebuttal of maximising ethics

JacobBowden

31 Jan 2024 17:49 UTC

−5 points

8 comments, 3 min read, LW link

My Alignment “Plan”: Avoid Strong Optimisation and Align Economy

VojtaKovarik

31 Jan 2024 17:03 UTC

24 points

9 comments, 7 min read, LW link

Where freedom comes from

Logan Kieller

31 Jan 2024 16:53 UTC

−5 points

1 comment, 3 min read, LW link

(logankieller.substack.com)

Per protocol analysis as medical malpractice

braces

31 Jan 2024 16:22 UTC

53 points

8 comments, 1 min read, LW link

Adam Smith Meets AI Doomers

James_Miller

31 Jan 2024 15:53 UTC

34 points

10 comments, 5 min read, LW link

Ten Modes of Culture War Discourse

jchan

31 Jan 2024 13:58 UTC

54 points

15 comments, 15 min read, LW link

Without Fundamental Advances, Rebellion and Coup d’État are the Inevitable Outcomes of Dictators & Monarchs Trying to Control Large, Capable Countries

Roko

31 Jan 2024 10:14 UTC

27 points

34 comments, 1 min read, LW link

Explaining Impact Markets

Saul Munn

31 Jan 2024 9:51 UTC

95 points

2 comments, 3 min read, LW link

(www.brasstacks.blog)

Exploring OpenAI’s Latent Directions: Tests, Observations, and Poking Around

Johnny Lin

31 Jan 2024 6:01 UTC

26 points

4 comments, 14 min read, LW link

Clip keys together with tiny carabiners

Brendan Long

31 Jan 2024 4:26 UTC

11 points

5 comments, 1 min read, LW link

The problem with proportional extrapolation

pathos_bot

30 Jan 2024 23:40 UTC

6 points

0 comments, 1 min read, LW link

Counterfactual Mechanism Networks

StrivingForLegibility

30 Jan 2024 20:30 UTC

4 points

0 comments, 5 min read, LW link

Control vs Selection: Civilisation is best at control, but navigating AGI requires selection

VojtaKovarik

30 Jan 2024 19:06 UTC

7 points

1 comment, 1 min read, LW link

AI governance frames

NathanBarnard

30 Jan 2024 18:18 UTC

3 points

0 comments, 3 min read, LW link

Deciding What Project/Org to Start: A Guide to Prioritization Research

Alexandra Bos

30 Jan 2024 18:13 UTC

8 points

0 comments, 1 min read, LW link

on neodymium magnets

bhauth

30 Jan 2024 15:58 UTC

47 points

6 comments, 4 min read, LW link

(www.bhauth.com)

[Question] Can we create self-improving AIs that perfect their own ethics?

Gabi QUENE

30 Jan 2024 14:45 UTC

1 point

10 comments, 1 min read, LW link

Childhood and Education Roundup #4

Zvi

30 Jan 2024 13:50 UTC

43 points

10 comments, 24 min read, LW link

(thezvi.wordpress.com)