What do Marginal Grants at EAIF Look Like? Funding Priorities and Grantmaking Thresholds at the EA Infrastructure Fund

Linch

12 Oct 2023 21:40 UTC

20 points

0 comments · 1 min read · LW link

unRLHF—Efficiently undoing LLM safeguards

Pranav Gade, Jeffrey Ladish and Simon Lermen

12 Oct 2023 19:58 UTC

117 points

15 comments · 20 min read · LW link

LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B

Simon Lermen and Jeffrey Ladish

12 Oct 2023 19:58 UTC

151 points

29 comments · 14 min read · LW link

[Question] Looking for reading recommendations: Theories of right/justice that safeguard against having one’s job automated?

bulKlub

12 Oct 2023 19:40 UTC

−1 points

1 comment · 1 min read · LW link

The International PauseAI Protest: Activism under uncertainty

Joseph Miller

12 Oct 2023 17:36 UTC

32 points

1 comment · 1 min read · LW link

AI #33: Cool New Interpretability Paper

Zvi

12 Oct 2023 16:20 UTC

46 points

18 comments · 46 min read · LW link

(thezvi.wordpress.com)

Noticing confusion in physics

Jacob G-W

12 Oct 2023 15:21 UTC

20 points

27 comments · 2 min read · LW link

(jacobgw.com)

[Question] How to make to-do lists (and to get things done)?

TeaTieAndHat

12 Oct 2023 14:26 UTC

9 points

13 comments · 2 min read · LW link

Relevance of ‘Harmful Intelligence’ Data in Training Datasets (WebText vs. Pile)

MiguelDev

12 Oct 2023 12:08 UTC

12 points

0 comments · 9 min read · LW link

Soulmate Fermi Estimate + My A(ltr)u[t]istic Mating Strategy

Jordan Arel

12 Oct 2023 8:32 UTC

0 points

9 comments · 3 min read · LW link

Evolution Solved Alignment (what sharp left turn?)

jacob_cannell

12 Oct 2023 4:15 UTC

16 points

89 comments · 4 min read · LW link

The CHOICE

Gabi QUENE

12 Oct 2023 3:02 UTC

−29 points

2 comments · 3 min read · LW link

Solstice 2023 Roundup

dspeyer

11 Oct 2023 23:09 UTC

28 points

6 comments · 1 min read · LW link

Understanding LLMs: Some basic observations about words, syntax, and discourse [w/ a conjecture about grokking]

Bill Benzon

11 Oct 2023 19:13 UTC

6 points

0 comments · 5 min read · LW link

[Linkpost] Generalization in diffusion models arises from geometry-adaptive harmonic representation

Bogdan Ionut Cirstea

11 Oct 2023 17:48 UTC

4 points

3 comments · 1 min read · LW link

What I’ve been reading, October 2023: The stirrup in Europe, 19th-century art deco, and more

jasoncrawford

11 Oct 2023 16:11 UTC

18 points

2 comments · 11 min read · LW link

(rootsofprogress.org)

EA Madrid social

Pablo Villalobos

11 Oct 2023 15:34 UTC

6 points

0 comments · 1 min read · LW link

Attributing to interactions with GCPD and GWPD

jenny

11 Oct 2023 15:06 UTC

20 points

0 comments · 6 min read · LW link

You’re Measuring Model Complexity Wrong

Jesse Hoogland and Stan van Wingerden

11 Oct 2023 11:46 UTC

87 points

15 comments · 13 min read · LW link

Update on the UK AI Taskforce & upcoming AI Safety Summit

Elliot Mckernon

11 Oct 2023 11:37 UTC

83 points

2 comments · 4 min read · LW link

An explanation for every token: using an LLM to sample another LLM

Max H

11 Oct 2023 0:53 UTC

35 points

5 comments · 11 min read · LW link

[Question] Examples of Low Status Fun

niplav

10 Oct 2023 23:19 UTC

18 points

17 comments · 1 min read · LW link

A New Model for Compute Center Verification

Damin Curtis

10 Oct 2023 19:22 UTC

8 points

0 comments · 5 min read · LW link

Announcing MIRI’s new CEO and leadership team

Gretta Duleba

10 Oct 2023 19:22 UTC

222 points

52 comments · 3 min read · LW link

18 Heterodox lenses to look the world through

Shaurya Gupta

10 Oct 2023 18:33 UTC

−1 points

2 comments · 5 min read · LW link

Documenting Journey Into AI Safety

jacobhaimes

10 Oct 2023 18:30 UTC

17 points

4 comments · 6 min read · LW link

Looking for AI Art Collaborators!

beatrice@foresight.org

10 Oct 2023 18:24 UTC

1 point

0 comments · 1 min read · LW link

Childhood Roundup #3

Zvi

10 Oct 2023 14:30 UTC

49 points

3 comments · 30 min read · LW link

(thezvi.wordpress.com)

My simple model for Alignment vs Capability

ryan_b

10 Oct 2023 12:07 UTC

7 points

0 comments · 7 min read · LW link

Next year in Jerusalem: The brilliant ideas and radiant legacy of Miriam Lipschutz Yevick [in relation to current AI debates]

Bill Benzon

10 Oct 2023 9:06 UTC

1 point

0 comments · 1 min read · LW link

(3quarksdaily.com)

I’m a Former Israeli Officer. AMA

Yovel Rom

10 Oct 2023 8:33 UTC

78 points

70 comments · 1 min read · LW link

Become a PIBBSS Research Affiliate

Nora_Ammann and DusanDNesic

10 Oct 2023 7:41 UTC

24 points

6 comments · 6 min read · LW link

My 1st month at a “neurodivergent gifted school” called Minerva University

exanova

10 Oct 2023 3:34 UTC

4 points

1 comment · 1 min read · LW link

(inawe.substack.com)

Epistemic Motif of Abstract-Concrete Cycles & Domain Expansion

Dalcy

10 Oct 2023 3:28 UTC

26 points

2 comments · 3 min read · LW link

Simple Terminal Colors

jefftk

10 Oct 2023 0:40 UTC

11 points

1 comment · 1 min read · LW link

(www.jefftk.com)

The Handbook of Rationality (2021, MIT press) is now open access

romeostevensit

10 Oct 2023 0:30 UTC

48 points

4 comments · 1 min read · LW link

Non-superintelligent paperclip maximizers are normal

jessicata

10 Oct 2023 0:29 UTC

67 points

4 comments · 9 min read · LW link

(unstableontology.com)

The Witching Hour

Richard_Ngo

10 Oct 2023 0:19 UTC

113 points

1 comment · 9 min read · LW link

(www.narrativeark.xyz)

One: a story

Richard_Ngo

10 Oct 2023 0:18 UTC

30 points

0 comments · 4 min read · LW link

(www.narrativeark.xyz)

Truthseeking when your disagreements lie in moral philosophy

Elizabeth and Tristan Williams

10 Oct 2023 0:00 UTC

98 points

4 comments · 4 min read · LW link

(acesounderglass.com)

NYT on the Manifest forecasting conference

Austin Chen

9 Oct 2023 21:40 UTC

45 points

14 comments · 1 min read · LW link

(www.nytimes.com)

Forecasting and prediction markets

CarlJ

9 Oct 2023 20:43 UTC

3 points

0 comments · 1 min read · LW link

Comparing Two Forecasters in an Ideal World

nikos

9 Oct 2023 19:52 UTC

5 points

0 comments · 6 min read · LW link

The case for aftermarket blind spot mirrors

Brendan Long

9 Oct 2023 19:30 UTC

59 points

14 comments · 2 min read · LW link

(www.brendanlong.com)

New contractor role: Web security task force contractor for AI safety announcements

Ethan Ashkie and Andrew_Critch

9 Oct 2023 18:36 UTC

11 points

0 comments · 2 min read · LW link

(survivalandflourishing.com)

[Question] Anyone working on D. Amodei’s Bartlett show transcript?

Leopard

9 Oct 2023 18:17 UTC

10 points

0 comments · 1 min read · LW link

Knowledge Base 3: Shopping advisor and other uses of knowledge base about products

iwis

9 Oct 2023 11:53 UTC

0 points

0 comments · 4 min read · LW link

Knowledge Base 2: The structure and the method of building

iwis

9 Oct 2023 11:53 UTC

2 points

4 comments · 7 min read · LW link

We don’t understand what happened with culture enough

Jan_Kulveit

9 Oct 2023 9:54 UTC

86 points

21 comments · 6 min read · LW link

Leveraging Bayes’ Theorem to Supercharge Memory Techniques

disoha

9 Oct 2023 3:34 UTC

−15 points

1 comment · 4 min read · LW link