AI Alignment Breakthroughs this week (10/08/23)

Logan Zoellner

8 Oct 2023 23:30 UTC

30 points

14 comments, 6 min read

“The Heart of Gaming is the Power Fantasy”, and Cohabitive Games

Raemon

8 Oct 2023 21:02 UTC

81 points

49 comments, 4 min read

(bottomfeeder.substack.com)

FAQ: What the heck is goal agnosticism?

porby

8 Oct 2023 19:11 UTC

66 points

36 comments, 28 min read

Time is homogeneous sequentially-composable determination

TsviBT

8 Oct 2023 14:58 UTC

15 points

0 comments, 21 min read

Linkpost: Are Emergent Abilities in Large Language Models just In-Context Learning?

Erich_Grunewald

8 Oct 2023 12:14 UTC

12 points

7 comments, 2 min read

(arxiv.org)

Bird-eye view visualization of LLM activations

Sergii

8 Oct 2023 12:12 UTC

11 points

2 comments, 1 min read

(grgv.xyz)

Perspective Based Reasoning Could Absolve CDT

dadadarren

8 Oct 2023 11:22 UTC

4 points

5 comments, 5 min read

The Gradient – The Artificiality of Alignment

mic

8 Oct 2023 4:06 UTC

12 points

1 comment, 5 min read

(thegradient.pub)

Comparing Anthropic’s Dictionary Learning to Ours

Robert_AIZI

7 Oct 2023 23:30 UTC

137 points

8 comments, 4 min read

A thought about the constraints of debtlessness in online communities

mako yass

7 Oct 2023 21:26 UTC

57 points

23 comments, 1 min read

Arguments for utilitarianism are impossibility arguments under unbounded prospects

MichaelStJules

7 Oct 2023 21:08 UTC

7 points

7 comments, 21 min read

Sam Altman’s sister, Annie Altman, claims Sam has severely abused her

pythagoras5015

7 Oct 2023 21:06 UTC

98 points

107 comments, 192 min read

Griffin Island

jefftk

7 Oct 2023 18:40 UTC

14 points

3 comments, 1 min read

(www.jefftk.com)

Every Mention of EA in “Going Infinite”

KirstenH

7 Oct 2023 14:42 UTC

48 points

0 comments, 8 min read

(open.substack.com)

Fixing Insider Threats in the AI Supply Chain

Madhav Malhotra

7 Oct 2023 13:19 UTC

20 points

2 comments, 5 min read

Contra Nora Belrose on Orthogonality Thesis Being Trivial

tailcalled

7 Oct 2023 11:47 UTC

18 points

21 comments, 1 min read

Related Discussion from Thomas Kwa’s MIRI Research Experience

Raemon

7 Oct 2023 6:25 UTC

71 points

140 comments, 1 min read

[Question] Current State of Probabilistic Logic

lunatic_at_large

7 Oct 2023 5:06 UTC

3 points

2 comments, 1 min read

On the Relationship Between Variability and the Evolutionary Outcomes of Systems in Nature

Artyom Shaposhnikov

7 Oct 2023 3:06 UTC

2 points

0 comments, 1 min read

Announcing Dialogues

Ben Pace

7 Oct 2023 2:57 UTC

155 points

52 comments, 4 min read

Don’t Dismiss Simple Alignment Approaches

Chris_Leong

7 Oct 2023 0:35 UTC

134 points

9 comments, 4 min read

Linking Alt Accounts

jefftk

6 Oct 2023 17:00 UTC

70 points

33 comments, 1 min read

(www.jefftk.com)

Super-Exponential versus Exponential Growth in Compute Price-Performance

moridinamael

6 Oct 2023 16:23 UTC

37 points

25 comments, 2 min read

A personal explanation of ELK concept and task.

Zeyu Qin

6 Oct 2023 3:55 UTC

1 point

0 comments, 1 min read

The Long-Term Future Fund is looking for a full-time fund chair

Linch, calebp99 and abergal

5 Oct 2023 22:18 UTC

52 points

0 comments, 7 min read

(forum.effectivealtruism.org)

Provably Safe AI

PeterMcCluskey

5 Oct 2023 22:18 UTC

33 points

15 comments, 4 min read

(bayesianinvestor.com)

Stampy’s AI Safety Info soft launch

steven0461 and Robert Miles

5 Oct 2023 22:13 UTC

120 points

9 comments, 2 min read

Impacts of AI on the housing markets

PottedRosePetal

5 Oct 2023 21:24 UTC

8 points

0 comments, 5 min read

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

Zac Hatfield-Dodds

5 Oct 2023 21:01 UTC

288 points

22 comments, 2 min read, 1 review

(transformer-circuits.pub)

Ideation and Trajectory Modelling in Language Models

NickyP

5 Oct 2023 19:21 UTC

16 points

2 comments, 10 min read

A well-defined history in measurable factor spaces

Matthias G. Mayer

5 Oct 2023 18:36 UTC

22 points

0 comments, 2 min read

Evaluating the historical value misspecification argument

Matthew Barnett

5 Oct 2023 18:34 UTC

177 points

153 comments, 7 min read, 2 reviews

Translations Should Invert

abramdemski

5 Oct 2023 17:44 UTC

48 points

19 comments, 3 min read

Censorship in LLMs is here to stay because it mirrors how our own intelligence is structured

mnvr

5 Oct 2023 17:37 UTC

3 points

0 comments, 1 min read

Twin Cities ACX Meetup October 2023

Timothy M.

5 Oct 2023 16:29 UTC

1 point

2 comments, 1 min read

This anime storyboard doesn’t exist: a graphic novel written and illustrated by GPT4

RomanS

5 Oct 2023 14:01 UTC

12 points

7 comments, 55 min read

AI #32: Lie Detector

Zvi

5 Oct 2023 13:50 UTC

45 points

19 comments, 44 min read

(thezvi.wordpress.com)

Can the House Legislate?

jefftk

5 Oct 2023 13:40 UTC

26 points

6 comments, 2 min read

(www.jefftk.com)

Making progress on the “what alignment target should be aimed at?” question, is urgent

ThomasCederborg

5 Oct 2023 12:55 UTC

2 points

0 comments, 18 min read

Response to Quintin Pope’s Evolution Provides No Evidence For the Sharp Left Turn

Zvi

5 Oct 2023 11:39 UTC

128 points

29 comments, 9 min read

How to Get Rationalist Feedback

Nicholas / Heather Kross

5 Oct 2023 2:03 UTC

13 points

0 comments, 2 min read

On my AI Fable, and the importance of de re, de dicto, and de se reference for AI alignment

PhilGoetz

5 Oct 2023 0:50 UTC

9 points

5 comments, 1 min read

Underspecified Probabilities: A Thought Experiment

lunatic_at_large

4 Oct 2023 22:25 UTC

8 points

4 comments, 2 min read

Fraternal Birth Order Effect and the Maternal Immune Hypothesis

Bucky

4 Oct 2023 21:18 UTC

20 points

1 comment, 2 min read

How to solve deception and still fail.

Charlie Steiner

4 Oct 2023 19:56 UTC

40 points

7 comments, 6 min read

PortAudio M1 Latency

jefftk

4 Oct 2023 19:10 UTC

8 points

5 comments, 1 min read

(www.jefftk.com)

Open Philanthropy is hiring for multiple roles across our Global Catastrophic Risks teams

aarongertler

4 Oct 2023 18:04 UTC

6 points

0 comments, 3 min read

(forum.effectivealtruism.org)

Safeguarding Humanity: Ensuring AI Remains a Servant, Not a Master

kgldeshapriya

4 Oct 2023 17:52 UTC

−20 points

2 comments, 2 min read

The 5 Pillars of Happiness

Gabi QUENE

4 Oct 2023 17:50 UTC

−24 points

5 comments, 5 min read

[Question] Using Reinforcement Learning to try to control the heating of a building (district heating)

Tony Karlsson

4 Oct 2023 17:47 UTC

3 points

5 comments, 1 min read