Three configurable prettyprinters

philh 10 Aug 2023 23:10 UTC

9 points

0 comments 22 min read LW link

(reasonableapproximation.net)

Ilya Sutskever’s thoughts on AI safety (July 2023): a transcript with my comments

mishka 10 Aug 2023 19:07 UTC

21 points

3 comments 5 min read LW link

Seeking Input to AI Safety Book for non-technical audience

Darren McKee 10 Aug 2023 17:58 UTC

10 points

4 comments 1 min read LW link

Evaluating GPT-4 Theory of Mind Capabilities

gcmac and Nathan

10 Aug 2023 17:57 UTC

15 points

2 comments 14 min read LW link

Some alignment ideas

SelonNerias 10 Aug 2023 17:51 UTC

1 point

0 comments 11 min read LW link

Self Supervised Learning (SSL)

Varshul Gupta 10 Aug 2023 17:43 UTC

5 points

1 comment 2 min read LW link

(dubverseblack.substack.com)

Predicting Virus Relative Abundance in Wastewater

jefftk 10 Aug 2023 15:46 UTC

33 points

2 comments 1 min read LW link

(naobservatory.org)

AI #24: Week of the Podcast

Zvi 10 Aug 2023 15:00 UTC

49 points

5 comments 44 min read LW link

(thezvi.wordpress.com)

Could We Automate AI Alignment Research?

Stephen McAleese 10 Aug 2023 12:17 UTC

34 points

10 comments 21 min read LW link

The positional embedding matrix and previous-token heads: how do they actually work?

AdamYedidia 10 Aug 2023 1:58 UTC

26 points

4 comments 13 min read LW link

LLMs are (mostly) not helped by filler tokens

Kshitij Sachan 10 Aug 2023 0:48 UTC

66 points

35 comments 6 min read LW link

2023 ACX Meetups Everywhere—Newton, MA

duck_master 9 Aug 2023 22:47 UTC

6 points

2 comments 1 min read LW link

Progress links digest, 2023-08-09: US adds new nuclear, Katalin Karikó interview, and more

jasoncrawford 9 Aug 2023 19:22 UTC

18 points

0 comments 3 min read LW link

(rootsofprogress.org)

Mech Interp Challenge: August—Deciphering the First Unique Character Model

CallumMcDougall 9 Aug 2023 19:14 UTC

36 points

1 comment 3 min read LW link

Real Meaning of life has been found. Eliezer discovered it in 2000′s.

Jorterder 9 Aug 2023 18:13 UTC

−15 points

1 comment 1 min read LW link

(docs.google.com)

Marginal Revolution unofficial birthday party

Derek M. Jones 9 Aug 2023 14:35 UTC

4 points

0 comments 1 min read LW link

The Case for Convexity

Jesse Richardson 9 Aug 2023 14:09 UTC

19 points

3 comments 1 min read LW link

A content analysis of the SQ-R questionnaire and a proposal for testing EQ-SQ theory

tailcalled 9 Aug 2023 13:51 UTC

10 points

2 comments 13 min read LW link

[Question] Does LessWrong allow exempting posts from being scraped by GPTBot?

mic 9 Aug 2023 13:02 UTC

29 points

3 comments 1 min read LW link

If I Was An Eccentric Trillionaire

niplav 9 Aug 2023 7:56 UTC

9 points

8 comments 26 min read LW link

Modulating sycophancy in an RLHF model via activation steering

Nina Panickssery 9 Aug 2023 7:06 UTC

69 points

20 comments 12 min read LW link

Open Thread—August 2023

habryka 9 Aug 2023 3:52 UTC

18 points

49 comments 1 min read LW link

marine cloud brightening

bhauth 9 Aug 2023 2:50 UTC

40 points

14 comments 3 min read LW link

(www.bhauth.com)

Inflection.ai is a major AGI lab

Nikola Jurkovic 9 Aug 2023 1:05 UTC

137 points

13 comments 2 min read LW link

Acausal Now: We could totally acausally bargain with aliens at our current tech level if desired

Christopher King 9 Aug 2023 0:50 UTC

1 point

5 comments 4 min read LW link

Necromancy’s unintended consequences.

Christopher King 9 Aug 2023 0:08 UTC

−6 points

2 comments 2 min read LW link

What’s A “Market”?

johnswentworth 8 Aug 2023 23:29 UTC

94 points

16 comments 10 min read LW link

Podcast (+transcript): Nathan Barnard on how US financial regulation can inform AI governance

Aaron Bergman 8 Aug 2023 21:46 UTC

8 points

0 comments 1 min read LW link

(www.aaronbergman.net)

What are the flaws in this argument about p(Doom)?

William the Kiwi 8 Aug 2023 20:34 UTC

0 points

25 comments 1 min read LW link

A Simple Theory Of Consciousness

SherlockHolmes 8 Aug 2023 18:05 UTC

2 points

5 comments 1 min read LW link

(peterholmes.medium.com)

[Linkpost] Rationally awake

jpc 8 Aug 2023 17:59 UTC

−1 points

0 comments 4 min read LW link

(jpc.dev)

Yet more UFO Betting: Put Up or Shut Up

MoreRatsWrongReUAP 8 Aug 2023 17:50 UTC

10 points

18 comments 1 min read LW link

AISN #18: Challenges of Reinforcement Learning from Human Feedback, Microsoft’s Security Breach, and Conceptual Research on AI Safety

aogara 8 Aug 2023 15:52 UTC

13 points

0 comments 1 min read LW link

(newsletter.safe.ai)

[Question] Beginner’s question about RLHF

FTPickle 8 Aug 2023 15:48 UTC

1 point

3 comments 1 min read LW link

My Trial Period as an Independent Alignment Researcher

Bart Bussmann 8 Aug 2023 14:16 UTC

34 points

1 comment 3 min read LW link

4 types of AGI selection, and how to constrain them

Remmelt 8 Aug 2023 10:02 UTC

−4 points

3 comments 3 min read LW link

Notice your everything

metachirality 8 Aug 2023 2:38 UTC

15 points

1 comment 2 min read LW link

Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research

evhub, Nicholas Schiefer, Carson Denison and Ethan Perez

8 Aug 2023 1:30 UTC

312 points

29 comments 18 min read LW link 1 review

Perpetually Declining Population?

jefftk 8 Aug 2023 1:30 UTC

48 points

29 comments 3 min read LW link

(www.jefftk.com)

[Question] How do I find all the items on LW that I’ve favorited or upvoted?

Alex K. Chen (parrot) 7 Aug 2023 23:51 UTC

14 points

3 comments 1 min read LW link

A plea for more funding shortfall transparency

porby 7 Aug 2023 21:33 UTC

73 points

4 comments 2 min read LW link

[Question] Tips for reducing thinking branching factor

Simon Berens 7 Aug 2023 20:21 UTC

4 points

6 comments 1 min read LW link

An interactive introduction to grokking and mechanistic interpretability

Adam Pearce and Asma Ghandeharioun

7 Aug 2023 19:09 UTC

23 points

3 comments 1 min read LW link

(pair.withgoogle.com)

Feedbackloop-first Rationality

Raemon 7 Aug 2023 17:58 UTC

195 points

65 comments 8 min read LW link

Growing Bonsai Networks with RNNs

ameo 7 Aug 2023 17:34 UTC

21 points

5 comments 1 min read LW link

(cprimozic.net)

[Question] Should I test myself for microplastics?

Augs 7 Aug 2023 17:31 UTC

9 points

2 comments 1 min read LW link

Optimisation Measures: Desiderata, Impossibility, Proposals

mattmacdermott and Alexander Gietelink Oldenziel

7 Aug 2023 15:52 UTC

36 points

9 comments 1 min read LW link

Announcing the Clearer Thinking micro-grants program for 2023

spencerg 7 Aug 2023 15:21 UTC

14 points

1 comment 1 min read LW link

(www.clearerthinking.org)

What I’ve been reading, July–August 2023

jasoncrawford 7 Aug 2023 14:22 UTC

23 points

0 comments 13 min read LW link

(rootsofprogress.org)

Monthly Roundup #9: August 2023

Zvi 7 Aug 2023 13:20 UTC

42 points

25 comments 57 min read LW link

(thezvi.wordpress.com)