19 Jun 2023 23:58 UTC

244 points

57 comments 10 min read LW link 1 review

Mode collapse in RL may be fueled by the update equation

TurnTrout and MichaelEinhorn

19 Jun 2023 21:51 UTC

49 points

10 comments 8 min read LW link

New reference standard on LLM Application security started by OWASP

QuantumForest 19 Jun 2023 20:54 UTC

2 points

0 comments 1 min read LW link

Experiments in Evaluating Steering Vectors

Gytis Daujotas 19 Jun 2023 15:11 UTC

34 points

4 comments 4 min read LW link

Provisionality

TsviBT 19 Jun 2023 11:49 UTC

7 points

2 comments 7 min read LW link

[Question] When did you orient?

lemonhope 19 Jun 2023 7:22 UTC

11 points

7 comments 1 min read LW link

Guide to rationalist interior decorating

mingyuan 19 Jun 2023 6:47 UTC

300 points

48 comments 12 min read LW link 3 reviews

A Multidisciplinary Approach to Alignment (MATA) and Archetypal Transfer Learning (ATL)

MiguelDev 19 Jun 2023 2:32 UTC

4 points

2 comments 7 min read LW link

resolving some neural network mysteries

bhauth 19 Jun 2023 0:09 UTC

44 points

6 comments 2 min read LW link

(www.bhauth.com)

Why I am not an AI extinction cautionista

Shmi 18 Jun 2023 21:28 UTC

22 points

40 comments 2 min read LW link

My impression of singular learning theory

Ege Erdil 18 Jun 2023 15:34 UTC

47 points

30 comments 2 min read LW link

Berlin AI Alignment Open Meetup July 2023

GuyP 18 Jun 2023 14:13 UTC

1 point

0 comments 1 min read LW link

Alaska Trip

jefftk 18 Jun 2023 13:40 UTC

18 points

0 comments 2 min read LW link

(www.jefftk.com)

UK Foundation Model Task Force—Expression of Interest

ojorgensen 18 Jun 2023 9:43 UTC

64 points

2 comments 1 min read LW link

(twitter.com)

Cryonics Career Survey (more jobs than you think)

Mati_Roy 18 Jun 2023 2:13 UTC

41 points

1 comment 2 min read LW link

Solomonoff induction still works if the universe is uncomputable, and its usefulness doesn’t require knowing Occam’s razor

Christopher King 18 Jun 2023 1:52 UTC

38 points

28 comments 4 min read LW link

DSLT 2. Why Neural Networks obey Occam’s Razor

Liam Carroll 18 Jun 2023 0:23 UTC

22 points

14 comments 17 min read LW link

The foundations of knowledge.

archeon 18 Jun 2023 0:05 UTC

−1 points

4 comments 2 min read LW link

A few more ants and grasshoppers

c.trout 17 Jun 2023 23:38 UTC

16 points

3 comments 4 min read LW link

The “Loss Function of Reality” Is Not So Spiky and Unpredictable

Thoth Hermes 17 Jun 2023 21:43 UTC

12 points

0 comments 6 min read LW link

(thothhermes.substack.com)

[Question] What is the foundation of me experiencing the present moment being right now and not at some other point in time?

MvB 17 Jun 2023 20:47 UTC

20 points

19 comments 1 min read LW link

Adventist Health Study-2 supports pescetarianism more than veganism

Elizabeth 17 Jun 2023 20:10 UTC

67 points

11 comments 6 min read LW link

(acesounderglass.com)

The environment as infrastructure

jasoncrawford 17 Jun 2023 18:42 UTC

28 points

9 comments 1 min read LW link

(rootsofprogress.org)

A summary of current work in AI governance

constructive 17 Jun 2023 18:41 UTC

44 points

1 comment 11 min read LW link

(forum.effectivealtruism.org)

[Linkpost] Rosetta Neurons: Mining the Common Units in a Model Zoo

Bogdan Ionut Cirstea 17 Jun 2023 16:38 UTC

12 points

0 comments 1 min read LW link

Partial Simulation Extrapolation: A Proposal for Building Safer Simulators

lukemarks 17 Jun 2023 13:55 UTC

16 points

0 comments 10 min read LW link

Alewife Train is Now Arriving

jefftk 17 Jun 2023 13:20 UTC

21 points

4 comments 1 min read LW link

(www.jefftk.com)

[Question] What fraction of words written/read are AI-written?

Mati_Roy 17 Jun 2023 13:15 UTC

8 points

6 comments 1 min read LW link

Are Bayesian methods guaranteed to overfit?

Ege Erdil 17 Jun 2023 12:52 UTC

52 points

5 comments 3 min read LW link

(www.yulingyao.com)

The AI governance gaps in developing countries

ntran 17 Jun 2023 2:50 UTC

20 points

1 comment 14 min read LW link

June and Mulberries

jefftk 17 Jun 2023 1:30 UTC

13 points

2 comments 1 min read LW link

(www.jefftk.com)

Updating Drexler’s CAIS model

Matthew Barnett 16 Jun 2023 22:53 UTC

47 points

32 comments 4 min read LW link

Avoiding metaphysics means giving bad philosophy a free pass

Aditya 16 Jun 2023 20:54 UTC

5 points

9 comments 4 min read LW link

Criticism of Eliezer’s irrational moral beliefs

Jorterder 16 Jun 2023 20:47 UTC

−17 points

21 comments 1 min read LW link

Cartography, blowing one’s mind, the illusion of separation and other general musings

Neil 16 Jun 2023 19:19 UTC

0 points

4 comments 2 min read LW link

[Replication] Conjecture’s Sparse Coding in Small Transformers

Hoagy and Logan Riggs

16 Jun 2023 18:02 UTC

52 points

0 comments 5 min read LW link

Longevity: Double Human Lifespan in the Next Decade?

Jannik Schg 16 Jun 2023 17:51 UTC

1 point

0 comments 1 min read LW link

LLMs Sometimes Generate Purely Negatively-Reinforced Text

Fabien Roger 16 Jun 2023 16:31 UTC

177 points

11 comments 7 min read LW link

Palantir’s AI models

ChristianKl 16 Jun 2023 16:20 UTC

26 points

16 comments 1 min read LW link

(www.palantir.com)

[Linkpost] Faith and Fate: Limits of Transformers on Compositionality

Joe Kwon 16 Jun 2023 15:04 UTC

19 points

4 comments 1 min read LW link

(arxiv.org)

The ones who endure

Richard_Ngo 16 Jun 2023 14:40 UTC

61 points

16 comments 5 min read LW link

(www.thinkingcomplete.com)

Conjecture: A standing offer for public debates on AI

Andrea_Miotti 16 Jun 2023 14:33 UTC

29 points

1 comment 2 min read LW link

(www.conjecture.dev)

Explaining “Taking features out of superposition with sparse autoencoders”

Robert_AIZI 16 Jun 2023 13:59 UTC

10 points

0 comments 8 min read LW link

(aizi.substack.com)

[Question] How not to write the Cookbook of Doom?

brunoparga 16 Jun 2023 13:37 UTC

17 points

5 comments 1 min read LW link

Scaffolded LLMs: Less Obvious Concerns

Stephen Fowler 16 Jun 2023 10:39 UTC

32 points

13 comments 11 min read LW link

Motivation in AI

nickasaf 16 Jun 2023 9:50 UTC

−1 points

1 comment 2 min read LW link

DSLT 0. Distilling Singular Learning Theory

Liam Carroll 16 Jun 2023 9:50 UTC

76 points

6 comments 5 min read LW link

DSLT 1. The RLCT Measures the Effective Dimension of Neural Networks

Liam Carroll 16 Jun 2023 9:50 UTC

51 points

9 comments 13 min read LW link

[Linkpost] Mapping Brains with Language Models: A Survey

Bogdan Ionut Cirstea 16 Jun 2023 9:49 UTC

5 points

0 comments 1 min read LW link

Rational Animations is looking for an AI Safety scriptwriter, a lead community manager, and other roles.

Writer 16 Jun 2023 9:41 UTC

74 points

1 comment 3 min read LW link