20 Jun 2024 23:14 UTC

40 points

2 comments · 21 min read · LW link

Evaporation of improvements

Viliam

20 Jun 2024 18:34 UTC

28 points

27 comments · 2 min read · LW link

Interpreting and Steering Features in Images

Gytis Daujotas

20 Jun 2024 18:33 UTC

65 points

6 comments · 5 min read · LW link

Claude 3.5 Sonnet

Zach Stein-Perlman

20 Jun 2024 18:00 UTC

75 points

41 comments · 1 min read · LW link

(www.anthropic.com)

[Question] What is going to happen in a case of an AGI era where humans are out of the game?

Cipolla

20 Jun 2024 17:44 UTC

−2 points

1 comment · 1 min read · LW link

Jailbreak steering generalization

Sarah Ball and Nina Panickssery

20 Jun 2024 17:25 UTC

41 points

4 comments · 2 min read · LW link

(arxiv.org)

Case studies on social-welfare-based standards in various industries

HoldenKarnofsky

20 Jun 2024 13:33 UTC

42 points

0 comments · 1 min read · LW link

AI #69: Nice

Zvi

20 Jun 2024 12:40 UTC

65 points

9 comments · 51 min read · LW link

(thezvi.wordpress.com)

Niche product design

Itay Dreyfus

20 Jun 2024 6:34 UTC

2 points

1 comment · 3 min read · LW link

(productidentity.co)

Data on AI

Robi Rahman, Jaime Sevilla Molina, Pablo Villalobos and Ben Cottier

20 Jun 2024 6:31 UTC

1 point

0 comments · 1 min read · LW link

(epochai.org)

Actually, Power Plants May Be an AI Training Bottleneck.

Lao Mein

20 Jun 2024 4:41 UTC

83 points

13 comments · 2 min read · LW link

Proposing the Post-Singularity Symbiotic Researches

Hiroshi Yamakawa

20 Jun 2024 4:05 UTC

5 points

0 comments · 12 min read · LW link

Week One of Studying Transformers Architecture

JustisMills

20 Jun 2024 3:47 UTC

3 points

0 comments · 15 min read · LW link

(justismills.substack.com)

[Question] What are things you’re allowed to do as a startup?

Elizabeth

20 Jun 2024 0:01 UTC

30 points

9 comments · 1 min read · LW link

LessWrong/ACX meetup Transilvanya tour—Alba Iulia

Marius Adrian Nicoară

19 Jun 2024 19:56 UTC

1 point

1 comment · 1 min read · LW link

Chronic perfectionism through the eyes of school reports

Stuart Johnson

19 Jun 2024 17:46 UTC

13 points

3 comments · 1 min read · LW link

Ilya Sutskever created a new AGI startup

harfe

19 Jun 2024 17:17 UTC

95 points

35 comments · 1 min read · LW link

(ssi.inc)

Beyond the Board: Exploring AI Robustness Through Go

AdamGleave

19 Jun 2024 16:40 UTC

41 points

2 comments · 1 min read · LW link

(far.ai)

A study on cults and non-cults—answer questions about a group and get a cult score

spencerg

19 Jun 2024 14:30 UTC

1 point

8 comments · 1 min read · LW link

(www.guidedtrack.com)

Workshop: data analysis for software engineers

Derek M. Jones

19 Jun 2024 14:20 UTC

2 points

0 comments · 1 min read · LW link

FLEXIBLE AND ADAPTABLE LLM’s WITH CONTINUOUS SELF TRAINING

Escaque 66

19 Jun 2024 14:17 UTC

−11 points

0 comments · 3 min read · LW link

Surviving Seveneves

Yair Halberstadt

19 Jun 2024 13:11 UTC

41 points

4 comments · 11 min read · LW link

Self responsibility

Elo

19 Jun 2024 10:17 UTC

17 points

3 comments · 2 min read · LW link

Gizmo Watch Review

jefftk

18 Jun 2024 20:00 UTC

20 points

1 comment · 6 min read · LW link

(www.jefftk.com)

Boycott OpenAI

PeterMcCluskey

18 Jun 2024 19:52 UTC

163 points

26 comments · 1 min read · LW link

(bayesianinvestor.com)

Loving a world you don’t trust

Joe Carlsmith

18 Jun 2024 19:31 UTC

134 points

13 comments · 33 min read · LW link

Book review: the Iliad

philh

18 Jun 2024 18:50 UTC

20 points

1 comment · 14 min read · LW link

(reasonableapproximation.net)

AI Safety Newsletter #37: US Launches Antitrust Investigations Plus, recent criticisms of OpenAI and Anthropic, and a summary of Situational Awareness

Corin Katzke, Alexa Pan, Julius and Dan H

18 Jun 2024 18:07 UTC

8 points

0 comments · 5 min read · LW link

(newsletter.safe.ai)

Suffering Is Not Pain

jbkjr

18 Jun 2024 18:04 UTC

34 points

45 comments · 5 min read · LW link

(jbkjr.me)

Lamini’s Targeted Hallucination Reduction May Be a Big Deal for Job Automation

sweenesm

18 Jun 2024 15:29 UTC

3 points

0 comments · 1 min read · LW link

On DeepMind’s Frontier Safety Framework

Zvi

18 Jun 2024 13:30 UTC

37 points

4 comments · 8 min read · LW link

(thezvi.wordpress.com)

[Linkpost] Transcendence: Generative Models Can Outperform The Experts That Train Them

Bogdan Ionut Cirstea

18 Jun 2024 11:00 UTC

19 points

3 comments · 1 min read · LW link

(arxiv.org)

I would have shit in that alley, too

Declan Molony

18 Jun 2024 4:41 UTC

431 points

134 comments · 4 min read · LW link

[Question] The thing I don’t understand about AGI

Jeremy Kalfus

18 Jun 2024 4:25 UTC

7 points

12 comments · 1 min read · LW link

Calling My Second Family Dance

jefftk

18 Jun 2024 2:20 UTC

11 points

0 comments · 1 min read · LW link

(www.jefftk.com)

LLM-Secured Systems: A General-Purpose Tool For Structured Transparency

ozziegooen

18 Jun 2024 0:21 UTC

7 points

1 comment · 1 min read · LW link

D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset

aphyer

17 Jun 2024 21:29 UTC

51 points

11 comments · 6 min read · LW link

Questionable Narratives of “Situational Awareness”

fergusq

17 Jun 2024 21:01 UTC

0 points

1 comment · 1 min read · LW link

(forum.effectivealtruism.org)

ZuVillage Georgia – Mission Statement

Burns

17 Jun 2024 19:53 UTC

3 points

3 comments · 9 min read · LW link

Getting 50% (SoTA) on ARC-AGI with GPT-4o

ryan_greenblatt

17 Jun 2024 18:44 UTC

262 points

49 comments · 13 min read · LW link

Sycophancy to subterfuge: Investigating reward tampering in large language models

Carson Denison and evhub

17 Jun 2024 18:41 UTC

161 points

22 comments · 8 min read · LW link

(arxiv.org)

Labor Participation is a High-Priority AI Alignment Risk

alex

17 Jun 2024 18:09 UTC

4 points

0 comments · 17 min read · LW link

Towards a Less Bullshit Model of Semantics

johnswentworth and David Lorell

17 Jun 2024 15:51 UTC

94 points

44 comments · 21 min read · LW link

Analysing Adversarial Attacks with Linear Probing

Yoann Poupart, Imene Kerboua, Clement Neo and Jason Hoelscher-Obermaier

17 Jun 2024 14:16 UTC

9 points

0 comments · 8 min read · LW link

What’s the future of AI hardware?

Itay Dreyfus

17 Jun 2024 13:05 UTC

2 points

0 comments · 8 min read · LW link

(productidentity.co)

OpenAI #8: The Right to Warn

Zvi

17 Jun 2024 12:00 UTC

97 points

8 comments · 34 min read · LW link

(thezvi.wordpress.com)

Logit Prisms: Decomposing Transformer Outputs for Mechanistic Interpretability

ntt123

17 Jun 2024 11:46 UTC

5 points

4 comments · 6 min read · LW link

(neuralblog.github.io)

Weak AGIs Kill Us First

yrimon

17 Jun 2024 11:13 UTC

15 points

4 comments · 9 min read · LW link

[Linkpost] Guardian article covering Lightcone Infrastructure, Manifest and CFAR ties to FTX

ROM

17 Jun 2024 10:05 UTC

8 points

9 comments · 1 min read · LW link

(www.theguardian.com)

Fat Tails Discourage Compromise

niplav

17 Jun 2024 9:39 UTC

53 points

5 comments · 1 min read · LW link