5 Sep 2024 19:13 UTC

37 points

0 comments · 5 min read · LW link

AI x Human Flourishing: Introducing the Cosmos Institute

Brendan McCord

5 Sep 2024 18:23 UTC

14 points

5 comments · 6 min read · LW link

(cosmosinstitute.substack.com)

What is SB 1047 for?

Raemon

5 Sep 2024 17:39 UTC

61 points

8 comments · 3 min read · LW link

instruction tuning and autoregressive distribution shift

nostalgebraist

5 Sep 2024 16:53 UTC

40 points

5 comments · 5 min read · LW link

Conflating value alignment and intent alignment is causing confusion

Seth Herd

5 Sep 2024 16:39 UTC

48 points

18 comments · 5 min read · LW link

A bet for Samo Burja

Nathan Helm-Burger

5 Sep 2024 16:01 UTC

13 points

2 comments · 2 min read · LW link

Universal basic income isn’t always AGI-proof

Kevin Kohler

5 Sep 2024 15:39 UTC

5 points

3 comments · 7 min read · LW link

(machinocene.substack.com)

Why Reflective Stability is Important

Johannes C. Mayer

5 Sep 2024 15:28 UTC

19 points

2 comments · 1 min read · LW link

Why Swiss watches and Taylor Swift are AGI-proof

Kevin Kohler

5 Sep 2024 13:23 UTC

17 points

11 comments · 6 min read · LW link

(machinocene.substack.com)

Is Redistributive Taxation Justifiable? Part 1: Do the Rich Deserve their Wealth?

Alexander de Vries

5 Sep 2024 10:23 UTC

7 points

20 comments · 10 min read · LW link

(2ndhandecon.substack.com)

What program structures enable efficient induction?

Daniel C

5 Sep 2024 10:12 UTC

21 points

5 comments · 3 min read · LW link

How to Fake Decryption

ohmurphy

5 Sep 2024 9:18 UTC

12 points

0 comments · 4 min read · LW link

(ohmurphy.substack.com)

We Should Try to Directly Measure the Value of Scientific Papers

ohmurphy

5 Sep 2024 9:08 UTC

1 point

0 comments · 5 min read · LW link

(ohmurphy.substack.com)

on Science Beakers and DDT

bhauth

5 Sep 2024 3:21 UTC

23 points

13 comments · 9 min read · LW link

(bhauth.com)

Massive Activations and why <bos> is important in Tokenized SAE Unigrams

Louka Ewington-Pitsos

5 Sep 2024 2:19 UTC

1 point

0 comments · 3 min read · LW link

The Forging of the Great Minds: An Unfinished Tale

Aryeh Englander

5 Sep 2024 0:58 UTC

−3 points

0 comments · 5 min read · LW link

The Chatbot of Babble

Aryeh Englander

5 Sep 2024 0:56 UTC

−3 points

0 comments · 7 min read · LW link

[Question] Is it Legal to Maintain Turing Tests using Data Poisoning, and would it work?

Double

5 Sep 2024 0:35 UTC

8 points

9 comments · 1 min read · LW link

Executable philosophy as a failed totalizing meta-worldview

jessicata

4 Sep 2024 22:50 UTC

93 points

40 comments · 4 min read · LW link

(unstableontology.com)

Against Explosive Growth

c.trout

4 Sep 2024 21:45 UTC

14 points

1 comment · 5 min read · LW link

The Fragility of Life Hypothesis and the Evolution of Cooperation

KristianRonn

4 Sep 2024 21:04 UTC

50 points

6 comments · 11 min read · LW link

Emotion-Informed Valuation Mechanism for Improved AI Alignment in Large Language Models

Javier Marin Valenzuela

4 Sep 2024 17:00 UTC

2 points

4 comments · 6 min read · LW link

What happens if you present 500 people with an argument that AI is risky?

KatjaGrace and Nathan Young

4 Sep 2024 16:40 UTC

102 points

7 comments · 3 min read · LW link

(blog.aiimpacts.org)

Automating LLM Auditing with Developmental Interpretability

htlou and evhub

4 Sep 2024 15:50 UTC

17 points

0 comments · 3 min read · LW link

Michael Dickens’ Caffeine Tolerance Research

niplav

4 Sep 2024 15:41 UTC

46 points

3 comments · 2 min read · LW link

(mdickens.me)

[Question] Are UV-C Air purifiers so useful?

JohnBuridan

4 Sep 2024 14:16 UTC

9 points

0 comments · 1 min read · LW link

AI and the Technological Richter Scale

Zvi

4 Sep 2024 14:00 UTC

48 points

8 comments · 13 min read · LW link

(thezvi.wordpress.com)

[Question] Is there any rigorous work on using anthropic uncertainty to prevent situational awareness / deception?

David Scott Krueger (formerly: capybaralet)

4 Sep 2024 12:40 UTC

19 points

7 comments · 1 min read · LW link

A Comparison Between The Pragmatosphere And Less Wrong

Zero Contradictions

4 Sep 2024 9:39 UTC

−18 points

10 comments · 2 min read · LW link

(zerocontradictions.net)

Announcing the Ultimate Jailbreaking Championship

InnerHufflepuff

4 Sep 2024 0:35 UTC

15 points

1 comment · 1 min read · LW link

AI Safety at the Frontier: Paper Highlights, August ’24

gasteigerjo

3 Sep 2024 19:17 UTC

28 points

0 comments · 6 min read · LW link

(aisafetyfrontier.substack.com)

The Checklist: What Succeeding at AI Safety Will Involve

Sam Bowman

3 Sep 2024 18:18 UTC

142 points

49 comments · 22 min read · LW link

(sleepinyourhat.github.io)

Democracy beyond majoritarianism

Arturo Macias

3 Sep 2024 15:10 UTC

5 points

2 comments · 4 min read · LW link

On the UBI Paper

Zvi

3 Sep 2024 14:50 UTC

57 points

6 comments · 19 min read · LW link

(thezvi.wordpress.com)

An Opinionated Look at Inference Rules

Gianluca Calcagni

3 Sep 2024 13:32 UTC

−5 points

2 comments · 13 min read · LW link

Announcing the PIBBSS Symposium ’24!

DusanDNesic and clem_acs

3 Sep 2024 11:19 UTC

19 points

0 comments · 3 min read · LW link

Reducing global AI competition through the Commerce Control List and Immigration reform: a dual-pronged approach

Ben Smith

3 Sep 2024 5:28 UTC

16 points

2 comments · 1 min read · LW link

How I got 4.2M YouTube views without making a single video

Closed Limelike Curves

3 Sep 2024 3:52 UTC

376 points

36 comments · 1 min read · LW link

Duped: AI and the Making of a Global Suicide Cult

izzyness

2 Sep 2024 18:51 UTC

−8 points

0 comments · 1 min read · LW link

A gentle introduction to sparse autoencoders

Nick Jiang

2 Sep 2024 18:11 UTC

9 points

0 comments · 6 min read · LW link

What makes math problems hard for reinforcement learning: a case study

Anibal, Bartek, Sergei, Shehper and Piotr

2 Sep 2024 18:11 UTC

1 point

0 comments · 2 min read · LW link

(arxiv.org)

Survey: How Do Elite Chinese Students Feel About the Risks of AI?

Nick Corvino

2 Sep 2024 18:11 UTC

141 points

13 comments · 10 min read · LW link

Data-driven donations to help Democrats win federal elections: an update

Michael Cohn

2 Sep 2024 16:32 UTC

−1 points

2 comments · 1 min read · LW link

(perplexedguide.net)

[Question] What are the effective utilitarian pros and cons of having children (in rich countries)?

SpectrumDT

2 Sep 2024 10:01 UTC

2 points

4 comments · 1 min read · LW link

My decomposition of the alignment problem

Daniel C

2 Sep 2024 0:21 UTC

20 points

22 comments · 13 min read · LW link

DC Forecasting & Prediction Markets Meetup

David Glidden

2 Sep 2024 0:00 UTC

1 point

0 comments · 1 min read · LW link

A primer on the next generation of antibodies

Abhishaike Mahajan

1 Sep 2024 22:37 UTC

25 points

0 comments · 19 min read · LW link

(www.owlposting.com)

[Question] Who looked into extreme nuclear meltdowns?

Remmelt

1 Sep 2024 21:38 UTC

2 points

8 comments · 1 min read · LW link

Redundant Attention Heads in Large Language Models For In Context Learning

skunnavakkam

1 Sep 2024 20:08 UTC

7 points

1 comment · 4 min read · LW link

(skunnavakkam.github.io)

The Role of Transparency and Explainability in Responsible NLP

RAMEBC78

1 Sep 2024 20:08 UTC

−3 points

1 comment · 5 min read · LW link