12 Jul 2024 23:42 UTC

67 points

6 comments · 2 min read · LW link

Consider attending the AI Security Forum ’24, a 1-day pre-DEFCON event

Charlie Rogers-Smith

12 Jul 2024 23:01 UTC

21 points

0 comments · 1 min read · LW link

Memorising molecular structures

dkl9

12 Jul 2024 22:40 UTC

6 points

0 comments · 2 min read · LW link

(dkl9.net)

Robin Hanson AI X-Risk Debate — Highlights and Analysis

Liron

12 Jul 2024 21:31 UTC

46 points

7 comments · 45 min read · LW link

(www.youtube.com)

Designing Artificial Wisdom: The Wise Workflow Research Organization

Jordan Arel

12 Jul 2024 19:18 UTC

2 points

0 comments · 8 min read · LW link

Whiteboard Pen Magazines are Useful

Johannes C. Mayer

12 Jul 2024 17:15 UTC

40 points

8 comments · 1 min read · LW link

Alignment: “Do what I would have wanted you to do”

Oleg Trott

12 Jul 2024 16:47 UTC

11 points

48 comments · 1 min read · LW link

Virtue taxation

Dentosal

12 Jul 2024 14:56 UTC

9 points

1 comment · 2 min read · LW link

Most smart and skilled people are outside of the EA/rationalist community: an analysis

titotal

12 Jul 2024 12:13 UTC

107 points

36 comments · 1 min read · LW link

(open.substack.com)

2024 Freedom Communities Events

Tudor Iliescu

12 Jul 2024 8:04 UTC

−6 points

1 comment · 1 min read · LW link

Faithful vs Interpretable Sparse Autoencoder Evals

Louka Ewington-Pitsos

12 Jul 2024 5:37 UTC

2 points

0 comments · 12 min read · LW link

Moving away from physical continuity

ProgramCrafter

12 Jul 2024 5:05 UTC

2 points

1 comment · 1 min read · LW link

Transformer Circuit Faithfulness Metrics Are Not Robust

Joseph Miller, bilalchughtai and William_S

12 Jul 2024 3:47 UTC

104 points

5 comments · 7 min read · LW link

(arxiv.org)

On Artificial Wisdom

Jordan Arel

12 Jul 2024 0:20 UTC

3 points

0 comments · 14 min read · LW link

Yoshua Bengio: Reasoning through arguments against taking AI safety seriously

Judd Rosenblatt

11 Jul 2024 23:53 UTC

70 points

3 comments · 1 min read · LW link

(yoshuabengio.org)

Podcast: “How the Smart Money teaches trading with Ricki Heicklen” (Patrick McKenzie interviewing)

rossry

11 Jul 2024 22:49 UTC

20 points

2 comments · 1 min read · LW link

(www.complexsystemspodcast.com)

Superbabies: Putting The Pieces Together

sarahconstantin

11 Jul 2024 20:40 UTC

215 points

37 comments · 10 min read · LW link

(sarahconstantin.substack.com)

Sherlockian Abduction Master List

Cole Wyeth

11 Jul 2024 20:27 UTC

50 points

63 comments · 33 min read · LW link

Thoughts to niplav on lie-detection, truthfwl mechanisms, and wealth-inequality

Emrik and niplav

11 Jul 2024 18:55 UTC

7 points

8 comments · 11 min read · LW link

Games for AI Control

charlie_griffin and Buck

11 Jul 2024 18:40 UTC

43 points

0 comments · 5 min read · LW link

Video Intro to Guaranteed Safe AI

Mike Vaiana, Diogo de Lucena and AE Studio

11 Jul 2024 17:53 UTC

27 points

0 comments · 1 min read · LW link

(youtu.be)

Effective Empathy

Thac0

11 Jul 2024 15:14 UTC

4 points

1 comment · 1 min read · LW link

AI #72: Denying the Future

Zvi

11 Jul 2024 15:00 UTC

45 points

8 comments · 41 min read · LW link

(thezvi.wordpress.com)

The Best Bits From Build, Baby, Build

Maxwell Tabarrok

11 Jul 2024 14:09 UTC

13 points

0 comments · 4 min read · LW link

(www.maximum-progress.com)

[Question] What Other Lines of Work are Safe from AI Automation?

RogerDearnaley

11 Jul 2024 10:01 UTC

29 points

35 comments · 5 min read · LW link

Decomposing Agency — capabilities without desires

owencb and Raymond D

11 Jul 2024 9:38 UTC

146 points

32 comments · 12 min read · LW link

(strangecities.substack.com)

Reliable Sources: The Story of David Gerard

TracingWoodgrains

10 Jul 2024 19:50 UTC

381 points

53 comments · 43 min read · LW link

Managing Emotional Potential Energy

adamShimi

10 Jul 2024 18:20 UTC

23 points

4 comments · 4 min read · LW link

(epistemologicalfascinations.substack.com)

[EAForum xpost] A breakdown of OpenAI’s revenue

dschwarz and Lawrence Phillips

10 Jul 2024 18:09 UTC

57 points

5 comments · 1 min read · LW link

(forum.effectivealtruism.org)

Solving Pascal’s Wager using dynamic programming

Paul Wilczewski

10 Jul 2024 18:09 UTC

1 point

0 comments · 5 min read · LW link

Fluent, Cruxy Predictions

Raemon

10 Jul 2024 18:00 UTC

85 points

14 comments · 14 min read · LW link

Antitrust as Controlled Creative Destruction

Martin Sustrik

10 Jul 2024 16:40 UTC

14 points

2 comments · 2 min read · LW link

(250bpm.substack.com)

New page: Integrity

Zach Stein-Perlman

10 Jul 2024 15:00 UTC

91 points

3 comments · 1 min read · LW link

AirBnB Baking

jefftk

10 Jul 2024 12:50 UTC

7 points

1 comment · 1 min read · LW link

(www.jefftk.com)

DIY RLHF: A simple implementation for hands on experience

Mike Vaiana and AE Studio

10 Jul 2024 12:07 UTC

28 points

0 comments · 6 min read · LW link

Usefulness grounds truth

invertedpassion

10 Jul 2024 7:58 UTC

0 points

0 comments · 4 min read · LW link

On passing Complete and Honest Ideological Turing Tests (CHITTs)

Aryeh Englander

10 Jul 2024 4:01 UTC

11 points

2 comments · 1 min read · LW link

[Question] Pondering how good or bad things will be in the AGI future

Sherrinford

9 Jul 2024 22:46 UTC

11 points

9 comments · 2 min read · LW link

Causal Graphs of GPT-2-Small’s Residual Stream

David Udell

9 Jul 2024 22:06 UTC

53 points

7 comments · 7 min read · LW link

[Question] If AI starts to end the world, is suicide a good idea?

IlluminateReality

9 Jul 2024 21:53 UTC

0 points

8 comments · 1 min read · LW link

Rationalist Purity Test

Gunnar_Zarncke

9 Jul 2024 20:30 UTC

−9 points

5 comments · 1 min read · LW link

(ratpuritytest.com)

That which can be destroyed by the truth, should be assumed to should be destroyed by it

Thac0

9 Jul 2024 19:39 UTC

5 points

0 comments · 3 min read · LW link

AISN #38: Supreme Court Decision Could Limit Federal Ability to Regulate AI. Plus, “Circuit Breakers” for AI systems, and updates on China’s AI industry

Corin Katzke, Alexa Pan, Julius and Dan H

9 Jul 2024 19:28 UTC

5 points

0 comments · 5 min read · LW link

(newsletter.safe.ai)

Summer Tour Stops

jefftk

9 Jul 2024 19:10 UTC

10 points

0 comments · 3 min read · LW link

(www.jefftk.com)

Fix simple mistakes in ARC-AGI, etc.

Oleg Trott

9 Jul 2024 17:46 UTC

9 points

9 comments · 1 min read · LW link

Paper Summary: The Effects of Communicating Uncertainty on Public Trust in Facts and Numbers

Jeffrey Heninger

9 Jul 2024 16:50 UTC

42 points

2 comments · 2 min read · LW link

(blog.aiimpacts.org)

UC Berkeley course on LLMs and ML Safety

Dan H

9 Jul 2024 15:40 UTC

36 points

1 comment · 1 min read · LW link

(rdi.berkeley.edu)

What and Why: Developmental Interpretability of Reinforcement Learning

Garrett Baker

9 Jul 2024 14:09 UTC

67 points

4 comments · 6 min read · LW link

Medical Roundup #3

Zvi

9 Jul 2024 13:10 UTC

39 points

4 comments · 19 min read · LW link

(thezvi.wordpress.com)

Consent across power differentials

Ramana Kumar

9 Jul 2024 11:42 UTC

50 points

12 comments · 3 min read · LW link