Acting Normal is Good, Actually

Gordon Seidoh Worley 10 Feb 2023 23:35 UTC

14 points

5 comments 3 min read LW link

[S] D&D.Sci: All the D8a. Allllllll of it.

aphyer 10 Feb 2023 21:14 UTC

43 points

17 comments 6 min read LW link

A Different Kind of Ark: My failed attempt to build a bridge between universes

ChrisM 10 Feb 2023 20:49 UTC

2 points

2 comments 6 min read LW link

(www.vesselproject.io)

Prizes for the 2021 Review

Raemon 10 Feb 2023 19:47 UTC

69 points

2 comments 4 min read LW link

A proposed method for forecasting transformative AI

Matthew Barnett 10 Feb 2023 19:34 UTC

121 points

21 comments 10 min read LW link

The best way so far to explain AI risk: The Precipice (p. 137-149)

trevor 10 Feb 2023 19:33 UTC

50 points

2 comments 17 min read LW link

Is this a weak pivotal act: creating nanobots that eat evil AGIs (but nothing else)?

Christopher King 10 Feb 2023 19:26 UTC

0 points

3 comments 1 min read LW link

Why I’m not working on {debate, RRM, ELK, natural abstractions}

Steven Byrnes 10 Feb 2023 19:22 UTC

71 points

19 comments 9 min read LW link

Conditioning Predictive Models: Open problems, Conclusion, and Appendix

evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton

10 Feb 2023 19:21 UTC

36 points

3 comments 11 min read LW link

Jobs that can help with the most important century

HoldenKarnofsky 10 Feb 2023 18:20 UTC

24 points

0 comments 19 min read LW link

(www.cold-takes.com)

[Question] Is it a coincidence that GPT-3 requires roughly the same amount of compute as is necessary to emulate the human brain?

RomanS 10 Feb 2023 16:26 UTC

11 points

10 comments 1 min read LW link

Contra: Changing Role Terms

jefftk 10 Feb 2023 15:00 UTC

8 points

0 comments 3 min read LW link

(www.jefftk.com)

Cyborgism

NicholasKees and janus

10 Feb 2023 14:47 UTC

337 points

46 comments 35 min read LW link

FLI Podcast: Connor Leahy on AI Progress, Chimps, Memes, and Markets (Part 1/3)

remember and Andrea_Miotti

10 Feb 2023 13:55 UTC

39 points

0 comments 43 min read LW link

Many important technologies start out as science fiction before becoming real

trevor 10 Feb 2023 9:36 UTC

28 points

2 comments 2 min read LW link

[Question] What’s actually going on in the “mind” of the model when we fine-tune GPT-3 to InstructGPT?

rpglover64 10 Feb 2023 7:57 UTC

18 points

3 comments 1 min read LW link

Mechanism Design for AI Safety—Agenda Creation Retreat

Rubi J. Hudson 10 Feb 2023 3:05 UTC

24 points

2 comments 1 min read LW link

[Question] On utility functions

jodaru 10 Feb 2023 1:22 UTC

11 points

10 comments 1 min read LW link

Security Mindset—Fire Alarms and Trigger Signatures

elspood 9 Feb 2023 21:15 UTC

23 points

0 comments 4 min read LW link

Impostor syndrome: how to cure it with spreadsheets and meditation

KatWoods 9 Feb 2023 21:04 UTC

30 points

2 comments 19 min read LW link

Conditioning Predictive Models: Deployment strategy

evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton

9 Feb 2023 20:59 UTC

28 points

0 comments 10 min read LW link

Make Conflict of Interest Policies Public

jefftk 9 Feb 2023 19:30 UTC

33 points

7 comments 2 min read LW link

(www.jefftk.com)

Curated blind auction prediction markets and a reputation system as an alternative to editorial review in news publication.

ciaran 9 Feb 2023 18:48 UTC

2 points

0 comments 2 min read LW link

Tools for finding information on the internet

RomanHauksson 9 Feb 2023 17:05 UTC

79 points

11 comments 2 min read LW link

(roman.computer)

Covid 2/9/23: Interferon λ

Zvi 9 Feb 2023 16:50 UTC

48 points

8 comments 12 min read LW link

(thezvi.wordpress.com)

EIS II: What is “Interpretability”?

scasper 9 Feb 2023 16:48 UTC

28 points

6 comments 4 min read LW link

The Engineer’s Interpretability Sequence (EIS) I: Intro

scasper 9 Feb 2023 16:28 UTC

46 points

24 comments 3 min read LW link

[Question] Do the Safety Properties of Powerful AI Systems Need to be Adversarially Robust? Why?

DragonGod 9 Feb 2023 13:36 UTC

22 points

42 comments 2 min read LW link

Which ML skills are useful for finding a new AIS research agenda?

Yonatan Cale 9 Feb 2023 13:09 UTC

16 points

1 comment 1 min read LW link

When To Stop

Alok Singh 9 Feb 2023 9:10 UTC

31 points

5 comments 1 min read LW link

(alok.github.io)

The Pervasive Illusion of Seeing the Complete World

Shmi 9 Feb 2023 6:47 UTC

38 points

1 comment 2 min read LW link

Religion is Good, Actually

Gordon Seidoh Worley 9 Feb 2023 6:34 UTC

−1 points

39 comments 4 min read LW link

Using PICT against PastaGPT Jailbreaking

Quentin FEUILLADE--MONTIXI 9 Feb 2023 4:30 UTC

17 points

0 comments 9 min read LW link

Notes on the Mathematics of LLM Architectures

carboniferous_umbraculum 9 Feb 2023 1:45 UTC

13 points

2 comments 1 min read LW link

(drive.google.com)

On Developing a Mathematical Theory of Interpretability

carboniferous_umbraculum 9 Feb 2023 1:45 UTC

64 points

8 comments 6 min read LW link

Anomalous tokens reveal the original identities of Instruct models

janus and jdp

9 Feb 2023 1:30 UTC

139 points

16 comments 9 min read LW link

(generative.ink)

[Question] How would you use video gamey tech to help with AI safety?

porby 9 Feb 2023 0:20 UTC

9 points

5 comments 1 min read LW link

A (EtA: quick) note on terminology: AI Alignment != AI x-safety

David Scott Krueger (formerly: capybaralet) 8 Feb 2023 22:33 UTC

46 points

20 comments 1 min read LW link

GPT-175bee

Adam Scherlis and LawrenceC

8 Feb 2023 18:58 UTC

121 points

14 comments 1 min read LW link

EigenKarma: trust at scale

Henrik Karlsson 8 Feb 2023 18:52 UTC

186 points

52 comments 5 min read LW link

Conditioning Predictive Models: Interactions with other approaches

evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton

8 Feb 2023 18:19 UTC

32 points

2 comments 11 min read LW link

Wanted: Technical animator and/or front-end developer for interactive diagrams of invention

jasoncrawford 8 Feb 2023 17:14 UTC

30 points

3 comments 1 min read LW link

(rootsofprogress.org)

A multi-disciplinary view on AI safety research

Roman Leventov 8 Feb 2023 16:50 UTC

43 points

4 comments 26 min read LW link

Community building: Lessons from ten years of facilitation experience

Severin T. Seehrich 8 Feb 2023 16:26 UTC

17 points

0 comments 1 min read LW link

Progress links and tweets, 2023-02-08

jasoncrawford 8 Feb 2023 15:52 UTC

10 points

0 comments 1 min read LW link

(rootsofprogress.org)

A Particular Equilibrium

Algon 8 Feb 2023 15:16 UTC

13 points

0 comments 2 min read LW link

(algon-33.github.io)

Self-Awareness (and possible mode collapse around it) in ChatGPT

Yitz 8 Feb 2023 9:57 UTC

18 points

2 comments 2 min read LW link

Drugs are Sometimes Good, Actually

Gordon Seidoh Worley 8 Feb 2023 2:24 UTC

12 points

8 comments 4 min read LW link

House Covid Infection Retrospective

jefftk 8 Feb 2023 2:20 UTC

25 points

1 comment 2 min read LW link

(www.jefftk.com)

Noting an error in Inadequate Equilibria

Matthew Barnett 8 Feb 2023 1:33 UTC

364 points

60 comments 2 min read LW link 2 reviews