
Has Diagram

Last edit: 29 Apr 2023 22:52 UTC by Gunnar_Zarncke

This tag indicates that a post contains diagrams. It can be used to find such posts quickly, or to exclude them, for example by readers who are visually impaired.

What are the results of more parental supervision and less outdoor play?

juliawise · 25 Nov 2023 12:52 UTC
221 points · 31 comments · 5 min read · LW link

Using axis lines for good or evil

dynomight · 6 Mar 2024 14:47 UTC
150 points · 39 comments · 4 min read · LW link (dynomight.net)

Neural Categories

Eliezer Yudkowsky · 10 Feb 2008 0:33 UTC
61 points · 17 comments · 4 min read · LW link

The lattice of partial updatelessness

Martín Soto · 10 Feb 2024 17:34 UTC
21 points · 5 comments · 5 min read · LW link

Demystifying “Alignment” through a Comic

milanrosko · 9 Jun 2024 8:24 UTC
106 points · 19 comments · 1 min read · LW link

Shard Theory—is it true for humans?

Rishika · 14 Jun 2024 19:21 UTC
68 points · 7 comments · 15 min read · LW link

Towards a Less Bullshit Model of Semantics

17 Jun 2024 15:51 UTC
94 points · 44 comments · 21 min read · LW link

How good are LLMs at doing ML on an unknown dataset?

Håvard Tveit Ihle · 1 Jul 2024 9:04 UTC
33 points · 4 comments · 13 min read · LW link

[Intro to brain-like-AGI safety] 4. The “short-term predictor”

Steven Byrnes · 16 Feb 2022 13:12 UTC
64 points · 11 comments · 13 min read · LW link

An Introduction To The Mandelbrot Set That Doesn’t Mention Complex Numbers

Yitz · 17 Jan 2024 9:48 UTC
82 points · 11 comments · 9 min read · LW link

An Illustrated Proof of the No Free Lunch Theorem

lifelonglearner · 8 Jun 2020 1:54 UTC
19 points · 0 comments · 1 min read · LW link (mlu.red)

Corrigibility, Much more detail than anyone wants to Read

Logan Zoellner · 7 May 2023 1:02 UTC
26 points · 2 comments · 7 min read · LW link

How much do you believe your results?

Eric Neyman · 6 May 2023 20:31 UTC
467 points · 14 comments · 15 min read · LW link (ericneyman.wordpress.com)

Residual stream norms grow exponentially over the forward pass

7 May 2023 0:46 UTC
76 points · 24 comments · 11 min read · LW link

Being the (Pareto) Best in the World

johnswentworth · 24 Jun 2019 18:36 UTC
444 points · 59 comments · 3 min read · LW link · 3 reviews

Hyperpolation

Gunnar_Zarncke · 15 Sep 2024 21:37 UTC
22 points · 6 comments · 1 min read · LW link (arxiv.org)

The case for a negative alignment tax

18 Sep 2024 18:33 UTC
79 points · 20 comments · 7 min read · LW link

Machine Learning Analogy for Meditation (illustrated)

abramdemski · 28 Jun 2018 22:51 UTC
100 points · 48 comments · 1 min read · LW link

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)

Joe Carlsmith · 28 Oct 2024 21:57 UTC
51 points · 5 comments · 32 min read · LW link

I turned decision theory problems into memes about trolleys

Tapatakt · 30 Oct 2024 20:13 UTC
103 points · 20 comments · 1 min read · LW link

The Cartoon Guide to Löb’s Theorem

Eliezer Yudkowsky · 17 Aug 2008 20:35 UTC
44 points · 104 comments · 1 min read · LW link

[Intro to brain-like-AGI safety] 10. The alignment problem

Steven Byrnes · 30 Mar 2022 13:24 UTC
48 points · 7 comments · 19 min read · LW link

[Intro to brain-like-AGI safety] 12. Two paths forward: “Controlled AGI” and “Social-instinct AGI”

Steven Byrnes · 20 Apr 2022 12:58 UTC
44 points · 10 comments · 15 min read · LW link

Drawing Less Wrong: Technical Skill

Raemon · 5 Dec 2011 5:12 UTC
37 points · 36 comments · 9 min read · LW link

All images from the WaitButWhy sequence on AI

trevor · 8 Apr 2023 7:36 UTC
73 points · 5 comments · 2 min read · LW link

The Natural Abstraction Hypothesis: Implications and Evidence

CallumMcDougall · 14 Dec 2021 23:14 UTC
39 points · 9 comments · 19 min read · LW link

Testing The Natural Abstraction Hypothesis: Project Update

johnswentworth · 20 Sep 2021 3:44 UTC
87 points · 17 comments · 8 min read · LW link · 1 review

Open technical problem: A Quinean proof of Löb’s theorem, for an easier cartoon guide

Andrew_Critch · 24 Nov 2022 21:16 UTC
58 points · 35 comments · 3 min read · LW link · 1 review

[Intro to brain-like-AGI safety] 5. The “long-term predictor”, and TD learning

Steven Byrnes · 23 Feb 2022 14:44 UTC
52 points · 27 comments · 20 min read · LW link

[Intro to brain-like-AGI safety] 6. Big picture of motivation, decision-making, and RL

Steven Byrnes · 2 Mar 2022 15:26 UTC
68 points · 17 comments · 16 min read · LW link

[Intro to brain-like-AGI safety] 7. From hardcoded drives to foresighted plans: A worked example

Steven Byrnes · 9 Mar 2022 14:28 UTC
78 points · 0 comments · 10 min read · LW link

[Intro to brain-like-AGI safety] 8. Takeaways from neuro 1/2: On AGI development

Steven Byrnes · 16 Mar 2022 13:59 UTC
57 points · 2 comments · 14 min read · LW link

[Intro to brain-like-AGI safety] 9. Takeaways from neuro 2/2: On AGI motivation

Steven Byrnes · 23 Mar 2022 12:48 UTC
44 points · 11 comments · 22 min read · LW link

[Intro to brain-like-AGI safety] 13. Symbol grounding & human social instincts

Steven Byrnes · 27 Apr 2022 13:30 UTC
69 points · 15 comments · 15 min read · LW link

[Intro to brain-like-AGI safety] 14. Controlled AGI

Steven Byrnes · 11 May 2022 13:17 UTC
41 points · 25 comments · 20 min read · LW link

[Intro to brain-like-AGI safety] 1. What’s the problem & Why work on it now?

Steven Byrnes · 26 Jan 2022 15:23 UTC
155 points · 19 comments · 26 min read · LW link

[Intro to brain-like-AGI safety] 2. “Learning from scratch” in the brain

Steven Byrnes · 2 Feb 2022 13:22 UTC
59 points · 12 comments · 25 min read · LW link

[Intro to brain-like-AGI safety] 3. Two subsystems: Learning & Steering

Steven Byrnes · 9 Feb 2022 13:09 UTC
95 points · 3 comments · 25 min read · LW link

[Valence series] 4. Valence & Social Status (deprecated)

Steven Byrnes · 15 Dec 2023 14:24 UTC
35 points · 19 comments · 11 min read · LW link

Bayes’ Theorem Illustrated (My Way)

komponisto · 3 Jun 2010 4:40 UTC
171 points · 195 comments · 9 min read · LW link

Induction heads—illustrated

CallumMcDougall · 2 Jan 2023 15:35 UTC
111 points · 9 comments · 3 min read · LW link

Levels of goals and alignment

zeshen · 16 Sep 2022 16:44 UTC
27 points · 4 comments · 6 min read · LW link

A newcomer’s guide to the technical AI safety field

zeshen · 4 Nov 2022 14:29 UTC
42 points · 3 comments · 10 min read · LW link

Embedding safety in ML development

zeshen · 31 Oct 2022 12:27 UTC
24 points · 1 comment · 18 min read · LW link