“You’re the most beautiful girl in the world” and Wittgensteinian Language Games
Chris_Leong · 20 Apr 2024 14:54 UTC · 6 points · 17 comments · 1 min read

Searching for Search
NicholasKees and janus · 28 Nov 2022 15:31 UTC · 91 points · 8 comments · 14 min read · 1 review

So What’s Up With PUFAs Chemically?
J Bostock · 27 Apr 2024 13:32 UTC · 35 points · 16 comments · 6 min read

Refusal in LLMs is mediated by a single direction
Andy Arditi, Oscar Obeso, Aaquib111, wesg and Neel Nanda · 27 Apr 2024 11:13 UTC · 111 points · 24 comments · 9 min read

Mercy to the Machine: Thoughts & Rights
False Name · 27 Apr 2024 16:36 UTC · 8 points · 5 comments · 17 min read

D&D.Sci
abstractapplic · 5 Dec 2020 23:26 UTC · 54 points · 46 comments · 1 min read

ChatGPT understands language
philosophybear · 27 Jan 2023 7:14 UTC · 26 points · 4 comments · 6 min read · (philosophybear.substack.com)

Magic by forgetting
avturchin · 24 Apr 2024 14:32 UTC · 18 points · 19 comments · 4 min read

Transformers Represent Belief State Geometry in their Residual Stream
Adam Shai · 16 Apr 2024 21:16 UTC · 304 points · 64 comments · 12 min read

[Question] Examples of Highly Counterfactual Discoveries?
johnswentworth · 23 Apr 2024 22:19 UTC · 156 points · 83 comments · 1 min read

Losing Faith In Contrarianism
omnizoid · 25 Apr 2024 20:53 UTC · 30 points · 36 comments · 5 min read

D&D.Sci Long War: Defender of Data-mocracy
aphyer · 26 Apr 2024 22:30 UTC · 38 points · 8 comments · 3 min read

Sparsify: A mechanistic interpretability research agenda
Lee Sharkey · 3 Apr 2024 12:34 UTC · 85 points · 22 comments · 22 min read

Voting Theory Introduction
Scott Garrabrant · 17 Oct 2022 8:48 UTC · 81 points · 8 comments · 6 min read

A list of core AI safety problems and how I hope to solve them
davidad · 26 Aug 2023 15:12 UTC · 161 points · 24 comments · 5 min read

Propagating Facts into Aesthetics
Raemon · 19 Dec 2019 4:09 UTC · 115 points · 36 comments · 11 min read · 1 review

Changes in College Admissions
Zvi · 24 Apr 2024 13:50 UTC · 54 points · 10 comments · 39 min read · (thezvi.wordpress.com)

On Not Pulling The Ladder Up Behind You
Screwtape · 26 Apr 2024 21:58 UTC · 95 points · 5 comments · 9 min read

D&D.Sci Evaluation and Ruleset
abstractapplic · 12 Dec 2020 15:00 UTC · 46 points · 12 comments · 3 min read

This is Water by David Foster Wallace
Nathan Young · 24 Apr 2024 21:21 UTC · 53 points · 15 comments · 13 min read · (fs.blog)