Underspecified Probabilities: A Thought Experiment · lunatic_at_large · Oct 4, 2023, 10:25 PM · 8 points · 4 comments · 2 min read · LW link
Fraternal Birth Order Effect and the Maternal Immune Hypothesis · Bucky · Oct 4, 2023, 9:18 PM · 20 points · 1 comment · 2 min read · LW link
How to solve deception and still fail. · Charlie Steiner · Oct 4, 2023, 7:56 PM · 40 points · 7 comments · 6 min read · LW link
PortAudio M1 Latency · jefftk · Oct 4, 2023, 7:10 PM · 8 points · 5 comments · 1 min read · LW link · (www.jefftk.com)
Open Philanthropy is hiring for multiple roles across our Global Catastrophic Risks teams · aarongertler · Oct 4, 2023, 6:04 PM · 6 points · 0 comments · 3 min read · LW link · (forum.effectivealtruism.org)
Safeguarding Humanity: Ensuring AI Remains a Servant, Not a Master · kgldeshapriya · Oct 4, 2023, 5:52 PM · −20 points · 2 comments · 2 min read · LW link
The 5 Pillars of Happiness · Gabi QUENE · Oct 4, 2023, 5:50 PM · −24 points · 5 comments · 5 min read · LW link
[Question] Using Reinforcement Learning to try to control the heating of a building (district heating) · Tony Karlsson · Oct 4, 2023, 5:47 PM · 3 points · 5 comments · 1 min read · LW link
rationalistic probability(litterally just throwing shit out there) · NotaSprayer ASprayer · Oct 4, 2023, 5:46 PM · −30 points · 8 comments · 2 min read · LW link
AISN #23: New OpenAI Models, News from Anthropic, and Representation Engineering · aogara and Dan H · Oct 4, 2023, 5:37 PM · 15 points · 2 comments · 5 min read · LW link · (newsletter.safe.ai)
I don’t find the lie detection results that surprising (by an author of the paper) · JanB · Oct 4, 2023, 5:10 PM · 97 points · 8 comments · 3 min read · LW link
[Question] What evidence is there of LLM’s containing world models? · Chris_Leong · Oct 4, 2023, 2:33 PM · 17 points · 17 comments · 1 min read · LW link
Entanglement and intuition about words and meaning · Bill Benzon · Oct 4, 2023, 2:16 PM · 4 points · 0 comments · 2 min read · LW link
Why a Mars colony would lead to a first strike situation · Remmelt · Oct 4, 2023, 11:29 AM · −59 points · 8 comments · 1 min read · LW link · (mflb.com)
[Question] What are some examples of AIs instantiating the ‘nearest unblocked strategy problem’? · EJT · Oct 4, 2023, 11:05 AM · 6 points · 4 comments · 1 min read · LW link
Graphical tensor notation for interpretability · Jordan Taylor · Oct 4, 2023, 8:04 AM · 140 points · 11 comments · 19 min read · LW link
[Link] Bay Area Winter Solstice 2023 · tcheasdfjkl and TheSkeward · Oct 4, 2023, 2:19 AM · 18 points · 3 comments · 1 min read · LW link · (fb.me)
[Question] Who determines whether an alignment proposal is the definitive alignment solution? · MiguelDev · Oct 3, 2023, 10:39 PM · −1 points · 6 comments · 1 min read · LW link
AXRP Episode 25 - Cooperative AI with Caspar Oesterheld · DanielFilan · Oct 3, 2023, 9:50 PM · 43 points · 0 comments · 92 min read · LW link
When to Get the Booster? · jefftk · Oct 3, 2023, 9:00 PM · 50 points · 15 comments · 2 min read · LW link · (www.jefftk.com)
OpenAI-Microsoft partnership · Zach Stein-Perlman · Oct 3, 2023, 8:01 PM · 51 points · 19 comments · 1 min read · LW link
[Question] Current AI safety techniques? · Zach Stein-Perlman · Oct 3, 2023, 7:30 PM · 30 points · 2 comments · 2 min read · LW link
Testing and Automation for Intelligent Systems. · Sai Kiran Kammari · Oct 3, 2023, 5:51 PM · −13 points · 0 comments · 1 min read · LW link · (resource-cms.springernature.com)
Metaculus Announces Forecasting Tournament to Evaluate Focused Research Organizations, in Partnership With the Federation of American Scientists · ChristianWilliams · Oct 3, 2023, 4:44 PM · 13 points · 0 comments · 1 min read · LW link · (www.metaculus.com)
What would it mean to understand how a large language model (LLM) works? Some quick notes. · Bill Benzon · Oct 3, 2023, 3:11 PM · 20 points · 4 comments · 8 min read · LW link
[Question] Potential alignment targets for a sovereign superintelligent AI · Paul Colognese · Oct 3, 2023, 3:09 PM · 29 points · 4 comments · 1 min read · LW link
Monthly Roundup #11: October 2023 · Zvi · Oct 3, 2023, 2:10 PM · 42 points · 12 comments · 35 min read · LW link · (thezvi.wordpress.com)
Why We Use Money? - A Walrasian View · Savio Coelho · Oct 3, 2023, 12:02 PM · 4 points · 3 comments · 8 min read · LW link
Mech Interp Challenge: October—Deciphering the Sorted List Model · CallumMcDougall · Oct 3, 2023, 10:57 AM · 23 points · 0 comments · 3 min read · LW link
Early Experiments in Reward Model Interpretation Using Sparse Autoencoders · lukemarks, Amirali Abdullah, Rauno Arike, Fazl and nothoughtsheadempty · Oct 3, 2023, 7:45 AM · 17 points · 0 comments · 5 min read · LW link
Some Quick Follow-Up Experiments to “Taken out of context: On measuring situational awareness in LLMs” · Miles Turpin · Oct 3, 2023, 2:22 AM · 31 points · 0 comments · 9 min read · LW link
My Mid-Career Transition into Biosecurity · jefftk · Oct 2, 2023, 9:20 PM · 26 points · 4 comments · 2 min read · LW link · (www.jefftk.com)
Dall-E 3 · p.b. · Oct 2, 2023, 8:33 PM · 37 points · 9 comments · 1 min read · LW link · (openai.com)
Thomas Kwa’s MIRI research experience · Thomas Kwa, peterbarnett, Vivek Hebbar, Jeremy Gillen, jacobjacob and Raemon · Oct 2, 2023, 4:42 PM · 172 points · 53 comments · 1 min read · LW link
Population After a Catastrophe · Stan Pinsent · Oct 2, 2023, 4:06 PM · 3 points · 5 comments · 14 min read · LW link
Expectations for Gemini: hopefully not a big deal · Maxime Riché · Oct 2, 2023, 3:38 PM · 15 points · 5 comments · 1 min read · LW link
A counterexample for measurable factor spaces · Matthias G. Mayer · Oct 2, 2023, 3:16 PM · 14 points · 0 comments · 3 min read · LW link
Will early transformative AIs primarily use text? [Manifold question] · Fabien Roger · Oct 2, 2023, 3:05 PM · 16 points · 0 comments · 3 min read · LW link
energy landscapes of experts · bhauth · Oct 2, 2023, 2:08 PM · 45 points · 2 comments · 3 min read · LW link · (www.bhauth.com)
Direction of Fit · NicholasKees · Oct 2, 2023, 12:34 PM · 34 points · 0 comments · 3 min read · LW link
The 99% principle for personal problems · Kaj_Sotala · Oct 2, 2023, 8:20 AM · 135 points · 20 comments · 2 min read · LW link · (kajsotala.fi)
Linkpost: They Studied Dishonesty. Was Their Work a Lie? · Linch · Oct 2, 2023, 8:10 AM · 91 points · 12 comments · 2 min read · LW link · (www.newyorker.com)
Why I got the smallpox vaccine in 2023 · joec · Oct 2, 2023, 5:11 AM · 25 points · 6 comments · 4 min read · LW link
Instrumental Convergence and human extinction. · Spiritus Dei · Oct 2, 2023, 12:41 AM · −10 points · 3 comments · 7 min read · LW link
Revisiting the Manifold Hypothesis · Aidan Rocke · Oct 1, 2023, 11:55 PM · 13 points · 19 comments · 4 min read · LW link
AI Alignment Breakthroughs this Week [new substack] · Logan Zoellner · Oct 1, 2023, 10:13 PM · 0 points · 8 comments · 2 min read · LW link
[Question] Looking for study · Robert Feinstein · Oct 1, 2023, 7:52 PM · 4 points · 0 comments · 1 min read · LW link
Join AISafety.info’s Distillation Hackathon (Oct 6-9th) · smallsilo · Oct 1, 2023, 6:43 PM · 21 points · 0 comments · 2 min read · LW link · (forum.effectivealtruism.org)
Fifty Flips · abstractapplic · Oct 1, 2023, 3:30 PM · 32 points · 15 comments · 1 min read · LW link · 1 review · (h-b-p.github.io)
AI Safety Impact Markets: Your Charity Evaluator for AI Safety · Dawn Drescher · Oct 1, 2023, 10:47 AM · 16 points · 5 comments · 1 min read · LW link · (impactmarkets.substack.com)