Mark Xu
Karma: 4,327
I do alignment research at the Alignment Research Center. Learn more about me at markxu.com/about
Estimating Tail Risk in Neural Networks
Mark Xu
13 Sep 2024 20:00 UTC
68
points
9
comments
23
min read
LW
link
(www.alignment.org)
Backdoors as an analogy for deceptive alignment
Jacob_Hilton
and
Mark Xu
6 Sep 2024 15:30 UTC
104
points
2
comments
8
min read
LW
link
(www.alignment.org)
If you weren’t such an idiot...
kave
and
Mark Xu
2 Mar 2024 0:01 UTC
147
points
74
comments
2
min read
LW
link
(markxu.com)
ARC is hiring theoretical researchers
paulfchristiano
,
Jacob_Hilton
and
Mark Xu
12 Jun 2023 18:50 UTC
126
points
12
comments
4
min read
LW
link
(www.alignment.org)
How to do theoretical research, a personal perspective
Mark Xu
19 Aug 2022 19:41 UTC
91
points
6
comments
15
min read
LW
link
ELK prize results
paulfchristiano
and
Mark Xu
9 Mar 2022 0:01 UTC
138
points
50
comments
21
min read
LW
link
ELK First Round Contest Winners
Mark Xu
and
paulfchristiano
26 Jan 2022 2:56 UTC
65
points
6
comments
1
min read
LW
link
ARC’s first technical report: Eliciting Latent Knowledge
paulfchristiano
,
Mark Xu
and
Ajeya Cotra
14 Dec 2021 20:09 UTC
225
points
90
comments
1
min read
LW
link
3
reviews
(docs.google.com)
ARC is hiring!
paulfchristiano
and
Mark Xu
14 Dec 2021 20:09 UTC
64
points
2
comments
1
min read
LW
link
Your Time Might Be More Valuable Than You Think
Mark Xu
18 Oct 2021 0:55 UTC
56
points
10
comments
6
min read
LW
link
(markxu.com)
The Simulation Hypothesis Undercuts the SIA/Great Filter Doomsday Argument
Mark Xu
and
CarlShulman
1 Oct 2021 22:23 UTC
43
points
11
comments
7
min read
LW
link
Fractional progress estimates for AI timelines and implied resource requirements
Mark Xu
and
CarlShulman
15 Jul 2021 18:43 UTC
55
points
6
comments
7
min read
LW
link
Intermittent Distillations #4: Semiconductors, Economics, Intelligence, and Technological Progress.
Mark Xu
8 Jul 2021 22:14 UTC
81
points
9
comments
10
min read
LW
link
Anthropic Effects in Estimating Evolution Difficulty
Mark Xu
5 Jul 2021 4:02 UTC
12
points
2
comments
3
min read
LW
link
An Intuitive Guide to Garrabrant Induction
Mark Xu
3 Jun 2021 22:21 UTC
145
points
20
comments
24
min read
LW
link
Rogue AGI Embodies Valuable Intellectual Property
Mark Xu
and
CarlShulman
3 Jun 2021 20:37 UTC
71
points
9
comments
3
min read
LW
link
Intermittent Distillations #3
Mark Xu
15 May 2021 7:13 UTC
21
points
1
comment
11
min read
LW
link
Pre-Training + Fine-Tuning Favors Deception
Mark Xu
8 May 2021 18:36 UTC
27
points
3
comments
3
min read
LW
link
Less Realistic Tales of Doom
Mark Xu
6 May 2021 23:01 UTC
113
points
13
comments
4
min read
LW
link
Agents Over Cartesian World Models
Mark Xu
and
evhub
27 Apr 2021 2:06 UTC
66
points
4
comments
27
min read
LW
link