
Mark Xu

Karma: 4,327

I do alignment research at the Alignment Research Center. Learn more about me at markxu.com/about

Estimating Tail Risk in Neural Networks

Mark Xu, 13 Sep 2024 20:00 UTC
68 points
9 comments, 23 min read
(www.alignment.org)

Backdoors as an analogy for deceptive alignment

6 Sep 2024 15:30 UTC
104 points
2 comments, 8 min read
(www.alignment.org)

If you weren’t such an idiot...

2 Mar 2024 0:01 UTC
147 points
74 comments, 2 min read
(markxu.com)

ARC is hiring theoretical researchers

12 Jun 2023 18:50 UTC
126 points
12 comments, 4 min read
(www.alignment.org)

How to do theoretical research, a personal perspective

Mark Xu, 19 Aug 2022 19:41 UTC
91 points
6 comments, 15 min read

ELK prize results

9 Mar 2022 0:01 UTC
138 points
50 comments, 21 min read

ELK First Round Contest Winners

26 Jan 2022 2:56 UTC
65 points
6 comments, 1 min read

ARC’s first technical report: Eliciting Latent Knowledge

14 Dec 2021 20:09 UTC
225 points
90 comments, 1 min read, 3 reviews
(docs.google.com)

ARC is hiring!

14 Dec 2021 20:09 UTC
64 points
2 comments, 1 min read

Your Time Might Be More Valuable Than You Think

Mark Xu, 18 Oct 2021 0:55 UTC
56 points
10 comments, 6 min read
(markxu.com)

The Simulation Hypothesis Undercuts the SIA/Great Filter Doomsday Argument

1 Oct 2021 22:23 UTC
43 points
11 comments, 7 min read

Fractional progress estimates for AI timelines and implied resource requirements

15 Jul 2021 18:43 UTC
55 points
6 comments, 7 min read

Intermittent Distillations #4: Semiconductors, Economics, Intelligence, and Technological Progress.

Mark Xu, 8 Jul 2021 22:14 UTC
81 points
9 comments, 10 min read

Anthropic Effects in Estimating Evolution Difficulty

Mark Xu, 5 Jul 2021 4:02 UTC
12 points
2 comments, 3 min read

An Intuitive Guide to Garrabrant Induction

Mark Xu, 3 Jun 2021 22:21 UTC
145 points
20 comments, 24 min read

Rogue AGI Embodies Valuable Intellectual Property

3 Jun 2021 20:37 UTC
71 points
9 comments, 3 min read

Intermittent Distillations #3

Mark Xu, 15 May 2021 7:13 UTC
21 points
1 comment, 11 min read

Pre-Training + Fine-Tuning Favors Deception

Mark Xu, 8 May 2021 18:36 UTC
27 points
3 comments, 3 min read

Less Realistic Tales of Doom

Mark Xu, 6 May 2021 23:01 UTC
113 points
13 comments, 4 min read

Agents Over Cartesian World Models

27 Apr 2021 2:06 UTC
66 points
4 comments, 27 min read