Mark Xu

Karma: 4,449

I do alignment research at the Alignment Research Center. Learn more about me at markxu.com/about

Estimating Tail Risk in Neural Networks

Mark Xu · Sep 13, 2024, 8:00 PM
68 points
9 comments · 23 min read · LW link
(www.alignment.org)

Backdoors as an analogy for deceptive alignment

Sep 6, 2024, 3:30 PM
104 points
2 comments · 8 min read · LW link
(www.alignment.org)

If you weren’t such an idiot...

Mar 2, 2024, 12:01 AM
156 points
74 comments · 2 min read · LW link
(markxu.com)

ARC is hiring theoretical researchers

Jun 12, 2023, 6:50 PM
126 points
12 comments · 4 min read · LW link
(www.alignment.org)

How to do theoretical research, a personal perspective

Mark Xu · Aug 19, 2022, 7:41 PM
91 points
6 comments · 15 min read · LW link

ELK prize results

Mar 9, 2022, 12:01 AM
138 points
50 comments · 21 min read · LW link

ELK First Round Contest Winners

Jan 26, 2022, 2:56 AM
65 points
6 comments · 1 min read · LW link

ARC’s first technical report: Eliciting Latent Knowledge

Dec 14, 2021, 8:09 PM
228 points
90 comments · 1 min read · LW link · 3 reviews
(docs.google.com)

ARC is hiring!

Dec 14, 2021, 8:09 PM
64 points
2 comments · 1 min read · LW link

Your Time Might Be More Valuable Than You Think

Mark Xu · Oct 18, 2021, 12:55 AM
57 points
10 comments · 6 min read · LW link
(markxu.com)

The Simulation Hypothesis Undercuts the SIA/Great Filter Doomsday Argument

Oct 1, 2021, 10:23 PM
43 points
11 comments · 7 min read · LW link

Fractional progress estimates for AI timelines and implied resource requirements

Jul 15, 2021, 6:43 PM
55 points
6 comments · 7 min read · LW link

Intermittent Distillations #4: Semiconductors, Economics, Intelligence, and Technological Progress.

Mark Xu · Jul 8, 2021, 10:14 PM
81 points
9 comments · 10 min read · LW link

Anthropic Effects in Estimating Evolution Difficulty

Mark Xu · Jul 5, 2021, 4:02 AM
13 points
2 comments · 3 min read · LW link

An Intuitive Guide to Garrabrant Induction

Mark Xu · Jun 3, 2021, 10:21 PM
149 points
20 comments · 24 min read · LW link

Rogue AGI Embodies Valuable Intellectual Property

Jun 3, 2021, 8:37 PM
71 points
9 comments · 3 min read · LW link

Intermittent Distillations #3

Mark Xu · May 15, 2021, 7:13 AM
21 points
1 comment · 11 min read · LW link

Pre-Training + Fine-Tuning Favors Deception

Mark Xu · May 8, 2021, 6:36 PM
27 points
3 comments · 3 min read · LW link

Less Realistic Tales of Doom

Mark Xu · May 6, 2021, 11:01 PM
113 points
13 comments · 4 min read · LW link

Agents Over Cartesian World Models

Apr 27, 2021, 2:06 AM
67 points
4 comments · 27 min read · LW link