Jacob_Hilton

Karma: 1,487

Jacob_Hilton’s Shortform

Jacob_Hilton · May 1, 2025, 12:58 AM
6 points
1 comment · LW link

A bird’s eye view of ARC’s research

Jacob_Hilton · Oct 23, 2024, 3:50 PM
119 points
12 comments · 7 min read · LW link
(www.alignment.org)

Backdoors as an analogy for deceptive alignment

Sep 6, 2024, 3:30 PM
104 points
2 comments · 8 min read · LW link
(www.alignment.org)

Formal verification, heuristic explanations and surprise accounting

Jacob_Hilton · Jun 25, 2024, 3:40 PM
156 points
11 comments · 9 min read · LW link
(www.alignment.org)

ARC is hiring theoretical researchers

Jun 12, 2023, 6:50 PM
126 points
12 comments · 4 min read · LW link
(www.alignment.org)

The effect of horizon length on scaling laws

Jacob_Hilton · Feb 1, 2023, 3:59 AM
23 points
2 comments · 1 min read · LW link
(arxiv.org)

Scaling Laws for Reward Model Overoptimization

Oct 20, 2022, 12:20 AM
103 points
13 comments · 1 min read · LW link
(arxiv.org)

Common misconceptions about OpenAI

Jacob_Hilton · Aug 25, 2022, 2:02 PM
238 points
154 comments · 5 min read · LW link · 1 review

How much alignment data will we need in the long run?

Jacob_Hilton · Aug 10, 2022, 9:39 PM
37 points
15 comments · 4 min read · LW link

Deep learning curriculum for large language model alignment

Jacob_Hilton · Jul 13, 2022, 9:58 PM
57 points
3 comments · 1 min read · LW link
(github.com)

Procedurally evaluating factual accuracy: a request for research

Jacob_Hilton · Mar 30, 2022, 4:37 PM
25 points
2 comments · 6 min read · LW link

Truthful LMs as a warm-up for aligned AGI

Jacob_Hilton · Jan 17, 2022, 4:49 PM
65 points
14 comments · 13 min read · LW link

Stationary algorithmic probability

Jacob_Hilton · Apr 29, 2017, 5:23 PM
3 points
7 comments · 1 min read · LW link
(www.jacobh.co.uk)