Jacob_Hilton

Karma: 1,440

A bird’s eye view of ARC’s research

Jacob_Hilton · 23 Oct 2024 15:50 UTC
116 points
12 comments · 7 min read · LW link
(www.alignment.org)

Backdoors as an analogy for deceptive alignment

6 Sep 2024 15:30 UTC
104 points
2 comments · 8 min read · LW link
(www.alignment.org)

Formal verification, heuristic explanations and surprise accounting

Jacob_Hilton · 25 Jun 2024 15:40 UTC
156 points
11 comments · 9 min read · LW link
(www.alignment.org)

ARC is hiring theoretical researchers

12 Jun 2023 18:50 UTC
126 points
12 comments · 4 min read · LW link
(www.alignment.org)

The effect of horizon length on scaling laws

Jacob_Hilton · 1 Feb 2023 3:59 UTC
23 points
2 comments · 1 min read · LW link
(arxiv.org)

Scaling Laws for Reward Model Overoptimization

20 Oct 2022 0:20 UTC
103 points
13 comments · 1 min read · LW link
(arxiv.org)

Common misconceptions about OpenAI

Jacob_Hilton · 25 Aug 2022 14:02 UTC
251 points
147 comments · 5 min read · LW link · 1 review

How much alignment data will we need in the long run?

Jacob_Hilton · 10 Aug 2022 21:39 UTC
37 points
15 comments · 4 min read · LW link

Deep learning curriculum for large language model alignment

Jacob_Hilton · 13 Jul 2022 21:58 UTC
57 points
3 comments · 1 min read · LW link
(github.com)

Procedurally evaluating factual accuracy: a request for research

Jacob_Hilton · 30 Mar 2022 16:37 UTC
25 points
2 comments · 6 min read · LW link

Truthful LMs as a warm-up for aligned AGI

Jacob_Hilton · 17 Jan 2022 16:49 UTC
65 points
14 comments · 13 min read · LW link

Stationary algorithmic probability

Jacob_Hilton · 29 Apr 2017 17:23 UTC
3 points
7 comments · 1 min read · LW link
(www.jacobh.co.uk)