Jacob_Hilton (Karma: 1,440)
A bird’s eye view of ARC’s research · Jacob_Hilton · 23 Oct 2024 15:50 UTC · 116 points · 12 comments · 7 min read · LW link (www.alignment.org)
Backdoors as an analogy for deceptive alignment · Jacob_Hilton and Mark Xu · 6 Sep 2024 15:30 UTC · 104 points · 2 comments · 8 min read · LW link (www.alignment.org)
Formal verification, heuristic explanations and surprise accounting · Jacob_Hilton · 25 Jun 2024 15:40 UTC · 156 points · 11 comments · 9 min read · LW link (www.alignment.org)
ARC is hiring theoretical researchers · paulfchristiano, Jacob_Hilton and Mark Xu · 12 Jun 2023 18:50 UTC · 126 points · 12 comments · 4 min read · LW link (www.alignment.org)
The effect of horizon length on scaling laws · Jacob_Hilton · 1 Feb 2023 3:59 UTC · 23 points · 2 comments · 1 min read · LW link (arxiv.org)
Scaling Laws for Reward Model Overoptimization · leogao, John Schulman and Jacob_Hilton · 20 Oct 2022 0:20 UTC · 103 points · 13 comments · 1 min read · LW link (arxiv.org)
Common misconceptions about OpenAI · Jacob_Hilton · 25 Aug 2022 14:02 UTC · 251 points · 147 comments · 5 min read · LW link · 1 review
How much alignment data will we need in the long run? · Jacob_Hilton · 10 Aug 2022 21:39 UTC · 37 points · 15 comments · 4 min read · LW link
Deep learning curriculum for large language model alignment · Jacob_Hilton · 13 Jul 2022 21:58 UTC · 57 points · 3 comments · 1 min read · LW link (github.com)
Procedurally evaluating factual accuracy: a request for research · Jacob_Hilton · 30 Mar 2022 16:37 UTC · 25 points · 2 comments · 6 min read · LW link
Truthful LMs as a warm-up for aligned AGI · Jacob_Hilton · 17 Jan 2022 16:49 UTC · 65 points · 14 comments · 13 min read · LW link
Stationary algorithmic probability · Jacob_Hilton · 29 Apr 2017 17:23 UTC · 3 points · 7 comments · 1 min read · LW link (www.jacobh.co.uk)