Verification (tag)
Last edit: 9 Jul 2022 14:37 UTC by Tor Økland Barstad
Formal verification, heuristic explanations and surprise accounting
by Jacob_Hilton, 25 Jun 2024 15:40 UTC, 156 points, 11 comments, 9 min read (www.alignment.org)
Compact Proofs of Model Performance via Mechanistic Interpretability
by LawrenceC, rajashree, Adrià Garriga-alonso and Jason Gross, 24 Jun 2024 19:27 UTC, 95 points, 3 comments, 8 min read (arxiv.org)
Making it harder for an AGI to “trick” us, with STVs
by Tor Økland Barstad, 9 Jul 2022 14:42 UTC, 15 points, 5 comments, 22 min read
Alignment with argument-networks and assessment-predictions
by Tor Økland Barstad, 13 Dec 2022 2:17 UTC, 10 points, 5 comments, 45 min read