RSS

Verification

TagLast edit: Jul 9, 2022, 2:37 PM by Tor Økland Barstad

For­mal ver­ifi­ca­tion, heuris­tic ex­pla­na­tions and sur­prise accounting

Jacob_HiltonJun 25, 2024, 3:40 PM
156 points
11 comments9 min readLW link
(www.alignment.org)

Com­pact Proofs of Model Perfor­mance via Mechanis­tic Interpretability

Jun 24, 2024, 7:27 PM
96 points
4 comments8 min readLW link
(arxiv.org)

Mak­ing it harder for an AGI to “trick” us, with STVs

Tor Økland BarstadJul 9, 2022, 2:42 PM
15 points
5 comments22 min readLW link

Align­ment with ar­gu­ment-net­works and as­sess­ment-predictions

Tor Økland BarstadDec 13, 2022, 2:17 AM
10 points
5 comments45 min readLW link
No comments.