Ansh Radhakrishnan

Karma: 599

Ansh Radhakrishnan’s Shortform

Ansh Radhakrishnan · 10 Oct 2024 22:00 UTC
5 points
2 comments · 1 min read · LW link

Scalable Oversight and Weak-to-Strong Generalization: Compatible approaches to the same problem

16 Dec 2023 5:49 UTC
73 points
3 comments · 6 min read · LW link

Anthropic Fall 2023 Debate Progress Update

Ansh Radhakrishnan · 28 Nov 2023 5:37 UTC
74 points
9 comments · 12 min read · LW link

Measuring and Improving the Faithfulness of Model-Generated Reasoning

18 Jul 2023 16:36 UTC
111 points
14 comments · 6 min read · LW link

Causal scrubbing: results on induction heads

3 Dec 2022 0:59 UTC
34 points
1 comment · 17 min read · LW link

Causal scrubbing: results on a paren balance checker

3 Dec 2022 0:59 UTC
34 points
2 comments · 30 min read · LW link

Causal scrubbing: Appendix

3 Dec 2022 0:58 UTC
18 points
4 comments · 20 min read · LW link

Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

3 Dec 2022 0:58 UTC
205 points
35 comments · 20 min read · LW link · 1 review

The Bio Anchors Forecast

Ansh Radhakrishnan · 2 Jun 2022 1:32 UTC
13 points
0 comments · 3 min read · LW link

RLHF

Ansh Radhakrishnan · 12 May 2022 21:18 UTC
18 points
5 comments · 5 min read · LW link

An Inside View of AI Alignment

Ansh Radhakrishnan · 11 May 2022 2:16 UTC
32 points
2 comments · 2 min read · LW link