Rauno Arike

Karma: 319

[Question] What faithfulness metrics should general claims about CoT faithfulness be based upon?

Rauno Arike · Apr 8, 2025, 3:27 PM
24 points
0 comments · 4 min read · LW link

On the Implications of Recent Results on Latent Reasoning in LLMs

Rauno Arike · Mar 31, 2025, 11:06 AM
31 points
6 comments · 13 min read · LW link

The Best Lecture Series on Every Subject

Rauno Arike · Mar 24, 2025, 8:03 PM
13 points
1 comment · 2 min read · LW link

Rauno’s Shortform

Rauno Arike · Nov 15, 2024, 12:08 PM
3 points
6 comments · LW link

A Dialogue on Deceptive Alignment Risks

Rauno Arike · Sep 25, 2024, 4:10 PM
11 points
0 comments · 18 min read · LW link

[Interim research report] Evaluating the Goal-Directedness of Language Models

Jul 18, 2024, 6:19 PM
40 points
4 comments · 11 min read · LW link

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders

Oct 3, 2023, 7:45 AM
17 points
0 comments · 5 min read · LW link

Exploring the Lottery Ticket Hypothesis

Rauno Arike · Apr 25, 2023, 8:06 PM
58 points
3 comments · 11 min read · LW link

[Question] Request for Alignment Research Project Recommendations

Rauno Arike · Sep 3, 2022, 3:29 PM
10 points
2 comments · 1 min read · LW link

Countering arguments against working on AI safety

Rauno Arike · Jul 20, 2022, 6:23 PM
7 points
2 comments · 7 min read · LW link

Clarifying the confusion around inner alignment

Rauno Arike · May 13, 2022, 11:05 PM
31 points
0 comments · 11 min read · LW link