Rauno Arike
Karma: 189
Rauno’s Shortform
Rauno Arike · 15 Nov 2024 12:08 UTC · 3 points · 6 comments · 1 min read · LW link
A Dialogue on Deceptive Alignment Risks
Rauno Arike · 25 Sep 2024 16:10 UTC · 11 points · 0 comments · 18 min read · LW link
[Interim research report] Evaluating the Goal-Directedness of Language Models
Rauno Arike, Elizabeth Donoway and Marius Hobbhahn · 18 Jul 2024 18:19 UTC · 39 points · 4 comments · 11 min read · LW link
Early Experiments in Reward Model Interpretation Using Sparse Autoencoders
lukemarks, Amirali Abdullah, Rauno Arike, Fazl and nothoughtsheadempty · 3 Oct 2023 7:45 UTC · 17 points · 0 comments · 5 min read · LW link
Exploring the Lottery Ticket Hypothesis
Rauno Arike · 25 Apr 2023 20:06 UTC · 54 points · 3 comments · 11 min read · LW link
[Question] Request for Alignment Research Project Recommendations
Rauno Arike · 3 Sep 2022 15:29 UTC · 10 points · 2 comments · 1 min read · LW link
Countering arguments against working on AI safety
Rauno Arike · 20 Jul 2022 18:23 UTC · 7 points · 2 comments · 7 min read · LW link
Clarifying the confusion around inner alignment
Rauno Arike · 13 May 2022 23:05 UTC · 31 points · 0 comments · 11 min read · LW link