Jan Wehner

Karma: 58

I’m a PhD student working on AI Safety at the CISPA Helmholtz Center for Information Security. Currently, I’m working on Activation Engineering.

Feel free to contact me: jan.wehner@cispa.de

Saarbrücken Germany - ACX Meetups Everywhere Fall 2024

Jan Wehner
29 Aug 2024 18:37 UTC
2 points
0 comments
1 min read

An Introduction to Representation Engineering - an activation-based paradigm for controlling LLMs

Jan Wehner
14 Jul 2024 10:37 UTC
35 points
5 comments
17 min read

Immunization against harmful fine-tuning attacks

6 Jun 2024 15:17 UTC
4 points
0 comments
12 min read

Training-time domain authorization could be helpful for safety

25 May 2024 15:10 UTC
15 points
4 comments
7 min read

Data for IRL: What is needed to learn human values?

Jan Wehner
3 Oct 2022 9:23 UTC
18 points
6 comments
12 min read

Introduction to Effective Altruism: How to do good with your career

Jan Wehner
7 Sep 2022 18:12 UTC
1 point
0 comments
1 min read