Jan Wehner

Karma: 70

I’m a PhD student working on AI Safety at the CISPA Helmholtz Center for Information Security. Currently, I’m working on Activation Engineering.

Feel free to contact me: jan.wehner@cispa.de

Open Challenges in Representation Engineering

Apr 3, 2025, 7:21 PM
12 points
0 comments · 5 min read

Saarbrücken Germany - ACX Meetups Everywhere Fall 2024

Jan Wehner · Aug 29, 2024, 6:37 PM
2 points
0 comments · 1 min read

An Introduction to Representation Engineering - an activation-based paradigm for controlling LLMs

Jan Wehner · Jul 14, 2024, 10:37 AM
37 points
6 comments · 17 min read

Immunization against harmful fine-tuning attacks

Jun 6, 2024, 3:17 PM
4 points
0 comments · 12 min read

Training-time domain authorization could be helpful for safety

May 25, 2024, 3:10 PM
15 points
4 comments · 7 min read

Data for IRL: What is needed to learn human values?

Jan Wehner · Oct 3, 2022, 9:23 AM
18 points
6 comments · 12 min read

Introduction to Effective Altruism: How to do good with your career

Jan Wehner · Sep 7, 2022, 6:12 PM
1 point
0 comments · 1 min read