Joar Skalse

Karma: 733

My name is pronounced “YOO-ar SKULL-se” (the “e” is not silent). I’m a PhD student at Oxford University, and I have worked in several different areas of AI safety research. For a few highlights, see:

  1. Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

  2. Misspecification in Inverse Reinforcement Learning

  3. STARC: A General Framework For Quantifying Differences Between Reward Functions

  4. Risks from Learned Optimization in Advanced Machine Learning Systems

  5. Is SGD a Bayesian sampler? Well, almost

Some of my recent research on the theoretical foundations of reward learning is also described in this sequence.

For a full list of all my research, see my Google Scholar.

How to Contribute to Theoretical Reward Learning Research

Joar Skalse · Feb 28, 2025, 7:27 PM · 16 points · 0 comments · 21 min read

Other Papers About the Theory of Reward Learning

Joar Skalse · Feb 28, 2025, 7:26 PM · 16 points · 0 comments · 5 min read

Defining and Characterising Reward Hacking

Joar Skalse · Feb 28, 2025, 7:25 PM · 15 points · 0 comments · 4 min read

Misspecification in Inverse Reinforcement Learning — Part II

Joar Skalse · Feb 28, 2025, 7:24 PM · 9 points · 0 comments · 7 min read

STARC: A General Framework For Quantifying Differences Between Reward Functions

Joar Skalse · Feb 28, 2025, 7:24 PM · 11 points · 0 comments · 8 min read

Misspecification in Inverse Reinforcement Learning

Joar Skalse · Feb 28, 2025, 7:24 PM · 19 points · 0 comments · 11 min read

Partial Identifiability in Reward Learning

Joar Skalse · Feb 28, 2025, 7:23 PM · 15 points · 0 comments · 12 min read

The Theoretical Reward Learning Research Agenda: Introduction and Motivation

Joar Skalse · Feb 28, 2025, 7:20 PM · 25 points · 4 comments · 14 min read

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

Joar Skalse · May 17, 2024, 7:13 PM · 67 points · 10 comments · 2 min read