My name is pronounced “YOO-ar SKULL-se” (the “e” is not silent). I’m a PhD student at Oxford University, and I have worked in several different areas of AI safety research. For a few highlights, see:
Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
STARC: A General Framework For Quantifying Differences Between Reward Functions
Risks from Learned Optimization in Advanced Machine Learning Systems
Some of my recent research on the theoretical foundations of reward learning is also described in this sequence.
For a full list of all my research, see my Google Scholar.
No, that is not a misinterpretation: I do think this research agenda has the potential to get pretty close to solving outer alignment. More specifically, if it is (practically) possible to solve outer alignment through some form of reward learning, then I think this research agenda will establish how that can be done (and prove that the method works). If it isn't possible, then I think this research agenda will produce a precise understanding of why it isn't possible, which would in turn help to inform subsequent research. I don't think this research agenda is the only way to solve outer alignment, but I do think it is the most promising way to do it.