RSS

Joseph Bloom

Karma: 1,193

I run the White Box Evaluations Team at the UK AI Security Institute. This is primarily a mechanistic interpretability team focussed on estimating and addressing risks associated with deceptive alignment. I’m a MATS 5.0 and ARENA 1.0 Alumni. Previously, I cofounded the AI Safety Research Infrastructure Org Decode Research and conducted independent research into mechanistic interpretability of decision transformers. I studied computational biology and statistics at the University of Melbourne in Australia.