Nice! This is very readable, more like a collection of blog posts than a dense book. If you want more density, the references presumably help, though I haven't dug into them yet.
Skimming it sparked some thoughts about translating some of the described interpretability methods from “shallow” AI, where the features aren’t learned much, to “deep” AI, where complicated features do get learned, just by swapping “input features” for “features in a latent space.”
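
To make the idea concrete, here's a minimal sketch of that swap (my own illustration, not anything from the book): run an off-the-shelf attribution method (permutation importance) over a model's learned hidden-layer activations instead of over the raw input features. The `latent` helper and the logistic-regression readout are assumptions I'm introducing for the example.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.inspection import permutation_importance

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Deep" model: its hidden layer is the learned feature space.
mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
mlp.fit(X_train, y_train)

def latent(X):
    """Hidden-layer activations, i.e. the learned features.

    MLPClassifier's default activation is ReLU, so this reproduces the
    model's own first-layer representation.
    """
    return np.maximum(0, X @ mlp.coefs_[0] + mlp.intercepts_[0])

# Treat latent dimensions exactly like input features: fit a simple
# readout on them and ask which dimensions the predictions depend on.
readout = LogisticRegression(max_iter=1000).fit(latent(X_train), y_train)
result = permutation_importance(
    readout, latent(X_test), y_test, n_repeats=10, random_state=0
)
for i in np.argsort(result.importances_mean)[::-1][:5]:
    print(f"latent dim {i}: importance {result.importances_mean[i]:.3f}")
```

The point is just that `permutation_importance` never knows it's looking at learned features rather than raw pixels; the harder part, which this sketch dodges, is figuring out what a highly-ranked latent dimension actually means.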