Thanks—any good examples spring to mind off the top of your head?
I’m not sure my desire to do interpretability comes from capabilities curiosity, but it certainly comes in part from interpretability curiosity; I’d really like to know what the hell is going on in there...