The learned algorithms will not always be simple enough to be interpretable, but I agree we should try to interpret as much as we can. What we are ultimately trying to predict is the behavior of future, more powerful models. I think toy models can sometimes exhibit characteristics that are absent from current language models, yet those same characteristics may be integrated into, or emerge from, the more advanced systems we build.