Hello! This is a personal project I’ve been working on. I plan to refine it based on feedback. If you braved the length of this paper, please let me know what you think! I have tried to make it as easy and interesting a read as possible while still delving deep into my thoughts about interpretability and how we can solve it.
Also please share it with people who find this topic interesting, given my lone wolf researcher position and the length of the paper, it is hard to spread it around to get feedback.
Very happy to answer any questions, delve into counterarguments etc.
Hello! This is a personal project I’ve been working on. I plan to refine it based on feedback. If you braved the length of this paper, please let me know what you think! I have tried to make it as easy and interesting a read as possible while still delving deep into my thoughts about interpretability and how we can solve it.
Also please share it with people who find this topic interesting, given my lone wolf researcher position and the length of the paper, it is hard to spread it around to get feedback.
Very happy to answer any questions, delve into counterarguments etc.