How ambitious would it be to primarily focus on interpretability as an independent researcher (or as an employee/research engineer)?
If I’ve inferred correctly, one of this article’s goals is to increase the number of contributors in the space. I generally agree with how impactful interpretability can be, but I am a little more risk-averse when it comes to making it my career path.
For context, I have just graduated and have a decent amount of experience with Python and other technologies. With those skills, I was hoping to tackle some of the low-hanging fruit in interpretability right away, yet I am not 100% convinced I should give my all to it, since I worry about my job security and chances of success.
1.) The space is both niche and research-oriented, which might not help me land future technical roles or get into higher education.
2.) I’ve anecdotally observed that most entry-level roles and fellowships in the space look for prior engineering work experience that fresh graduates rarely have. It might be hard to contribute and sustain myself from the get-go.
Does this mean I should be doing heavy engineering work before entering the interpretability space? Would it be possible for me to do both without sacrificing quality? If I do give 100% of my time to interpretability, would I still have other engineering job options if the space does not progress?
I would be very interested in your thoughts on how younger devs/researchers can contribute without having to worry about job security, getting paid, or missing out on future engineering skills (e.g. deployment, dev work outside notebook environments).
Did I understand your question correctly? Are you viewing interpretability work as a means to improve AI systems and their capabilities?
I primarily see mechanistic interpretability as a potential path towards understanding how models develop capabilities and internal processes, especially those that may represent misalignment. Hence, I view it as a means to monitor and align systems, not so much as a way to directly improve them (unless, of course, we are able to include interpretability in the training loop).