I don’t find myself excited about the work. I’ve never been properly nerd-sniped by a mechanistic interpretability problem, and I find the day-to-day work to be more drudgery than excitement, even though the overall goal of the field seems like a good one.
When left to do largely independent work, after doing the obvious first thing or two (“obvious” at the level of “these techniques are in Neel’s demos”), I find it hard to figure out what to do next, and, because of the aforementioned drudgery, hard to motivate myself to pursue the ideas I do come up with.
I find it difficult to backchain from the field’s larger goal to the smaller tasks in front of me. I think this is a combination of a motivational issue and a weaker grasp of the underlying concepts.
By contrast, in evaluations, none of this is true. I solve problems more effectively, I find myself actively interested in problems (both the ones I’m working on and the ones I’m not), and I can reason about how they matter for the bigger picture.
I’m not sure how much each of these factors contributes, but I suspect that if I were sufficiently excited about the day-to-day work, all the other problems would be much more fixable. There’s a sense of reluctance, a sense of burden, that saps a lot of energy when it comes to doing this kind of work.
As for #2, I should clarify what I mean, since there are two ways you could read “not suited”:
1. I will never be able to become good enough at this for my funding to be net-positive. There are fundamental limitations to my ability to succeed in this field.
2. I should not be in this field. The resources required to make me competitive here are significantly greater than they would be for other people who could do equally good work, and this is not true of other subfields of alignment.
I mean “I’m not suited” more in sense 2 than sense 1. I think there’s a reasonable chance that, given enough time, sustained effort, and mentorship in a proper organisational setting (being in such a setting is important for me to reliably complete work that doesn’t excite me), I could eventually do okay in this field. But I also think there are other people who would do better, faster, and be a better use of an organisation’s money than me.
This doesn’t feel like the case in evals. I feel like I can meaningfully contribute immediately, and I’m sufficiently motivated and knowledgeable that I can distinguish between my job and my mission (making AI go well) and feel confident that I can take actions to succeed at both.
If Omega came down from the sky and said, “Mechanistic interpretability is the only way you will have any impact on AI alignment; it’s this or nothing,” I might try anyway. But I’m not in that position, and I’m actually very glad I’m not.
Concrete feedback signals I’ve received: