There is an attitude I see in AI safety from time to time when people write papers or do projects:
They think more about doing a cool project than about having a clear theory of change.
They spend a lot of time optimizing for being “publishable.”
I think this attitude is bad if we want to solve AI safety. On the other hand, having a clear theory of change is hard. Sometimes it’s just so much easier to focus on an interesting problem instead of constantly asking yourself, “Is this really solving AI safety?”
How should I approach this whole thing? Idk about you guys, but this is draining for me.
Why would I publish papers in AI safety? Do people even read them? Am I doing it just to gain credibility? Aren’t there already too many papers?
The incentives for early-career researchers are to blame for this mindset, imo. Having legible output is a very good signal of competence for employers/grantors. I think it probably makes sense for a researcher’s first project or two to be more of a cool demo than clear steps towards a solution.
Unfortunately, some mid-career and sometimes even senior researchers keep this habit of forward-chaining from what looks cool instead of backward-chaining from good futures. Ok, the previous sentence was a bit too strong: no reasoning is pure backward-chaining or pure forward-chaining. But I think a common failure mode is not thinking enough about theories of change.
Okay, this makes sense but doesn’t answer my question. Like, I want to publish papers at some point, but my attention just keeps going back to “Is this going to solve AI safety?” I guess people in mechanistic interpretability don’t keep thinking about that; they are more like “Hm… I have this interesting problem at hand...” and they try to solve it. When do you judge that the problem at hand is good enough to shift your attention to it?