Your theory of change seems pretty indirect. Even if you do this project very successfully, to improve safety, you mostly need someone to read your writeup and do interventions accordingly. (Except insofar as your goal is just to inform AI safety people about various dynamics.)
There’s classic advice like find the target audience for your research and talk to them regularly so you know what’s helpful to them. For an exploratory project like this maybe you don’t really have a target audience. So… at least write down theories of change and keep them in mind and notice how various lower-level directions relate to them.
Your theory of change seems pretty indirect. Even if you do this project very successfully, to improve safety, you mostly need someone to read your writeup and do interventions accordingly. (Except insofar as your goal is just to inform AI safety people about various dynamics.)
There’s classic advice like find the target audience for your research and talk to them regularly so you know what’s helpful to them. For an exploratory project like this maybe you don’t really have a target audience. So… at least write down theories of change and keep them in mind and notice how various lower-level directions relate to them.