I will spend time reading posts and papers, improving coding skills as needed to run and interpret experiments, learning math as needed for writing up proofs, and talking with concept-based interpretability researchers as well as other conceptual alignment researchers.
I feel like this is missing the bit where you write proofs, run and interpret experiments, etc.
I had thought that would be implicit in why I’m picking up those skills and that knowledge? I agree it’s not great that some of my initial ideas have turned out to be infeasible or unhelpful, so I don’t currently have concrete theorems I want to try to prove or specific experiments I expect to want to run. I think a lot of next week will go to reading up on natural latents/abstractions more deeply than when I first learned about them, and trying to find somewhere a proof needs to go.