AI Safety Researcher @AE Studio
Currently researching self-other overlap, a neglected prior for cooperation and honesty inspired by the cognitive neuroscience of altruism, in state-of-the-art ML models.
Previously SRF @ SERI '21, MLSS & Student Researcher @ CAIS '22, and LTFF grantee.
I do not think we should only work on approaches that work on any AI; I agree that would constitute a methodological mistake. I have found a framing that general not to be very conducive to progress.
You are right that we still have the chance to shape the internals of TAI, even though there are a lot of hoops to jump through to make that happen. We think that this is still worthwhile, which is why we stated our interest in potentially helping with the development and deployment of provably safe architectures, even though they currently seem less competitive.
In my response, I was trying to highlight that, whenever we can, we should keep our assumptions to a minimum given the uncertainty we are under. That said, it is reasonable to have some working assumptions that allow progress to be made in the first place, as long as they are clearly stated.
I also agree with Davidad about the importance of governance for the successful implementation of a technical AI Safety plan, and with your claim that proliferation risks are important, with the caveat that I am less worried about proliferation risks in a world with very short timelines.