My reasoning is partly that we know large AGI outfits do not necessarily publish their insights into the capabilities of their systems and architectures. But it seems quite important to me to develop a strong understanding of these capabilities.
Given that I would use existing techniques in a toy scenario, I think it's very unlikely that I would create new capabilities. Maybe I would discover previously unknown capabilities, but these would have existed in similar systems anyway. And of course, which discoveries I would decide to publish is a separate question altogether.
I also wouldn't call this "safety research", though I think such a model might be useful downstream for prosaic alignment. My motivation is mostly to understand whether AGI is 5 years away or 30, and to know which breakthroughs fill the remaining gaps and which don't.