Thanks for writing the post; I found it insightful.
“This model is largely used for alignment and other safety research, e.g. it would compress 100 years of human AI safety research into less than a year”
In your mind, what would be the best-case outcome of such “alignment and other safety research”? What would it achieve?
I’m expecting something like “solve the alignment problem”. I’m also expecting you to think this might mean that advanced AI would be intent-aligned, that is, it would try to do what a user wants it to do while not taking over the world. Is that broadly correct?
If so, the biggest missing piece for me is understanding how this would help prevent someone else from building an unaligned AI elsewhere with sufficient capabilities to take over. DeepSeek released a model with roughly comparable capabilities nine weeks after OpenAI’s o1, probably without stealing weights. It seems to me that you have about nine weeks to make sure others don’t build an unsafe AI. What’s your plan to achieve that, and how would the alignment and other safety research help?