In my world model, it is more governable and reduces x-risk, for several reasons (assuming timelines are short):
1- Pretraining a big model and running millions of copies of it at the same time gives us a very fast takeoff and far less time for preparation and iteration, compared to 50,000 slow, compute-heavy models.
2- Deploying them (pre-AGI, O3-type systems) to the public will eat up a huge amount of compute, and may delay the pretraining of larger models if they are profitable enough.
3- It is much easier to check their capabilities and shortcomings this way.