With #3, I think you fell into the trap of being overly specific and overly committed to one organizational strategy. It would be very reasonable to assume that OA would be working on multimodal, because you need that for efficiency & generalization & the ability to do things like use text instructions to control a robot arm, and indeed, I quote TR about how they are working hard on large multimodal self-supervised Transformers… but you assumed that would have to be the “GPT-3”, rather than a parallel project while GPT-3 winds up being a scaled-up GPT-2. It would have made more sense to split the predictions and stay agnostic about whether OA would do 2 big models or attempt 1 multimodal model, since the multimodal work might not mature in time (as seems to be the case), and instead predict end outcomes like “human-level text article generation” or “models with >100b parameters”, since there are many possible routes to relatively few outcomes of interest.