My guess is they will make tweaks on the tokenization part. If I were given the task of adding more modalities to a Transformer, I would probably be scratching my head wondering if there was a way to universally tokenize any type of data. But that's an optimistic scenario; I don't expect them to come up with such a solution right now. So I think they will just add new state-of-the-art tokenizers, like ViT-VQGAN for images. Other than that, I am mostly curious whether we will be able to more clearly observe transfer learning when increasing the model size. That, I think, is the most important information that can come from GATO 2, because then we will be better equipped to follow Chinchilla's scaling laws without some potential slowdown. I am betting that we will, as long as the tasks are not too far from each other and we take the time to build good tokenizers.
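To make the "per-modality tokenizers feeding one flat token space" idea concrete, here is a minimal sketch in the spirit of Gato's setup. Everything in it is a placeholder assumption: the vocabulary sizes, the byte-level text tokenizer, the fake patch-to-codebook mapping (a real system would use a trained ViT-VQGAN-style encoder), and the mu-law binning for continuous values. It only illustrates how separate tokenizers can be offset into one shared vocabulary so a single Transformer sees one sequence.

```python
import numpy as np

# Hypothetical vocabulary layout: text tokens, image codebook tokens, and
# discretized continuous-value tokens all live in one flat token space.
TEXT_VOCAB = 32_000       # e.g. a SentencePiece-style subword vocab
IMAGE_CODEBOOK = 8_192    # e.g. a ViT-VQGAN-style codebook
VALUE_BINS = 1_024        # bins for mu-law-companded continuous values

IMAGE_OFFSET = TEXT_VOCAB
VALUE_OFFSET = TEXT_VOCAB + IMAGE_CODEBOOK

def tokenize_text(text: str) -> list[int]:
    # Placeholder: map raw bytes to ids; a real model would use a trained subword tokenizer.
    return [b % TEXT_VOCAB for b in text.encode("utf-8")]

def tokenize_image(image: np.ndarray) -> list[int]:
    # Placeholder: pretend each 16x16 patch maps to a codebook id; a real
    # ViT-VQGAN encoder would produce these ids from learned embeddings.
    patches = image.reshape(-1, 16, 16, image.shape[-1])
    return [IMAGE_OFFSET + (int(p.sum()) % IMAGE_CODEBOOK) for p in patches]

def tokenize_values(values: np.ndarray, mu: float = 255.0) -> list[int]:
    # Mu-law companding then uniform binning for continuous observations/actions.
    squashed = np.sign(values) * np.log1p(mu * np.abs(values)) / np.log1p(mu)
    bins = ((squashed + 1.0) / 2.0 * (VALUE_BINS - 1)).astype(int)
    return [VALUE_OFFSET + int(b) for b in bins]

# All modalities end up in one flat sequence the Transformer can consume.
sequence = (
    tokenize_text("pick up the red block")
    + tokenize_image(np.zeros((64, 64, 3), dtype=np.uint8))
    + tokenize_values(np.array([0.3, -0.7, 0.1]))
)
print(len(sequence), max(sequence) < VALUE_OFFSET + VALUE_BINS)
```

The point of the offsets is simply that each modality keeps its own tokenizer, and swapping in a better one (say, a stronger image tokenizer for GATO 2) only changes one function while the downstream Transformer interface stays the same.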