My guess is that it will be a scaled-up Gato—https://www.lesswrong.com/posts/7kBah8YQXfx6yfpuT/what-will-the-scaled-up-gato-look-like-updated-with. I think there might be some interesting capabilities once the models are fully multimodal—e.g. being able to play games, perform simple actions on a computer, etc. Based on the announcement from Google, I would expect full multimodal training—image, audio, video, and text, both in and out. Based on DeepMind's hiring needs, I would expect they also want it to generate audio/video and to extend the model to robotics (the brain of something similar to a Tesla Bot) in the near future. Elon claims that training on video input/output alone can yield full self-driving, so I'm very curious what training on YouTube videos can achieve. If they've managed to make solid progress on long-term planning/reasoning and can deploy the model with sufficiently low latency, it could be quite a significant release—one that could simplify many office jobs.