It seems to be able to understand video rather than just images from the demos; I'd assume that will give it much better time understanding too. (Gemini also has video input.)
Are you saying this because temporal understanding is necessary for audio? Are there any tests that could be done with just the text interface to see whether it understands time better? I can't really think of any (besides just going off vibes after a bunch of interaction).
I imagine its music skills are a good bit stronger. It's more a statement of curiosity about longer-term time reasoning, on the scale of hours to days.
This means it knows about time in a much deeper sense than previous large public models. I wonder how far that goes.
Gemini also supported audio natively.
Oh, interesting, okay. I certainly didn't notice any strong effect like this when talking to Gemini previously.