ChatGPT 4.5 is on preview at https://chat.lmsys.org/ under the name gpt-2.
It calls itself ChatGPT 2.0 in a text art drawing https://twitter.com/turchin/status/1785015421688799492
https://rentry.org/GPT2
I ran out of tokens quickly trying out poetry but I didn’t get the impression that this is a big leap over GPT-4 like GPT-5 presumably is designed to be. (It could, I suppose, be a half-baked GPT-5 similar to ‘Prometheus’ for GPT-4.) My overall impression from poetry was that it was a GPT-4 which isn’t as RLHF-damaged as usual, and more like Claude in having a RLAIF-y creative style. So I could believe it’s a better GPT-4 where they are experimenting with new tuning/personality to reduce the ChatGPT-bureaucratese.
HN: https://news.ycombinator.com/item?id=40199715
It failed my favorite test: draw a world map in text art.
Related market on Manifold:
We don’t actually know for sure that it’s GPT-4.5. It could be an alternative training run that preceded the current version of ChatGPT-4, or even a different model entirely.
It might be informative to try to figure out when its knowledge cutoff is (right now I can’t, as it’s at its rate limit).
https://rentry.org/gpt2
Rumored to be 11-2023
It claims a knowledge cutoff of Nov 2023, but when asked what happened on October 7 it failed and hallucinated.
Using @Sergii’s list-reversal benchmark, this model fails to reverse a list of 10 random numbers (1–10, from random.org) about half the time. That compares unfavorably with GPT-4′s reported ability to reverse lists of 20 numbers fairly reliably, and ChatGPT-3.5 seemed to have no trouble either, although since it isn’t a base model, the comparison could be invalid.
This significantly updates me towards believing that this is probably not better than GPT-4.
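For reproducibility, the list-reversal test can be sketched roughly as follows. This is a minimal reconstruction, not @Sergii’s actual harness: it generates a prompt to paste into the arena UI and a checker for the model’s reply (the prompt wording is my own assumption).

```python
import random

def make_reversal_prompt(n, lo=1, hi=10, seed=None):
    """Generate a list-reversal test: a prompt string plus the expected answer."""
    rng = random.Random(seed)
    nums = [rng.randint(lo, hi) for _ in range(n)]
    prompt = "Reverse this list of numbers: " + ", ".join(map(str, nums))
    expected = list(reversed(nums))
    return prompt, expected

def check_reversal(model_output, expected):
    """Parse a comma/space-separated reply and compare it to the expected reversal."""
    got = [int(tok) for tok in model_output.replace(",", " ").split()]
    return got == expected

prompt, expected = make_reversal_prompt(10, seed=0)
# Paste `prompt` into the chat UI, then pass the model's reply to check_reversal.
```

Running a batch of such prompts and counting failures gives the “about half the time” figure a concrete procedure.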
Seems correct to me (and it did work for a handful of 10 int lists I manually came up with). More impressively, it does this correctly as well:
OK, what I actually did was fail to realize that the link provided doesn’t go directly to gpt2-chatbot (the front page just compares two random chatbots from a list). After figuring that out, I reran my tests; it reversed 20, 40, and 100 numbers perfectly.
I’ve retracted my previous comments.
As one more test, it came rather close on reversing 400 numbers:
Given these results, this is clearly a fairly capable model (although Claude Opus reversed the 400 numbers perfectly, so it may not be SOTA).
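“Rather close” can be made precise with a positional-accuracy score. This is a hypothetical helper I’m adding for illustration, not part of the original benchmark:

```python
def reversal_accuracy(model_output, expected):
    """Fraction of positions where the model's reversed list matches the
    expected one; extra or missing items count against the score."""
    got = [int(tok) for tok in model_output.replace(",", " ").split()]
    matches = sum(g == e for g, e in zip(got, expected))
    return matches / max(len(got), len(expected))
```

A 400-item reversal with only a handful of misplaced or dropped elements would score just under 1.0, which matches the informal “rather close” judgment above.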
Going back to the original question of where this model came from, I have trouble putting the chance that it comes from OpenAI above 50%, mainly because of how it was publicized. It seems a strange choice to release an unannounced model on Chatbot Arena, especially without any associated update to the model registry on GitHub (which would be in https://github.com/lm-sys/FastChat/blob/851ef88a4c2a5dd5fa3bcadd9150f4a1f9e84af1/fastchat/model/model_registry.py#L228 ). However, I still have pretty large error margins, given how little information I can find.
Nah, it’s just a PR stunt. Remember when DeepMind released AlphaGo Master by simply running a ‘Magister’ Go player online which went undefeated?* Everyone knew it was DeepMind simply because who else could it be? And IIRC, didn’t OA also pilot OA5 ‘anonymously’ on DoTA2 ladders? Or how about when Mistral released torrents? (If they had really wanted a blind test, they wouldn’t’ve called it “gpt2”, or they could’ve just rolled it out to a subset of ChatGPT users, who would have no way of knowing the model underneath the interface had been swapped out.)
* One downside of that covert testing: DM AFAIK never released a paper on AG Master, or all the complicated & interesting things they were trying before they hit upon the AlphaZero approach.
Interesting; maybe it’s an artifact of how we formatted our questions? Or, potentially, the training samples with larger ranges of numbers were higher quality? You could try it the way I did in this failing example:
When I tried this same list with your prompt, both responses were incorrect:
I tried some chess, but it’s still pretty bad; not noticeably better than GPT-4.