Table 2 seems to provide a more direct comparison.
In particular, on the five tasks (MMLU, MATH, BIG-Bench, Natural2Code, WMT23) where they report results from calling the GPT-4 API, they show an average improvement of roughly 1 point. That experimental setting seems comparable, and it is not evidence that they are underperforming GPT-4.
However, all of these settings differ from how ChatGPT-like systems are mostly used (i.e., largely zero-shot), so it is difficult to judge how well their instruction-tuning holds up in that setting.
(Apologies if this point was posted twice; LessWrong was showing errors when I tried to post.)