eggsyntax comments on Language Models Model Us

eggsyntax 18 May 2024 1:54 UTC
4 points
1
That certainly seems plausible—it would be interesting to compare to a base model at some point, although with recent changes to the OpenAI API, I’m not sure if there would be a good way to pull the right token probabilities out.
@Jessica Rumbelow also suggested that this debiasing process could be a reason why there weren’t significant score differences between the main model tested, older GPT-3.5, and the newest GPT-4.
- Martin Vlach 18 May 2024 10:27 UTC
  4 points
  1
  Parent
  As the Llama 3 70B base model is said to be very clean (unlike base DeepSeek, for example, which is already instruction-spoiled) and similarly capable to GPT-3.5, you could use it to explore that hypothesis.
  Details: check Groq or TogetherAI for free inference. I'm not sure whether the test data would fit in the Llama 3 context window.
  - eggsyntax 18 May 2024 12:00 UTC
    1 point
    0
    Parent
    Thanks!