Alpaca: A Strong Open-Source Instruction-Following Model

sanxiyn, 14 Mar 2023 2:41 UTC

26 points

We performed a blind pairwise comparison between text-davinci-003 and Alpaca 7B, and we found that these two models have very similar performance: Alpaca wins 90 versus 89 comparisons against text-davinci-003.

Interestingly, Alpaca is trained using supervised finetuning, not RLHF. (text-davinci-003 is trained using RLHF.) This seems to confirm my suspicion that while RLHF improves performance, it is not essential.
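
As a rough sanity check on the "very similar performance" claim, here is a minimal sketch that turns the quoted 90 versus 89 result into a win rate and an exact two-sided sign test. It assumes the 179 comparisons are independent and that ties are not part of the count; neither assumption is stated in the quote, and the function name `two_sided_sign_test` is just illustrative.

```python
from math import comb


def two_sided_sign_test(wins_a: int, wins_b: int) -> float:
    """Exact two-sided binomial (sign) test p-value.

    Null hypothesis: each comparison is a fair coin flip between the two models.
    Assumes independent comparisons with no ties (an assumption, not from the post).
    """
    n = wins_a + wins_b
    k = max(wins_a, wins_b)
    # Upper tail P(X >= k) for X ~ Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double the tail for a two-sided test, capped at 1.
    return min(1.0, 2 * tail)


# Figures quoted from the Alpaca announcement.
alpaca_wins, davinci_wins = 90, 89

win_rate = alpaca_wins / (alpaca_wins + davinci_wins)
p_value = two_sided_sign_test(alpaca_wins, davinci_wins)

print(f"Alpaca win rate: {win_rate:.3f}")
print(f"Two-sided p-value: {p_value:.3f}")
```

Under these assumptions the win rate is about 50.3% and the test cannot distinguish the two models at all, which is consistent with calling the result a tie. It says nothing about how the models compare on other prompts or under other judges.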

