I’m puzzled by this as well. For a moment I thought maybe PaLM used an encoder-decoder architecture, but no, it uses next-word prediction just like GPT-3. I’m not sure what GPT-3 has that PaLM lacks. A model with the parameter count of PaLM and the training dataset size of Chinchilla would be a better hypothetical for “Great Palm”.