Also, from a quick look at how LLM token sampling works nowadays, you may also need to set top_p to 0 (or very close to it) and top_k to 1, in addition to the temperature, to make decoding actually behave like argmax. It looks like these parameters can only be set through the API if you’re using ChatGPT or a similar proprietary LLM, not through the chat interface. Maybe I’ll try experimenting with this when I find the time, if nothing else to rule out the possibility that such a seemingly obvious thing was missed.
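To see why those settings collapse sampling to argmax, here is a toy sampler sketching the usual semantics of temperature, top_k, and top_p (nucleus) filtering. This is an illustrative reimplementation, not any particular API's actual code; the function name and defaults are my own.

```python
import numpy as np

def sample(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Toy token sampler showing how top_k=1 (or a near-zero top_p)
    collapses sampling to argmax. Illustrative only."""
    rng = rng or np.random.default_rng(0)
    logits = np.asarray(logits, dtype=float)

    if top_k is not None:
        # Keep only the k highest-scoring tokens; mask the rest out.
        kth = np.sort(logits)[-top_k]
        logits = np.where(logits >= kth, logits, -np.inf)

    # Temperature scaling; temperature -> 0 also approaches argmax.
    probs = np.exp(logits / max(temperature, 1e-8))
    probs /= probs.sum()

    if top_p is not None:
        # Nucleus filtering: keep the smallest set of tokens whose
        # cumulative probability reaches top_p (always at least one).
        order = np.argsort(probs)[::-1]
        cum = np.cumsum(probs[order])
        keep = order[: max(1, np.searchsorted(cum, top_p) + 1)]
        mask = np.zeros_like(probs)
        mask[keep] = probs[keep]
        probs = mask / mask.sum()

    return int(rng.choice(len(probs), p=probs))

logits = [1.0, 3.5, 0.2, 3.4]
# With top_k=1, every draw is the argmax token (index 1), whatever the seed.
assert all(sample(logits, top_k=1, rng=np.random.default_rng(s)) == 1
           for s in range(20))
# A near-zero top_p keeps only the single most probable token -> same result.
assert sample(logits, top_p=1e-9) == 1
```

With both filters restricted to a single token, the "sampling" step has only one choice left, which is exactly argmax; temperature alone only approaches this in the limit.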