Remember that cool project where Redwood made a simple web app to allow humans to challenge themselves against language models in predicting next tokens on web data?
I’d love to see something similar done for the LLM arena, so we could compare the ELO scores of human users to the scores of LLMs.https://lmsys.org/blog/2023-05-03-arena/
Remember that cool project where Redwood made a simple web app to allow humans to challenge themselves against language models in predicting next tokens on web data? I’d love to see something similar done for the LLM arena, so we could compare the ELO scores of human users to the scores of LLMs.https://lmsys.org/blog/2023-05-03-arena/