Have you ever tried to label your own data? Have you ever tried to run an evaluation, yourself, on a system that you’ve built? It’s difficult. We, humans, are imperfect. We are terrible at labeling. We are also expensive, and difficult to scale.
Lets say you wanted to create a high quality instruct dataset, today. What would generate a better output? A high resource model? Or, a farm of hired humans?
How many human RLHF labels are incorrect?
How many benchmarks are our machines now superhuman at?
Would you be a better RLHF labeler than GPT-4?
Have you ever tried to label your own data? Have you ever tried to run an evaluation, yourself, on a system that you’ve built? It’s difficult. We, humans, are imperfect. We are terrible at labeling. We are also expensive, and difficult to scale.
Lets say you wanted to create a high quality instruct dataset, today. What would generate a better output? A high resource model? Or, a farm of hired humans?
How many human RLHF labels are incorrect?
How many benchmarks are our machines now superhuman at?
If we’re not bootstrapped, we will be soon.