Yeah, it’s not a watertight argument, and it’s somewhat based on my current interpretation of past progress and projects in the making.
1. Intuitively, I would say for the problems we’re facing in evals, a ton of progress is bottlenecked by running fairly simple experiments and iterating fast. A reasonable part of it feels very parallelizable and the skill required is quite reachable for many people.
2. For most evals questions, it feels like we have a decent number of “obvious things” to try, and since we have very tight feedback loops, making progress feels quite doable.
Intuitively, the “hardness level” to get to a robust science of evals and good coverage may be similar to going from the first transformer to GPT-3.5. You need to make a lot of design choices along the way, do lots of research, and spend some money, but ultimately it’s just “do much more of the process you’re currently doing” (though we should probably spend more resources and intensify our efforts, because I don’t feel like we’re on pace).
In contrast, there are other questions like “how do we fully map the human brain” that just seem like they come with a lot more fundamental questions along the way.