The main thing here is that as models become more capable and general in the near-term future, I expect there will be intense demand for models that can solve ever larger and more complex problems. For these models, people will be willing to pay the costs of high latency, given the benefit of increased quality. We’ve already seen this in the way people prefer GPT-4 to GPT-3.5 in a large fraction of cases (for me, a majority of cases).
I expect this trend will continue into the foreseeable future until at least the period slightly after we’ve automated most human labor, and potentially into the very long-run too depending on physical constraints. I am not sufficiently educated about physical constraints here to predict what will happen “deep into the singularity”, but it’s important to note that physical constraints can cut both ways here.
To the extent that physics permits extremely useful models by virtue of them being very large and capable, you should expect people to optimize heavily for that despite the cost in terms of latency. By contrast, to the extent physics permits extremely useful models by virtue of them being very fast, then you should expect people to optimize heavily for that despite the cost in terms of quality. The balance that we strike here is not a simple function of how far we are from some abstract physical limit, but instead a function of how these physical constraints trade off against each other.
There is definitely a conceivable world in which the correct balance still favors much-faster-than-human-level latency, but it’s not clear to me that this is the world we actually live in. My intuitive, random speculative guess is that we live in the world where, for the most complex tasks that bottleneck important economic decision-making, people will optimize heavily for model quality at the cost of latency until settling on something within 1-2 OOMs of human-level latency.
The main thing here is that as models become more capable and general in the near-term future, I expect there will be intense demand for models that can solve ever larger and more complex problems. For these models, people will be willing to pay the costs of high latency, given the benefit of increased quality. We’ve already seen this in the way people prefer GPT-4 to GPT-3.5 in a large fraction of cases (for me, a majority of cases).
I expect this trend will continue into the foreseeable future until at least the period slightly after we’ve automated most human labor, and potentially into the very long-run too depending on physical constraints. I am not sufficiently educated about physical constraints here to predict what will happen “deep into the singularity”, but it’s important to note that physical constraints can cut both ways here.
To the extent that physics permits extremely useful models by virtue of them being very large and capable, you should expect people to optimize heavily for that despite the cost in terms of latency. By contrast, to the extent physics permits extremely useful models by virtue of them being very fast, then you should expect people to optimize heavily for that despite the cost in terms of quality. The balance that we strike here is not a simple function of how far we are from some abstract physical limit, but instead a function of how these physical constraints trade off against each other.
There is definitely a conceivable world in which the correct balance still favors much-faster-than-human-level latency, but it’s not clear to me that this is the world we actually live in. My intuitive, random speculative guess is that we live in the world where, for the most complex tasks that bottleneck important economic decision-making, people will optimize heavily for model quality at the cost of latency until settling on something within 1-2 OOMs of human-level latency.