the reason why etched was less bandwidth limited is they traded latency for throughput by batching prompts and completions together. Gpus could also do that but they don’t to improve latency
the reason why etched was less bandwidth limited is they traded latency for throughput by batching prompts and completions together. Gpus could also do that but they don’t to improve latency