Above 99% certainty:
Run inference in reasonable latency (e.g. < 1 second for text completion) on a typical home gaming computer (i.e. one with a single high-powered GPU).
Sigh. Even this one may already have fallen, depending on how you evaluate llama 7b performance. Under 1 second for how many tokens?
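The "how many tokens?" quibble is the whole ballgame: whether the prediction holds depends on the assumed completion length and the decode speed you grant the GPU. A minimal sketch of the arithmetic (every number here is hypothetical for illustration, not from any benchmark):

```python
# All figures below are made-up illustrations, not measurements:
# a "text completion" of 50 tokens, and a few plausible-sounding
# single-GPU decode rates in tokens per second.

def completion_latency(num_tokens: int, tokens_per_sec: float) -> float:
    """Seconds to generate num_tokens at a steady decode rate."""
    return num_tokens / tokens_per_sec

budget_sec = 1.0   # the "< 1 second" bar from the prediction
num_tokens = 50    # assumed length of one text completion

for tps in (30.0, 60.0, 120.0):
    latency = completion_latency(num_tokens, tps)
    verdict = "within" if latency < budget_sec else "over"
    print(f"{tps:>5.0f} tok/s -> {latency:.2f}s ({verdict} the 1s budget)")
```

At 30 tok/s the same 50-token completion blows the budget; at 60 tok/s it just clears it, which is why the claim's truth value swings on the evaluation setup.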