I’d say it’s pretty low bandwidth compared to the wealth of information that must exist in the intermediate layers. Even just the distribution of logits gets collapsed into a single returned value. You could definitely send back more than just that, but the question is whether it’s workable or if it just adds confusion.
I’d say it’s pretty low bandwidth compared to the wealth of information that must exist in the intermediate layers. Even just the distribution of logits gets collapsed into a single returned value. You could definitely send back more than just that, but the question is whether it’s workable or if it just adds confusion.