My understanding is that GPT-style transformers already draw on seeded pseudorandom number generators at several points, most visibly when sampling each output token. If so, building this functionality into that existing randomness shouldn't impose any significant “cost” when it comes to competing with other implementations.
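To make concrete where a seed enters, here is a minimal sketch of seeded token sampling. It is not taken from any real GPT implementation: `sample_token` and `generate` are hypothetical names, and the logits are made-up numbers standing in for a model's output. The point is only that the random draw is already a distinct step, so anything that changes how that draw is made would slot in there rather than touching the model itself.

```python
import math
import random


def sample_token(logits, temperature=1.0, rng=None):
    """Draw one token index from softmax(logits / temperature).

    Hypothetical helper for illustration; `rng` is the only source of randomness.
    """
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(logits_per_step, seed):
    """Sample one token per step, all driven by a single seeded generator."""
    rng = random.Random(seed)
    return [sample_token(step, rng=rng) for step in logits_per_step]


# Made-up logits for three generation steps over a four-token vocabulary.
steps = [
    [2.0, 1.0, 0.5, 0.1],
    [0.3, 0.3, 2.5, 0.3],
    [1.0, 1.0, 1.0, 1.0],
]

first = generate(steps, seed=42)
second = generate(steps, seed=42)
print(first == second)  # same seed, same tokens
```

With the seed fixed, the output is reproducible, and the model's probabilities are untouched either way. That is the sense in which changing how the random draw is produced looks cheap: it replaces one call, not the network.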