Each element of the GatePattern matrix, denoted $\text{GatePattern}_{ij}$, is constrained to the interval $[0, 1)$. That is, for all $i, j$, where $i$ indexes the query positions and $j$ indexes the key positions:
$$0 \le \text{GatePattern}_{ij} < 1$$
Why is this strictly less than 1? Surely if the dot product is 1.1 and you clamp it, the result is exactly 1.
Thanks for the catch, you're correct: it should be [0, 1]. This is a relic I missed from an older alternative in which we used a modified tanh function to bound the values to [0, 1). I'll update the text above accordingly!
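To make the difference concrete, here is a minimal Python sketch. It assumes each gate value comes from a raw dot-product score bounded elementwise. The thread never specifies the "modified tanh", so `tanh_gate` (computing `max(0, tanh(x))`) is a hypothetical stand-in. `clamp_gate` is likewise an illustrative name, not the authors' code.

```python
import math


def clamp_gate(score: float) -> float:
    """Clamp a raw score to the closed interval [0, 1].

    Any score >= 1 (e.g. 1.1) maps to exactly 1.0, so 1 is attainable.
    """
    return min(max(score, 0.0), 1.0)


def tanh_gate(score: float) -> float:
    """Hypothetical 'modified tanh' bound: max(0, tanh(score)).

    Mathematically the result lies in [0, 1), because tanh(x) < 1 for
    every finite x. In float64 arithmetic, however, tanh rounds to exactly
    1.0 for sufficiently large inputs, so the open upper bound holds only
    in exact arithmetic.
    """
    return max(0.0, math.tanh(score))


# Compare the two bounds on a few sample scores.
for s in (-0.3, 0.5, 1.1):
    print(f"score={s}: clamp={clamp_gate(s)}, tanh={tanh_gate(s):.4f}")
```

With clamping, a score of 1.1 yields exactly 1, which is why the interval is closed, $[0, 1]$. The tanh variant approaches 1 only asymptotically, which matches the older $[0, 1)$ statement.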