Is it actually true that you only trained on 5% of the dataset for filtering (I’m assuming training for 20 epochs)?
For filtering it was 25% of best scores, so we effectively trained for 4 epochs.
(We had different threshold for filtering and conditional training, note that we filter at document level but condition at sentence level.)
Is it actually true that you only trained on 5% of the dataset for filtering (I’m assuming training for 20 epochs)?
For filtering it was 25% of best scores, so we effectively trained for 4 epochs.
(We had different threshold for filtering and conditional training, note that we filter at document level but condition at sentence level.)