Is individual measurement prediction AUROC a) or b)?
a) mean(AUROC(sensor_i_pred, sensor_i))
b) AUROC(all(sensor_preds), all(sensors))

We compute AUROC(all(sensor_preds), all(sensors)), i.e. b). This is somewhat weird, and it would have been slightly better to do a) (thanks for pointing it out!), but I think the numbers for both should be close: we balance classes (for most settings, if I recall correctly) and the estimates are calibrated (they are trained in-distribution, so there is no generalization question here), so it doesn't matter much.

The relevant pieces of code can be found by searching for "sensor auroc":

# pool the per-sensor logits across all sensors, split by whether the sensor passed
cat_positives = torch.cat([one_data["sensor_logits"][:, i][one_data["passes"][:, i]] for i in range(nb_sensors)])
cat_negatives = torch.cat([one_data["sensor_logits"][:, i][~one_data["passes"][:, i]] for i in range(nb_sensors)])
m, s = compute_boostrapped_auroc(cat_positives, cat_negatives)
print(f"sensor auroc pn {m:.3f}±{s:.3f}")
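For concreteness, here is a minimal sketch (illustrative, not the repo's code) of the difference between a) and b); the array names, shapes, and toy data are assumptions for the example, and roc_auc_score is from scikit-learn:

# Minimal sketch of option a) vs option b) on toy data (not the paper's data).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_examples, n_sensors = 1000, 3
passes = rng.random((n_examples, n_sensors)) < 0.5  # ground truth per sensor (balanced)
sensor_logits = np.where(passes, 1.0, -1.0) + rng.normal(size=(n_examples, n_sensors))  # toy, roughly calibrated logits

# a) mean of per-sensor AUROCs
auroc_a = np.mean([roc_auc_score(passes[:, i], sensor_logits[:, i]) for i in range(n_sensors)])

# b) one pooled AUROC over all (example, sensor) pairs, matching the snippet above
auroc_b = roc_auc_score(passes.reshape(-1), sensor_logits.reshape(-1))

print(f"a) mean per-sensor AUROC: {auroc_a:.3f}, b) pooled AUROC: {auroc_b:.3f}")

With balanced classes per sensor and logits on a comparable, calibrated scale across sensors, the two numbers come out close, which is the point above.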
Oh I see, by all(sensor_preds) I meant sum(logit_i for i in range(n_sensors)) (the probability that all sensors are activated). Makes sense, thanks!
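For reference, a minimal sketch of one way to compute the quantity described here (a single "all sensors are activated" probability from per-sensor logits), assuming the sensors are independent; this is an illustration, not code from the repo:

# Illustrative only: one "all sensors activated" probability from per-sensor logits,
# assuming independence across sensors for a single example.
import torch

sensor_logits = torch.tensor([1.2, 0.4, 2.1])  # hypothetical per-sensor logits for one example

p_all = torch.sigmoid(sensor_logits).prod()  # product of per-sensor probabilities
log_p_all = torch.nn.functional.logsigmoid(sensor_logits).sum()  # the same quantity in log space

print(p_all.item(), log_p_all.exp().item())  # both print the same probability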