There were 24 songs total. These were played to 33 participants and their neurological responses were recorded.
Then R was used to make 10,000 synthpop songs. These then had SYNTHETIC NEUROLOGICAL DATA GENERATED (!!!???) based on the actually collected neurological data. Synthetic hit/flop labels were generated for them too. Half of the synthetic set was used to train the model and the other half was held back as a validation set.
This model was then used to label the original 24 songs. And got 23 right. Did I mention their fancy neural net contributed like 1% to the final model?
This isn’t data leakage, it’s a data deluge. They used their original data to label synthetic data and then trained the model on synthetic data. And then had the gall to publish when their model was able to label their original data.
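To see why that 23-out-of-24 means so little, here is a toy sketch of the same loop. It is not the paper's pipeline: the feature count, the noise level, and the 1-nearest-neighbour "model" are all made up for illustration, and the hit/flop labels are coin flips, so there is nothing real to learn. Synthetic rows are built by jittering the 24 originals and copying their labels, the model is trained on half of them, and then it is asked to label the originals.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable
DIM = 5                 # hypothetical number of "neural" features per song


def random_song():
    """Random features plus a coin-flip hit/flop label: no real signal."""
    return [rng.gauss(0, 1) for _ in range(DIM)], rng.randint(0, 1)


def predict(train, features):
    """1-nearest-neighbour stand-in: copy the label of the closest row."""
    nearest = min(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], features)),
    )
    return nearest[1]


def accuracy(train, songs):
    return sum(predict(train, x) == y for x, y in songs) / len(songs)


# The 24 "real" songs.
originals = [random_song() for _ in range(24)]

# 10,000 synthetic rows: pick an original, jitter it, keep its label.
synthetic = []
for _ in range(10_000):
    x, y = rng.choice(originals)
    synthetic.append(([v + rng.gauss(0, 0.05) for v in x], y))

# Half for training, half held back for validation.
train, held_out = synthetic[:5_000], synthetic[5_000:]

# Songs the pipeline has never seen in any form.
fresh = [random_song() for _ in range(300)]

val_acc = accuracy(train, held_out[:100])  # subsample to keep this quick
leaked_acc = accuracy(train, originals)
fresh_acc = accuracy(train, fresh)

print(f"validation (synthetic) accuracy: {val_acc:.2f}")
print(f"accuracy on the original 24:     {leaked_acc:.2f}")
print(f"accuracy on unseen songs:        {fresh_acc:.2f}")
```

In this toy setup the model should score at or near 100% on both the synthetic validation half and the original 24, while staying around coin-flip level on songs it has never seen. The "validation" set and the "test" set are both just echoes of the same 24 rows.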
https://www.frontiersin.org/articles/10.3389/frai.2023.1154663/full
Surely this won’t end badly.