One thing that makes this hard to automate is human imprecision in generating a recording, especially with rhythm: notes encode pitches but also timings and durations, and a human performing a song will never get those exactly precise (nor should they: good performance tends to involve being a little free with rhythms in ways that shouldn't be directly reflected in the sheet music), so any automatic transcriber will produce silly-looking, slightly-off rhythms that still need human judgment to adjust.
This seems solvable by using multiple recordings and averaging, yes?
Also, if the transcription to sheet-music form is accurate w.r.t. the recording, and the recording is acceptable w.r.t. the intended notes, then the transcription ought to be close enough to the intended notes. Or am I misunderstanding?
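To make the averaging idea concrete, here's a minimal sketch (not a real transcription pipeline; the function name, the note-for-note alignment of takes, and the sixteenth-note grid are all illustrative assumptions): average each note's onset time across several takes, then snap the average to the nearest grid position, on the theory that uncorrelated per-take jitter cancels out.

```python
# Illustrative sketch: average note-onset times across several takes of the
# same passage, then snap each averaged onset to a sixteenth-note grid.
# Onset times are measured in beats; takes are assumed aligned note-for-note.

def average_and_quantize(takes, grid=0.25):
    """takes: list of onset-time lists, one per recording, aligned note-for-note.
    grid: quantization step in beats (0.25 = a sixteenth note)."""
    n_notes = len(takes[0])
    quantized = []
    for i in range(n_notes):
        # Mean onset across takes: uncorrelated jitter tends to cancel here.
        mean_onset = sum(take[i] for take in takes) / len(takes)
        # Snap to the nearest grid position.
        quantized.append(round(mean_onset / grid) * grid)
    return quantized

# Three slightly imprecise takes of the same four-note figure:
takes = [
    [0.02, 0.98, 2.05, 3.01],
    [0.00, 1.03, 1.97, 2.96],
    [0.01, 1.02, 2.01, 3.03],
]
print(average_and_quantize(takes))  # → [0.0, 1.0, 2.0, 3.0]
```

The caveat in the edit below L10 is exactly where this breaks: averaging only removes errors that are *uncorrelated* across takes.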
re point 1 - maybe? unsure
[edit: one issue is that some irregularities will in fact be correlated across takes and STILL shouldn’t be written down—like, sometimes a song will slow down gradually over the course of a couple measures, and the way to deal with that is to write the notes as though no slowdown is happening and then write “rit.” (means “slow down”) over the staff, NOT to write gradually longer notes; this might be tunable post facto but I think that itself would take human (or really good AI) judgment that’s not necessarily much easier than just transcribing it manually to start]
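One hedged sketch of what that post-facto tuning might look like (every name here is invented for illustration, and a real system would need far more than a line fit): check whether successive inter-onset intervals grow steadily across a passage. If they do, treat it as a tempo change to be marked "rit." rather than literally longer note values.

```python
# Sketch of the "rit. vs. longer notes" judgment: fit a least-squares line to
# successive inter-onset intervals; a clearly positive slope suggests a gradual
# slowdown that should be notated as "rit.", not as lengthening note values.

def detect_ritardando(onsets, slope_threshold=0.05):
    """onsets: note-onset times in beats. Returns (is_rit, slope), where
    is_rit is True when intervals lengthen faster than slope_threshold
    beats per note."""
    intervals = [b - a for a, b in zip(onsets, onsets[1:])]
    n = len(intervals)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(intervals) / n
    # Ordinary least-squares slope of interval length vs. note index.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, intervals)) \
            / sum((x - mean_x) ** 2 for x in xs)
    return slope > slope_threshold, slope

# Quarter notes that stretch gradually from 1.0 to 1.4 beats vs. a steady pulse:
slowing = [0.0, 1.0, 2.1, 3.3, 4.6, 6.0]
steady = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(detect_ritardando(slowing))  # positive slope: candidate for "rit."
print(detect_ritardando(steady))   # near-zero slope: keep literal rhythms
```

Even granting the sketch, deciding *where* the rit. starts and ends, and whether the composer intended it at all, is still the judgment call described above.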
re point 2 - the thing is you'd get a really irregular-looking, hard-to-read result that nobody could sight-read. (actually this is already somewhat true for a lot of folk-style songs that sound intuitive but look really confusing when written down)