Onsets and Frames is a deep neural network that transcribes audio of piano performances to NoteSequences/MIDI.
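The demo is built on the Magenta.js OnsetsAndFrames model. A minimal loading sketch, assuming the @magenta/music package and its publicly hosted Onsets and Frames checkpoint:

```ts
import * as mm from '@magenta/music';

// Publicly hosted Onsets and Frames checkpoint.
const CHECKPOINT =
    'https://storage.googleapis.com/magentadata/js/checkpoints/transcription/onsets_frames_uni';

const model = new mm.OnsetsAndFrames(CHECKPOINT);

// Downloads the weights and builds the TensorFlow.js graph; every
// transcription call below assumes this has completed.
async function setup() {
  await model.initialize();
}
```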
You can use your own piano music file (i.e., actual audio, not MIDI) for transcription:
Actual Transcription:
It Took:
Total Leaked Memory:
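A sketch of the file-based path, assuming the model from the setup sketch above has been initialized and the page has an <input type="file" id="file-input"> element (the element id is ours, not the demo's):

```ts
const fileInput = document.getElementById('file-input') as HTMLInputElement;

fileInput.addEventListener('change', async () => {
  const file = fileInput.files![0];  // actual audio (e.g. WAV or MP3), not MIDI
  const ns = await model.transcribeFromAudioFile(file);
  // `ns` is a NoteSequence; it can be played back or converted to MIDI bytes,
  // e.g. with mm.sequenceProtoToMidi(ns).
  console.log(`Transcribed ${ns.notes.length} notes`);
});
```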
You can record piano audio from a microphone and transcribe it. If you're recording something other than piano (like your voice), it will still be transcribed, but the result will probably be fairly noisy and incorrect.
Actual Transcription:
It Took:
Total Leaked Memory:
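A sketch of the microphone path using the browser's MediaRecorder API, assuming microphone permission is granted and the browser can decode the recorded format:

```ts
async function recordAndTranscribe(durationMs: number) {
  const stream = await navigator.mediaDevices.getUserMedia({audio: true});
  const recorder = new MediaRecorder(stream);
  const chunks: Blob[] = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);

  const recorded = new Promise<Blob>((resolve) => {
    recorder.onstop = () => resolve(new Blob(chunks, {type: recorder.mimeType}));
  });

  recorder.start();
  setTimeout(() => recorder.stop(), durationMs);

  const blob = await recorded;
  stream.getTracks().forEach((track) => track.stop());  // release the microphone
  return model.transcribeFromAudioFile(blob);
}
```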
We verify that the model can transcribe a short sequence of piano audio, first computing its mel spectrogram.
Original Audio
Expected Transcription:
Actual Transcription:
Match:
It Took:
Total Leaked Memory:
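A sketch of this check; transcribeFromAudioFile decodes the clip and computes its mel spectrogram internally before running the network, the expected NoteSequence is assumed to come from a fixture, and the comparison helper below is our own rather than a library function:

```ts
// Naive note-by-note comparison (our own helper, not part of @magenta/music).
function notesMatch(a: mm.NoteSequence, b: mm.NoteSequence): boolean {
  if (a.notes.length !== b.notes.length) {
    return false;
  }
  return a.notes.every((note, i) => {
    const other = b.notes[i];
    return note.pitch === other.pitch &&
        Math.abs((note.startTime || 0) - (other.startTime || 0)) < 1e-4 &&
        Math.abs((note.endTime || 0) - (other.endTime || 0)) < 1e-4;
  });
}

// Transcribe a short clip and report whether it matches the expected
// NoteSequence, along with how long the call took.
async function verifyShortClip(audioBlob: Blob, expected: mm.NoteSequence) {
  const start = performance.now();
  const actual = await model.transcribeFromAudioFile(audioBlob);
  const seconds = (performance.now() - start) / 1000;
  console.log(`Match: ${notesMatch(actual, expected)}; it took ${seconds.toFixed(2)}s`);
}
```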
Below we verify that the model properly transcribes a mel spectrogram to match the output of the Python TensorFlow implementation. We found it necessary to process the convolution in batches for longer inputs to avoid a GPU timeout enforced in Chrome, so we verify that transcription works correctly with different batch sizes that exercise various edge cases; a sketch of this batched check appears after the results below.
Original Audio
Expected Transcription:
Actual Transcription:
Match:
It Took:
Actual Transcription:
Match:
It Took:
Actual Transcription:
Match:
It Took:
Actual Transcription:
Match:
It Took:
Actual Transcription:
Match:
It Took:
Total Leaked Memory:
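A sketch of the batched check described above, reusing the model and the notesMatch helper from the earlier sketches; the mel spectrogram is assumed to be a [frames x mel bins] number[][] exported from the Python implementation, the batch sizes are illustrative, and we assume the optional second argument of transcribeFromMelSpec controls the batching:

```ts
import * as tf from '@tensorflow/tfjs';

// Run the same mel spectrogram through the model at several batch sizes and
// compare each result against the expected NoteSequence.
async function runBatchSizeChecks(melSpec: number[][], expected: mm.NoteSequence) {
  const startBytes = tf.memory().numBytes;
  for (const batchSize of [1, 4, 8]) {  // illustrative values, not the demo's
    const start = performance.now();
    const actual = await model.transcribeFromMelSpec(melSpec, batchSize);
    const seconds = ((performance.now() - start) / 1000).toFixed(2);
    console.log(
        `batch size ${batchSize}: match=${notesMatch(actual, expected)}, took ${seconds}s`);
  }
  // Any growth in allocated bytes after the runs indicates leaked tensors.
  console.log(`Total leaked memory: ${tf.memory().numBytes - startBytes} bytes`);
}
```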