MidiMe is a variational autoencoder that lets you personalize your own MusicVAE model with just a little data, so that samples from MidiMe sound closer to your provided data. Normally, training a MusicVAE model on new data would take days on a big GPU and millions of data points, but with MidiMe you can do it with a single MIDI file, directly in the browser.

MidiMe works in parallel with a MusicVAE model, so the kind of results you get depends on which MusicVAE checkpoint you use. Here we give examples of both training on melodies (the "monophonic" section) and on multi-instrument trios (the "polyphonic" section).
This is what you want the samples to sound like. For this example we are using the mel_2bar_small checkpoint, which is monophonic -- if you use a polyphonic MIDI file, you're not guaranteed to reconstruct the "main" melody, just a single instrument.
Try training on a single full song to get outputs that sound like variations on it, or train on multiple songs to get samples that combine various characteristics of them.
The MidiMe model works by training a variational autoencoder in the browser. The more steps you train for, the better the input reconstruction will be.
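For reference, here is a minimal sketch of that training step using the @magenta/music library, following the calls used in its MidiMe demo. The checkpoint URL, the epoch count, and the 2-bar chunking are assumptions you can adjust.

    import * as mm from '@magenta/music';
    import * as tf from '@tensorflow/tfjs';

    const MEL_CKPT =
        'https://storage.googleapis.com/magentadata/js/checkpoints/music_vae/mel_2bar_small';

    async function trainMidiMe(inputSequence: mm.INoteSequence) {
      // Load the pre-trained MusicVAE checkpoint and a fresh, untrained MidiMe model.
      const mvae = new mm.MusicVAE(MEL_CKPT);
      await mvae.initialize();
      const midime = new mm.MidiMe({epochs: 100});  // more training steps => better reconstruction
      midime.initialize();

      // Quantize the input MIDI and chop it into the 2-bar chunks this checkpoint expects.
      const quantized = mm.sequences.quantizeNoteSequence(inputSequence, 4);
      const chunks = mm.sequences.split(quantized, 16 * 2);  // 32 steps = 2 bars in 4/4

      // Encode the chunks into MusicVAE's latent space and train MidiMe on those vectors.
      const z = await mvae.encode(chunks);
      const start = performance.now();
      await midime.train(z, async (epoch: number, logs: tf.Logs) => {
        console.log('MidiMe epoch', epoch, logs);
      });
      console.log(`Training took ${((performance.now() - start) / 1000).toFixed(1)}s`);

      return {mvae, midime, z};
    }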
Before training:
After training:
It took:
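The two players above compare reconstructions of the input. A hedged sketch of how such a comparison can be produced, reusing the mvae, midime and z from the training sketch above (predict is the reconstruction call used in the library's demos; treat the exact signature as an assumption):

    import * as mm from '@magenta/music';
    import * as tf from '@tensorflow/tfjs';

    async function compareReconstructions(
        mvae: mm.MusicVAE, midime: mm.MidiMe, z: tf.Tensor2D) {
      // "Before training": MusicVAE's own reconstruction of the encoded input.
      const before = await mvae.decode(z);

      // "After training": run the same latent vectors through the trained MidiMe
      // autoencoder first, then decode the result with MusicVAE.
      const zRecon = midime.predict(z) as tf.Tensor2D;
      const after = await mvae.decode(zRecon);

      return {before, after};
    }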
You can now sample from MidiMe. If you trained for long enough, then these samples will be very similar to the input data. Contrast this with random samples from MusicVAE, which do not sound like the input data.
Each of the examples below is 5 concatenated 2-bar samples.
From MusicVAE
From trained MidiMe
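A sketch of where those two rows come from: MidiMe only produces latent vectors, which MusicVAE then decodes into notes, while the baseline row is sampled from MusicVAE directly. The calls mirror the library's demo; the concatenation at the end is just for playback, and the names are illustrative.

    import * as mm from '@magenta/music';
    import * as tf from '@tensorflow/tfjs';

    async function sampleBoth(mvae: mm.MusicVAE, midime: mm.MidiMe) {
      // "From trained MidiMe": sample 5 latent vectors from the personalized model,
      // then decode them into 2-bar NoteSequences with MusicVAE.
      const zSampled = await midime.sample(5) as tf.Tensor2D;
      const fromMidiMe = await mvae.decode(zSampled);

      // "From MusicVAE": 5 unconditioned samples from the base checkpoint, for contrast.
      const fromMusicVAE = await mvae.sample(5);

      // Stitch each set into one sequence so it plays as 5 concatenated 2-bar samples.
      return {
        fromMidiMe: mm.sequences.concatenate(fromMidiMe),
        fromMusicVAE: mm.sequences.concatenate(fromMusicVAE),
      };
    }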
In this example we are using the trio_4bar checkpoint, which uses 3 different instruments. You will notice in these results that the reconstruction of your original melody is far worse (and won't necessarily sound like the input), but the random samples are significantly more musical and contain patterns from the original input.
We will again train a variational autoencoder in the browser. In this example we will train longer than we did in the monophonic case.
You'll notice that unlike the monophonic case, the reconstruction in this case is far worse. This is because this particular MusicVAE checkpoint is well suited for sampling, but not for data reconstruction (you can think of it as a lossier encoding of the training data).
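The code is the same as in the monophonic sketch earlier; only the checkpoint and the amount of training change. A hedged variant follows (the trio_4bar URL uses the same published checkpoint path, and the larger epoch count is an arbitrary stand-in for "train longer"):

    import * as mm from '@magenta/music';

    const TRIO_CKPT =
        'https://storage.googleapis.com/magentadata/js/checkpoints/music_vae/trio_4bar';

    async function trainTrioMidiMe(inputSequence: mm.INoteSequence) {
      const mvae = new mm.MusicVAE(TRIO_CKPT);
      await mvae.initialize();

      // Train for more epochs than in the monophonic example (the exact count is an assumption).
      const midime = new mm.MidiMe({epochs: 300});
      midime.initialize();

      // trio_4bar works on 4-bar, 3-instrument chunks: 64 steps at 4 steps per quarter in 4/4.
      const quantized = mm.sequences.quantizeNoteSequence(inputSequence, 4);
      const chunks = mm.sequences.split(quantized, 16 * 4);

      const z = await mvae.encode(chunks);
      await midime.train(z, (epoch: number) => console.log('trio epoch', epoch));
      return {mvae, midime, z};
    }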
Before training:
After training:
It took:
As before, we can now sample from MidiMe. The samples should sound more melodic and have more interesting patterns (learned from your input data) than the random samples from MusicVAE.
Each of the examples below is 5 concatenated 4-bar samples.
From MusicVAE
From trained MidiMe
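For completeness, a short sketch of generating and auditioning the MidiMe trio samples above: five 4-bar samples are decoded, concatenated, and played back with the library's basic Player. Names and calls are assumptions based on the library's demos.

    import * as mm from '@magenta/music';
    import * as tf from '@tensorflow/tfjs';

    async function playTrioSamples(mvae: mm.MusicVAE, midime: mm.MidiMe) {
      // Five 4-bar samples from the trio-trained MidiMe, decoded with the trio_4bar MusicVAE
      // and joined into one sequence, matching the "5 concatenated 4-bar samples" above.
      const z = await midime.sample(5) as tf.Tensor2D;
      const joined = mm.sequences.concatenate(await mvae.decode(z));

      // Play it back in the browser with the library's basic synth player.
      const player = new mm.Player();
      player.start(joined);
    }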