MidiMe

MidiMe is a small variational autoencoder that lets you personalize a pre-trained MusicVAE model with just a little data, so that samples from MidiMe sound closer to your provided data. Normally, training a MusicVAE model on new data would take days on a big GPU and millions of data points, but with MidiMe you can do it with a single MIDI file, directly in the browser.
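As a rough sketch of what the setup looks like with the @magenta/music JavaScript library (the checkpoint URL points at the melody checkpoint used below; the epoch count is only illustrative, not MidiMe's default):

```js
import * as mm from '@magenta/music';

// The pre-trained MusicVAE whose latent space MidiMe will personalize.
const mvae = new mm.MusicVAE(
    'https://storage.googleapis.com/magentadata/js/checkpoints/music_vae/mel_2bar_small');
await mvae.initialize();

// The small VAE that gets trained in the browser on your data.
const midime = new mm.MidiMe({epochs: 100});
await midime.initialize();
```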

Input data

MidiMe works alongside a MusicVAE model, so the kind of results you get depends on which MusicVAE checkpoint you use. Here we give examples of training on both melodies (the "monophonic" section) and multi-instrument trios (the "polyphonic" section).

1. Monophonic models

This is what you want the samples to sound like. For this example we are using the mel_2bar_small checkpoint, which is monophonic -- if you use a polyphonic MIDI file, you're not guaranteed to reconstruct the "main" melody, just a single instrument.

Try training on a single full song to get outputs that sound like variations on it, or train on multiple songs to get samples that combine various characteristics of them.
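Concretely, loading and chunking the input might look like the sketch below. The file name is a placeholder, and the use of mm.urlToNoteSequence and mm.sequences.split to cut the song into model-sized chunks is an assumption about the helpers available in your @magenta/music version:

```js
// Load the MIDI file you want to personalize with ('my_song.mid' is a placeholder).
const ns = await mm.urlToNoteSequence('my_song.mid');

// mel_2bar_small models 2-bar melodies: quantize to 4 steps per quarter
// note and split the song into 2-bar (32-step) chunks.
const quantized = mm.sequences.quantizeNoteSequence(ns, 4);
const chunks = mm.sequences.split(quantized, 32);
```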

Training

The MidiMe model works by training a variational autoencoder in the browser. The more steps you train for, the better the input reconstruction will be.
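Continuing the sketch above, training amounts to encoding the chunks with MusicVAE and fitting MidiMe to those latent vectors (the exact shape of the progress callback is an assumption):

```js
// MidiMe is trained on MusicVAE's latent vectors, not on raw notes,
// which is what makes in-browser training feasible.
const z = await mvae.encode(chunks);

// Train for the configured number of epochs; more training generally
// means a closer reconstruction of the input.
await midime.train(z, (epoch) => console.log(`epoch ${epoch}`));
z.dispose();
```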

Input reconstruction

Before training:

After training:

It took:

Random samples

You can now sample from MidiMe. If you trained for long enough, then these samples will be very similar to the input data. Contrast this with random samples from MusicVAE, which do not sound like the input data.
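In code, and assuming midime.sample returns latent vectors that mvae.decode accepts (and that mm.sequences.concatenate is available for stitching the results together), this step might look like:

```js
// Latent vectors sampled from the personalized model, decoded by MusicVAE.
const zSamples = await midime.sample(5);
const midimeSamples = await mvae.decode(zSamples);

// For comparison: unconditioned samples straight from MusicVAE.
const musicVaeSamples = await mvae.sample(5);

// Stitch the five 2-bar samples into one longer sequence for playback.
const combined = mm.sequences.concatenate(midimeSamples);
```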

Each of the examples below is 5 concatenated 2-bar samples.

From MusicVAE

From trained MidiMe

2. Polyphonic models

In this example we are using the trio_4bar checkpoint, which uses 3 different instruments. You will notice in these results that the reconstruction of your original input is far worse (and might not sound like the input at all), but the random samples are significantly more musical, and contain patterns from the original.

Training

We will again train a variational autoencoder in the browser. In this example we will train longer than we did in the monophonic case.
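The recipe is the same as in the monophonic sketch, just with the trio checkpoint, 4-bar (64-step) chunks, and more epochs; the file name and epoch count below are placeholders, not the demo's exact settings:

```js
// The 4-bar, 3-instrument trio checkpoint.
const trioVae = new mm.MusicVAE(
    'https://storage.googleapis.com/magentadata/js/checkpoints/music_vae/trio_4bar');
await trioVae.initialize();

// Train for more epochs than in the monophonic case (number is illustrative).
const trioMidiMe = new mm.MidiMe({epochs: 300});
await trioMidiMe.initialize();

// 4 bars at 4 steps per quarter note is 64 steps per chunk.
const trioNs = await mm.urlToNoteSequence('my_trio.mid');
const trioChunks = mm.sequences.split(
    mm.sequences.quantizeNoteSequence(trioNs, 4), 64);

// Encode with the trio MusicVAE and train as before.
const trioZ = await trioVae.encode(trioChunks);
await trioMidiMe.train(trioZ);
```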

Input reconstruction

You'll notice that, unlike in the monophonic case, the reconstruction here is far worse. This is because this particular MusicVAE checkpoint is well suited for sampling, but not for data reconstruction (you can think of it as a lossier encoding of the training data).

Before training:

After training:

It took:

Random samples

As before, we can now sample from MidiMe. The samples should sound more melodic and have more interesting patterns (learned from your input data) than the random samples from MusicVAE.

Each of the examples below is 5 concatenated 4-bar samples.

From MusicVAE

From trained MidiMe