Noise-conditional score networks for music composition by annealed Langevin dynamics
We will introduce a new generative model for music composition, applying
Langevin dynamics to a gradient-based score matching algorithm based on Song
and Ermon, 2019. Unlike implicit models such as GANs, this approach learns an
explicit model of the data distribution through its score (the gradient of the
log-density).
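To make the sampling procedure concrete, here is a minimal sketch of annealed Langevin dynamics in the style of Song and Ermon, 2019. The score function below is a placeholder: the analytic score of standard Gaussian data perturbed at noise level sigma, standing in for a trained noise-conditional network. The noise schedule, step sizes, and dimensionality are our own illustrative assumptions, not values from the paper.

```python
import numpy as np

def score(x, sigma):
    # Placeholder score: for data x ~ N(0, I) perturbed by N(0, sigma^2 I)
    # noise, the perturbed distribution is N(0, (1 + sigma^2) I), whose
    # score is -x / (1 + sigma^2). A real model would be a trained
    # noise-conditional score network.
    return -x / (1.0 + sigma ** 2)

def annealed_langevin(score, sigmas, n_steps=100, eps=2e-5, dim=2, seed=0):
    """Annealed Langevin dynamics: run a Langevin chain at each noise
    level, from largest to smallest, scaling the step size per level."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, size=dim)              # arbitrary initialization
    for sigma in sigmas:                          # anneal noise downward
        alpha = eps * (sigma / sigmas[-1]) ** 2   # per-level step size
        for _ in range(n_steps):
            z = rng.standard_normal(dim)
            x = x + 0.5 * alpha * score(x, sigma) + np.sqrt(alpha) * z
    return x

sigmas = np.geomspace(10.0, 0.01, num=10)  # geometric noise schedule
sample = annealed_langevin(score, sigmas)
```

With the placeholder score, the chain should end near the unit-Gaussian data distribution; swapping in a learned score network is the substance of the proposed project.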
Previous work has seen success in modeling continuous input manifolds,
including high-quality image inpainting and conditional sampling on MNIST,
CIFAR-10, and other datasets. However, it remains an open question whether this
algorithm can be adapted to perform well on discrete domains, such as music
scores.
We hope that Langevin dynamics and score matching can combine the
controllability of Markov chain Monte Carlo with the global view and fast
convergence of stochastic gradient descent, to generate high-quality,
structured compositions.
DeepBach is a simple, controllable autoregressive model for Bach chorale
generation; these qualities make it easy to train and use. In particular,
learning Bach chorales is an interesting task because the music is highly
structured (often following various “rules”), consistent, and often complex.
However, there are many instances where DeepBach fails to capture long-term
structure. Some casual listeners have remarked that the compositions “sound
good but go nowhere”. This could be due to a combination of vanishing LSTM
gradients and Gibbs sampling getting stuck in local modes that are optimal
with respect to any single variable (1-optimal local minima).
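As a toy illustration of the second failure mode (not DeepBach itself), consider Gibbs sampling on two binary variables whose joint distribution concentrates on the agreeing states. Coordinate-wise resampling almost never crosses the low-probability disagreeing states, so the chain lingers in one mode far longer than a well-mixing sampler would; the probabilities below are our own illustrative choices.

```python
import numpy as np

# Nearly all mass sits on (0, 0) and (1, 1); moving between them by
# flipping one variable at a time must pass through a low-probability
# state, so single-variable (1-optimal) moves rarely escape a mode.
p = {(0, 0): 0.499, (1, 1): 0.499, (0, 1): 0.001, (1, 0): 0.001}

def gibbs_sweep(state, rng):
    x = list(state)
    for i in range(2):
        x[i] = 1
        p1 = p[tuple(x)]
        x[i] = 0
        p0 = p[tuple(x)]
        x[i] = int(rng.random() < p1 / (p0 + p1))  # resample coordinate i
    return tuple(x)

rng = np.random.default_rng(0)
state, visits = (0, 0), []
for _ in range(2000):
    state = gibbs_sweep(state, rng)
    visits.append(state)

# A well-mixing chain would switch states roughly every other sweep;
# here the chain changes state only a handful of times in 2000 sweeps.
switches = sum(a != b for a, b in zip(visits, visits[1:]))
```

The expected mode-switch probability per sweep here is on the order of 0.004, so the chain dwells in one mode for hundreds of sweeps, which is the qualitative behavior we suspect in chorale sampling.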
We believe that, with the right combination of techniques, it should be
possible to produce a model that reliably avoids these local minima while
retaining controllability.
It was shown in [Welling and Teh, 2011] that directing traditional MCMC
algorithms with gradient information can greatly accelerate their convergence.
This is what motivates us to augment DeepBach’s approach with score matching.
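A minimal sketch of the idea from [Welling and Teh, 2011], stochastic gradient Langevin dynamics, on a toy problem: sampling the posterior over the mean of a one-dimensional Gaussian. The prior, step-size schedule, and batch size are our own illustrative choices, not values from the paper.

```python
import numpy as np

def sgld_posterior_mean(data, n_iters=5000, seed=0):
    """Stochastic gradient Langevin dynamics for the mean of a
    N(theta, 1) model with a N(0, 10) prior: take minibatch gradient
    steps on the log-posterior with injected Gaussian noise, so the
    iterates become (approximate) posterior samples."""
    rng = np.random.default_rng(seed)
    n = len(data)
    theta = 0.0
    samples = []
    for t in range(1, n_iters + 1):
        eps_t = 0.01 / t ** 0.55           # polynomially decaying step size
        batch = rng.choice(data, size=32)  # minibatch gradient estimate
        grad_log_prior = -theta / 10.0
        grad_log_lik = (n / 32) * np.sum(batch - theta)
        theta = (theta
                 + 0.5 * eps_t * (grad_log_prior + grad_log_lik)
                 + np.sqrt(eps_t) * rng.standard_normal())
        samples.append(theta)
    return np.array(samples)

rng = np.random.default_rng(1)
data = rng.normal(2.0, 1.0, size=100)
samples = sgld_posterior_mean(data)
```

After the step size decays, the trailing iterates concentrate around the posterior mean (close to the sample mean of the data), which is the convergence behavior we hope to transfer to music sampling.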
Compared with approaches tried in the past, we think that score matching and
Langevin dynamics, by training on data perturbed with graded levels of noise,
have the potential to perform well on generative sequence modeling tasks such
as music composition, while maintaining the controllability of models like
DeepBach.
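To illustrate the training side, here is a sketch of a denoising score matching loss at a single noise level, on toy one-dimensional Gaussian data. The analytic score below stands in for a trained network, and the comparison simply checks that the true perturbed-data score achieves a lower loss than a trivial baseline; all names and constants are illustrative assumptions.

```python
import numpy as np

def dsm_loss(score, x, sigma, rng):
    # Denoising score matching at one noise level: perturb the data with
    # N(0, sigma^2) noise and regress the model's score at the perturbed
    # point toward the score of the noise kernel, -z / sigma.
    z = rng.standard_normal(x.shape)
    x_tilde = x + sigma * z
    target = -z / sigma
    return 0.5 * np.mean((score(x_tilde, sigma) - target) ** 2)

rng = np.random.default_rng(0)
x = rng.standard_normal(20000)  # toy data: x ~ N(0, 1)

# Analytic score of the sigma-perturbed data, N(0, 1 + sigma^2), which
# minimizes this objective for the toy distribution, versus a trivial
# baseline that predicts zero everywhere.
true_score = lambda x_t, s: -x_t / (1.0 + s ** 2)
zero_score = lambda x_t, s: np.zeros_like(x_t)

sigma = 0.5
loss_true = dsm_loss(true_score, x, sigma, rng)
loss_zero = dsm_loss(zero_score, x, sigma, rng)
```

Training at several noise levels at once, weighted appropriately, is what yields the noise-conditional score network used by the annealed sampler.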
This project will be successful if we can implement a score matching algorithm
for music generation and evaluate its feasibility. In the best case, score
matching can be used to improve long-term patterns and interpretability.
However, due to the complexity of the algorithm, the results are uncertain, and
we may need various tricks or innovations to obtain convergence.
Our goal, then, is to determine the tractability and performance of a
score-matching approach in the discrete domain, which we think is very exciting.
[Welling and Teh, 2011]:
https://www.ics.uci.edu/~welling/publications/papers/stoclangevin_v6.pdf