Utilities for clustering of audio samples
This package contains experiments and utilities for unsupervised learning on acoustic recordings. It is a use case of SpectralDistances.jl.
using Pkg
pkg"add https://github.com/baggepinnen/DetectionIoUTools.jl"
pkg"add https://github.com/baggepinnen/AudioClustering.jl"
The following code illustrates how to use SpectralDistances.jl to fit rational spectra to audio samples and extract the poles for use as features
using AudioClustering, SpectralDistances, Glob
path = "/home/fredrikb/birds/" # path to a bunch of wav files
cd(path)
files = glob("*.wav")
const fs = 44100
na = 20
fitmethod = LS(na=na)
models = mapsoundfiles(files) do sound
    sound = SpectralDistances.bp_filter(sound, (50/fs, 18000/fs)) # band-pass filter before fitting
    SpectralDistances.fitmodel(fitmethod, sound)
end
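For intuition, a least-squares fit of an all-pole (autoregressive) model can be sketched with the standard library alone. This is a minimal illustration, assuming the fit amounts to an ordinary least-squares regression of each sample on the na preceding samples. `fit_ar_ls` is a hypothetical helper, not part of SpectralDistances.jl, and the package's `LS` method may differ in its details.

```julia
using LinearAlgebra

# Hypothetical helper (not the package API): least-squares fit of
#   y[t] + a1*y[t-1] + ... + a_na*y[t-na] = e[t]
function fit_ar_ls(y::AbstractVector{<:Real}, na::Integer)
    N = length(y)
    N > 2na || throw(ArgumentError("need more than 2na samples"))
    # Row for time t holds the na previous samples, most recent first
    Φ = [y[t-i] for t in na+1:N, i in 1:na]
    θ = Φ \ y[na+1:N]      # least-squares solution of y[t] ≈ θ1*y[t-1] + ... + θna*y[t-na]
    [1.0; -θ]              # denominator coefficients [1, a1, ..., a_na]
end

# Noise-free test signal generated by a known second-order recursion
y_demo = zeros(30)
y_demo[1] = 1.0
for t in 3:30
    y_demo[t] = 0.25 * y_demo[t-1] + 0.125 * y_demo[t-2]
end
a_demo = fit_ar_ls(y_demo, 2)  # recovers the recursion coefficients up to rounding
```

Because the demo signal is noise-free and generated by an order-2 recursion, the fit reproduces the generating coefficients. Real recordings will only be approximated by a model of order na.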
We now have a vector of vectors with linear models fit to the sound files. To make this easier to work with, we flatten this structure to a single long vector and extract the poles (roots) of the linear systems to use as features
X = embeddings(models)
We now have some audio data, represented as poles of rational spectra, in a matrix X. See https://baggepinnen.github.io/SpectralDistances.jl/latest/examples/#Examples-1 for examples of how to use this matrix for analysis of the signals, e.g., classification, clustering and detection.
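To make the idea of "poles as features" concrete, the sketch below computes the roots of a denominator polynomial from the eigenvalues of its companion matrix and stacks them into a real-valued feature vector. Both `ar_poles` and `pole_features` are hypothetical helpers. How `embeddings` orders and stacks the poles is not described here, so the layout used below (sort, then real parts followed by imaginary parts) is an assumption made for illustration only.

```julia
using LinearAlgebra

# Hypothetical helper (not the package API): poles of 1/A(z), where
# A(z) = z^n + a[2]*z^(n-1) + ... + a[n+1] and a[1] == 1.
function ar_poles(a::AbstractVector{<:Real})
    a[1] == 1 || throw(ArgumentError("expected a[1] == 1"))
    n = length(a) - 1
    C = zeros(n, n)            # companion matrix of A(z)
    C[1, :] .= -a[2:end]
    for i in 2:n
        C[i, i-1] = 1
    end
    complex.(eigvals(C))       # eigenvalues of C are the roots of A(z)
end

# Hypothetical feature map: sort the poles, then stack real and imaginary parts.
function pole_features(p::AbstractVector{<:Complex})
    p = sort(p, by = z -> (real(z), imag(z)))
    [real.(p); imag.(p)]
end

a = [1.0, -0.25, -0.125]        # A(z) = (z - 0.5)(z + 0.25)
x = pole_features(ar_poles(a))  # one feature vector; several can be stacked as columns
```

Sorting before stacking gives each signal a consistent coordinate order, which matters when the features are later compared with the Euclidean distance.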
A graph representation of X can be obtained with
G = audiograph(X, 5; λ=0)
where k=5 is the number of nearest neighbors considered when building the graph. If λ=0 the graph will be weighted by distance, whereas if λ>0 the graph will be weighted according to adjacency under the kernel exp(-λ*d). The metric used is the Euclidean distance. If you want to use a more sophisticated distance, try, e.g.,
dist = OptimalTransportRootDistance(domain=Continuous(), p=2)
G = audiograph(X, 5, dist; λ=0)
Here, the Euclidean distance will be used to select neighbors, but the edges will be weighted using the provided distance. This avoids having to calculate a very large number of pairwise distances using the more expensive distance metric.
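The two weighting modes and the "cheap distance for neighbor selection, expensive distance for edge weights" idea can be sketched as follows. This is not the `audiograph` implementation: `knn_graph` is a hypothetical helper, it assumes that each column of X is one observation, and it uses a brute-force neighbor search that is only suitable for small examples.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical helper (not the package API): directed k-nearest-neighbor graph.
# Neighbors are always selected with the Euclidean distance; edge weights use `dist`.
function knn_graph(X::AbstractMatrix, k::Integer; λ = 0, dist = (a, b) -> norm(a - b))
    N = size(X, 2)                      # assumption: columns are observations
    W = spzeros(N, N)
    for i in 1:N
        d_euc = [norm(X[:, i] - X[:, j]) for j in 1:N]
        d_euc[i] = Inf                  # a point is not its own neighbor
        for j in partialsortperm(d_euc, 1:k)
            d = dist(X[:, i], X[:, j])  # possibly more expensive distance, evaluated k times per point
            W[i, j] = λ == 0 ? d : exp(-λ * d)
        end
    end
    W
end

X_demo = [0.0 1.0 3.0 10.0]             # four one-dimensional observations
W_dist = knn_graph(X_demo, 1)           # λ = 0: edges weighted by distance
W_kern = knn_graph(X_demo, 1; λ = 1)    # λ > 0: edges weighted by exp(-λ*d)
W_l1   = knn_graph(X_demo, 2; dist = (a, b) -> sum(abs, a - b))  # stand-in for a costlier metric
```

With N observations, the custom distance is evaluated only N*k times instead of for all pairs, which is the saving described above.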
Any graph-based algorithm may now operate on G, or on the field G.weight. Further examples are available here.
The following snippets show how to preprocess data to a suitable form for clustering using this package:
using Glob
files = glob("*.wav") # Vector of file paths
const fs = Int(wavread(files[1])[2]) # Read the sampling rate
N = length(files)
using TotalLeastSquares # For lowrankfilter
function lrfilt(y)
    yf = lowrankfilter(y, min(250, length(y)-1100), lag=10)
end
"Perform some simple threshold filtering and calculate a spectrogram"
function spec(sound)
    @. sound = Float32(100000 * clamp(sound, -0.015f0, 0.015f0)) # the 100000 multiplier is to normalize the Float32 data for better numerical performance. Tune all parameters to your use case.
    # sound = lrfilt(sound) # This is an alternative to the above which is *much* better, but also much slower
    melspectrogram(sound, 100, 70, nmels=30, fs=fs, fmin=5) # Spend some time making sure spectrogram representation is good.
end
using ThreadTools # For tmap
spectrograms = tmap(files) do file
    sound = spec(vec(wavread(file)[1]))
end
matrices = [Float32.(max.(normalize_spectrogram(s), 1e-7)) for s in spectrograms]
# matrices_masked = mask_filter.(matrices) # This is an alternative if the lowrankfilter is not used https://baggepinnen.github.io/SpectralDistances.jl/latest/distances/#SpectralDistances.mask_filter
# `dist` must be a distance that can operate on the spectrogram matrices
inds, D = initialize_clusters(dist, matrices; init_multiplier = 10, N_seeds = 100)
patterns = matrices[inds] # These should be good cluster seeds
See docs entry Clustering using a distance matrix
See docs entry Clustering using features
inds, dists, D = knn_accelerated(dist, X, k, Xe=X; kwargs...)
Find the nearest neighbor using distance metric dist by first finding the k nearest neighbors using Euclidean distance on embeddings produced from Xe, and then using dist to find the smallest distance within those k.
X is assumed to be a vector of something dist can operate on, such as a vector of models from SpectralDistances. Xe is by default the same as X, or possibly something else, as long as embeddings(Xe) is defined. A vector of models or spectrograms has this function defined.
D is a sparse matrix with all the computed distances from dist. This matrix contains raw distance measurements; to symmetrize, call SpectralDistances.symmetrize!(D). The returned dists are already symmetrized.
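The two-stage search described above can be sketched with the standard library. This is a simplified illustration, not the `knn_accelerated` implementation: `two_stage_nn` is a hypothetical helper, the embeddings are passed in directly as a matrix E whose columns are assumed to correspond to the elements of X, the candidate search is brute force, and no symmetrization of the returned distances is performed.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical helper (not the package API): nearest neighbor under `dist`,
# searched only among the k Euclidean-nearest candidates in embedding space.
function two_stage_nn(dist, X::AbstractVector, E::AbstractMatrix, k::Integer)
    N = length(X)
    inds  = zeros(Int, N)               # index of the nearest neighbor of each element
    dists = fill(Inf, N)                # distance to that neighbor under `dist`
    D = spzeros(N, N)                   # all distances actually computed with `dist`
    for i in 1:N
        d_euc = [norm(E[:, i] - E[:, j]) for j in 1:N]
        d_euc[i] = Inf                  # exclude the element itself
        for j in partialsortperm(d_euc, 1:k)
            d = dist(X[i], X[j])        # expensive distance, evaluated k times per element
            D[i, j] = d
            if d < dists[i]
                dists[i], inds[i] = d, j
            end
        end
    end
    inds, dists, D
end

items = [[0.0], [1.0], [3.0], [10.0]]   # stand-ins for models or spectrograms
E = [0.0 1.0 3.0 10.0]                  # stand-in embeddings, one column per item
l1(a, b) = sum(abs, a .- b)             # stand-in for an expensive distance
nn_inds, nn_dists, D_raw = two_stage_nn(l1, items, E, 2)
```

The sparse matrix holds at most N*k entries, so the expensive distance is never evaluated for pairs that are far apart in embedding space.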