Project author: stefanGT44

Project description:
Real-time desktop audio histogram from scratch.
Language: Java
Project URL: git://github.com/stefanGT44/AudioVisualizer-RealTime-Histogram.git


AudioVisualizer-RealTime-Histogram

This is a small JavaFX desktop application for real-time audio visualization (histogram) built from scratch.

Overview

The application supports visualization and playback of .wav files or direct microphone input.

The audio is visually represented with a histogram: bars along the X axis represent frequency bands ranging from 43 Hz up to 22050 Hz, and the Y axis represents the amplitude (strength) of each band.


Implementation details

1. Calculating the Mel filter bank

The visualizer mimics the human auditory system, which perceives pitch roughly logarithmically: as frequencies get higher, we become less sensitive to differences between them. For this reason the Mel scale is used to select which frequencies are shown in the visualizer and to calculate their amplitudes. The Mel filter bank is an array of frequencies spaced evenly on the Mel scale (approximately logarithmically in Hz). The filters are triangular and overlapping: the first filter starts at index 0, has its center at index 1 and ends at index 2; the second filter starts at index 1, has its center at index 2 and ends at index 3; and so on. Since there are 64 bars (frequency bands) in the histogram, 64 filters are needed, so the filter bank array holds 66 frequencies.
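
As a rough illustration of the filter bank array described above, the sketch below computes 66 edge frequencies spaced evenly on the Mel scale. It assumes the common 2595 * log10(1 + f / 700) Mel formula and treats 43 Hz and 22050 Hz as the outermost edges; the class and method names are hypothetical, and the project's own code may do this differently.

```java
final class MelFilterBankSketch {
    // Common Hz -> Mel conversion (an assumption; other Mel variants exist).
    static double hzToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    // Inverse of hzToMel.
    static double melToHz(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    /** Returns filterCount + 2 frequencies (in Hz), evenly spaced on the Mel scale. */
    static double[] edgeFrequencies(int filterCount, double lowHz, double highHz) {
        double lowMel = hzToMel(lowHz);
        double highMel = hzToMel(highHz);
        double[] edges = new double[filterCount + 2];
        for (int i = 0; i < edges.length; i++) {
            double mel = lowMel + (highMel - lowMel) * i / (edges.length - 1);
            edges[i] = melToHz(mel);
        }
        return edges;
    }

    public static void main(String[] args) {
        // 64 filters -> 66 frequencies; filter i spans edges[i] .. edges[i + 2].
        double[] edges = edgeFrequencies(64, 43.0, 22050.0);
        for (int i = 0; i < 64; i++) {
            System.out.printf("filter %d: start=%.1f center=%.1f end=%.1f%n",
                    i, edges[i], edges[i + 1], edges[i + 2]);
        }
    }
}
```

Because the spacing is even in Mel rather than in Hz, neighbouring edges sit close together at low frequencies and far apart at high frequencies.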


2. Reading raw audio data slices (from a .wav file or the microphone) and, for .wav files, writing them to the output (speakers) for playback

After the Mel filter bank is created, audio processing can begin.
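
A minimal sketch of reading one fixed-size slice is shown below. It works on any java.io.InputStream; javax.sound.sampled.AudioInputStream is one, so the same loop could in principle read from a .wav stream, and each filled slice could then be handed to SourceDataLine.write for playback. The class name, method name and slice size are hypothetical, and this is not necessarily how the application reads its data.

```java
final class SliceReaderSketch {
    /**
     * Fills the buffer from the stream, looping because a single read() call
     * may return fewer bytes than requested. Returns the number of bytes read,
     * which is smaller than buffer.length only when the stream has ended.
     */
    static int readSlice(java.io.InputStream in, byte[] buffer) throws java.io.IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws java.io.IOException {
        // Stand-in data; a real caller would pass an audio stream instead.
        java.io.InputStream in = new java.io.ByteArrayInputStream(new byte[5000]);
        byte[] slice = new byte[2048]; // assumed slice size, for illustration only
        int read;
        while ((read = readSlice(in, slice)) > 0) {
            System.out.println("read " + read + " bytes");
        }
    }
}
```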

3. Unpacking raw data into samples and applying the Hamming window function

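Because we take fixed-size slices of audio, each slice is cut off abruptly at both ends. A window function tapers the samples toward the edges of the slice, which smooths out the transition between two slices and reduces the spectral artifacts caused by the abrupt cut. This helps emphasize the key characteristics of each time slice.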
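
The step above can be sketched roughly as follows. The sketch assumes 16-bit signed little-endian mono PCM (a common .wav layout, but not confirmed for this project) and uses the standard Hamming coefficients 0.54 and 0.46; the class and method names are hypothetical.

```java
final class HammingWindowSketch {
    /** Converts 16-bit little-endian signed PCM bytes to samples in [-1, 1). */
    static double[] unpack16BitLittleEndian(byte[] raw) {
        double[] samples = new double[raw.length / 2];
        for (int i = 0; i < samples.length; i++) {
            int low = raw[2 * i] & 0xFF;   // low byte, treated as unsigned
            int high = raw[2 * i + 1];     // high byte keeps its sign
            samples[i] = ((high << 8) | low) / 32768.0;
        }
        return samples;
    }

    /** Multiplies the samples in place by w[i] = 0.54 - 0.46 * cos(2*pi*i / (n - 1)). */
    static void applyHamming(double[] samples) {
        int n = samples.length;
        if (n < 2) {
            return; // the window is undefined for fewer than two samples
        }
        for (int i = 0; i < n; i++) {
            samples[i] *= 0.54 - 0.46 * Math.cos(2.0 * Math.PI * i / (n - 1));
        }
    }

    public static void main(String[] args) {
        byte[] raw = new byte[2048]; // stand-in for one slice of raw audio
        double[] samples = unpack16BitLittleEndian(raw);
        applyHamming(samples);
        System.out.println(samples.length + " windowed samples");
    }
}
```

The window leaves the middle of the slice almost untouched and scales the first and last samples down to 8% of their value.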

4. Processing samples and drawing the Histogram

The FFT (Fast Fourier Transform) decomposes the sequence of samples (the sound wave) into components of different frequencies (elementary sinusoidal waves). The Mel filter bank tells us which components to use when computing the magnitude of each frequency band (bar on the X axis). After all 64 magnitudes are computed, the rectangles representing the frequency bands (bars) are drawn and scaled appropriately.
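
One way the magnitudes could be computed is sketched below: an iterative radix-2 FFT, followed by a weighted sum of the spectrum magnitudes under each triangular filter. It assumes the slice length is a power of two, that the filter edges are given in Hz, and that each band sums magnitudes rather than power; the names are hypothetical and the drawing step is left out.

```java
final class BandMagnitudeSketch {
    /** In-place iterative radix-2 FFT; the array length must be a power of two. */
    static void fft(double[] re, double[] im) {
        int n = re.length;
        // Reorder the input by bit-reversed index.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) {
                j ^= bit;
            }
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Combine blocks of growing length with butterfly operations.
        for (int len = 2; len <= n; len <<= 1) {
            double angle = -2.0 * Math.PI / len;
            for (int start = 0; start < n; start += len) {
                for (int k = 0; k < len / 2; k++) {
                    double wr = Math.cos(angle * k);
                    double wi = Math.sin(angle * k);
                    int a = start + k;
                    int b = a + len / 2;
                    double xr = re[b] * wr - im[b] * wi;
                    double xi = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - xr;
                    im[b] = im[a] - xi;
                    re[a] += xr;
                    im[a] += xi;
                }
            }
        }
    }

    /** Sums the spectrum magnitudes under each triangular filter (edges in Hz). */
    static double[] bandMagnitudes(double[] re, double[] im, double[] edgesHz, double sampleRate) {
        int n = re.length;
        double[] bands = new double[edgesHz.length - 2];
        for (int b = 0; b < bands.length; b++) {
            double lo = edgesHz[b], mid = edgesHz[b + 1], hi = edgesHz[b + 2];
            for (int k = 0; k <= n / 2; k++) {
                double f = k * sampleRate / n; // frequency of FFT bin k
                double w;
                if (f <= lo || f >= hi) {
                    w = 0.0;
                } else if (f <= mid) {
                    w = (f - lo) / (mid - lo); // rising side of the triangle
                } else {
                    w = (hi - f) / (hi - mid); // falling side of the triangle
                }
                bands[b] += w * Math.hypot(re[k], im[k]);
            }
        }
        return bands;
    }
}
```

In this sketch a filter that is narrower than the FFT bin spacing (sampleRate / n) may not cover any bin at all, in which case its band stays at zero; a real implementation needs some way of handling such bands.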

Small lines on top of the bars on the graph represent the maximum magnitudes of the corresponding frequency bands during a session.
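
Tracking those maxima can be as simple as keeping one running maximum per band. The sketch below is an assumption about how this might be done, with hypothetical names, not the application's actual code.

```java
final class PeakTrackerSketch {
    private final double[] peaks;

    PeakTrackerSketch(int bandCount) {
        peaks = new double[bandCount];
    }

    /** Raises each stored peak to the new magnitude whenever the new one is larger. */
    double[] update(double[] magnitudes) {
        for (int i = 0; i < peaks.length; i++) {
            peaks[i] = Math.max(peaks[i], magnitudes[i]);
        }
        return peaks;
    }
}
```

Calling update once per processed slice keeps the peaks current, and the returned array gives the height at which to draw each small line.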

Sidenote

This was a small side project I did in my spare time at the start of the 5th semester, influenced by studying audio processing for the Speech Recognition course at the Faculty of Computer Science in Belgrade.

Download

You can download the .jar files here.

To run AudioVisualizerPlayer.jar, it must be in the same folder as the audioFiles folder.

Contributors