QM Vamp Plugins

The QM Vamp Plugin set is a library of Vamp audio feature extraction plugins developed at the Centre for Digital Music at Queen Mary, University of London. These plugins are provided as a single library file, made available in binary form for Windows, OS X, and Linux from the Centre for Digital Music's download page.

For more information about Vamp plugins, see http://www.vamp-plugins.org/.

1.  Note Onset Detector
2.  Tempo and Beat Tracker
3.  Bar and Beat Tracker
4.  Key Detector
5.  Tonal Change
6.  Adaptive Spectrogram
7.  Polyphonic Transcription
8.  Segmenter
9.  Similarity
10.  Discrete Wavelet Transform
11.  Constant-Q Spectrogram
12.  Chromagram
13.  Mel-Frequency Cepstral Coefficients

1. Note Onset Detector

System identifier – vamp:qm-vamp-plugins:qm-onsetdetector
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-onsetdetector

Note Onset Detector analyses a single channel of audio and estimates the onset times of notes within the music – that is, the times at which notes and other audible events begin.

It calculates an onset likelihood function for each spectral frame, and picks peaks in a smoothed version of this function. The plugin is non-causal, returning all results at the end of processing.
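The two-stage process described above can be sketched in a few lines of Python; the function names, the choice of the "Spectral Difference" method, and the simple steepness test are illustrative only, not the plugin's actual implementation:

```python
def spectral_difference_odf(frames):
    """Onset likelihood per frame: sum of positive magnitude increases
    between consecutive spectral frames ("Spectral Difference" method)."""
    odf = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        odf.append(sum(max(c - p, 0.0) for p, c in zip(prev, cur)))
    return odf

def pick_onsets(odf, steepness=0.1):
    """Accept a local maximum as an onset only if the function falls away
    by at least `steepness` on both sides (a crude stand-in for the
    plugin's sensitivity parameter)."""
    return [i for i in range(1, len(odf) - 1)
            if odf[i] - odf[i - 1] >= steepness
            and odf[i] - odf[i + 1] >= steepness]
```

Here a frame in which spectral magnitudes jump and then fall back produces a single peak, and hence a single reported onset.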

Parameters

Onset Detection Function Type – The method used to calculate the onset likelihood function. The most versatile method is the default, "Complex Domain" (see reference, Duxbury et al 2003). "Spectral Difference" may be appropriate for percussive recordings, "Phase Deviation" for non-percussive music, and "Broadband Energy Rise" (see reference, Barry et al 2005) for identifying percussive onsets in mixed music.

Onset Detector Sensitivity – Sensitivity level for peak detection in the onset likelihood function. The higher the sensitivity, the more onsets will (rightly or wrongly) be detected. The peak picker does not have a simple threshold level; instead, this parameter controls the required "steepness" of the slopes in the smoothed detection function either side of a peak value, in order for that peak to be accepted as an onset.

Adaptive Whitening – This option evens out the temporal and frequency variation in the signal, which can yield improved performance in onset detection, for example in audio with big variations in dynamics.

Outputs

Note Onsets – The detected note onset times, returned as a single feature with timestamp but no value for each detected note.

Onset Detection Function – The raw note onset likelihood function that was calculated as the first step of the detection process.

Smoothed Detection Function – The note onset likelihood function following median filtering. This is the function from which sufficiently steep peak values are picked and classified as onsets.

References and Credits

Basic detection methods: C. Duxbury, J. P. Bello, M. Davies and M. Sandler, Complex domain Onset Detection for Musical Signals. In Proceedings of the 6th Conference on Digital Audio Effects (DAFx-03). London, UK. September 2003.

Adaptive whitening: D. Stowell and M. D. Plumbley, Adaptive whitening for improved real-time audio onset detection. In Proceedings of the International Computer Music Conference (ICMC'07), August 2007.

Percussion onset detector: D. Barry, D. Fitzgerald, E. Coyle and B. Lawlor, Drum Source Separation using Percussive Feature Detection and Spectral Modulation. ISSC 2005.

The Note Onset Detector Vamp plugin was written by Chris Duxbury, Juan Pablo Bello and Christian Landone.

2. Tempo and Beat Tracker

System identifier – vamp:qm-vamp-plugins:qm-tempotracker
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tempotracker

Tempo and Beat Tracker analyses a single channel of audio and estimates the positions of metrical beats within the music (the equivalent of a human listener tapping their foot to the beat).

Parameters

Beat Tracking Method – The method used to track beats. The default, "New", uses a hybrid of the "Old" two-state beat tracking model (see reference Davies 2007) and a dynamic programming method (see reference Ellis 2007). A more detailed description is given below within the Bar and Beat Tracker plugin.

Onset Detection Function Type – The algorithm used to calculate the onset likelihood function. The most versatile method is the default, "Complex Domain" (see reference, Duxbury et al 2003). "Spectral Difference" may be appropriate for percussive recordings, "Phase Deviation" for non-percussive music, and "Broadband Energy Rise" (see reference, Barry et al 2005) for identifying percussive onsets in mixed music.

Adaptive Whitening – This option evens out the temporal and frequency variation in the signal, which can yield improved performance in onset detection, for example in audio with big variations in dynamics.

Outputs

Beats – The estimated beat locations, returned as a single feature, with timestamp but no value, for each beat, labelled with the corresponding estimated tempo at that beat.

Onset Detection Function – The raw note onset likelihood function used in beat estimation.

Tempo – The estimated tempo, returned as a feature each time the estimated tempo changes, with a single value for the tempo in beats per minute.

References and Credits

Beat tracking method: M. E. P. Davies and M. D. Plumbley. Context-dependent beat tracking of musical audio. In IEEE Transactions on Audio, Speech and Language Processing. Vol. 15, No. 3, pp1009-1020, 2007;
M. E. P. Davies and M. D. Plumbley. Beat Tracking With A Two State Model. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2005), Vol. 3, pp241-244 Philadelphia, USA, March 19-23, 2005;
D. P. W. Ellis. Beat Tracking by Dynamic Programming. In Journal of New Music Research. Vol. 36, No. 1, pp51-60, 2007.

Onset detection methods: C. Duxbury, J. P. Bello, M. Davies and M. Sandler, Complex domain Onset Detection for Musical Signals. In Proceedings of the 6th Conference on Digital Audio Effects (DAFx-03). London, UK. September 2003.

Adaptive whitening: D. Stowell and M. D. Plumbley, Adaptive whitening for improved real-time audio onset detection. In Proceedings of the International Computer Music Conference (ICMC'07), August 2007.

Percussion onset detector: D. Barry, D. Fitzgerald, E. Coyle and B. Lawlor, Drum Source Separation using Percussive Feature Detection and Spectral Modulation. ISSC 2005.

The Tempo and Beat Tracker Vamp plugin was written by Matthew Davies and Christian Landone.

3. Bar and Beat Tracker

System identifier – vamp:qm-vamp-plugins:qm-barbeattracker
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-barbeattracker

Bar and Beat Tracker analyses a single channel of audio and estimates the positions of bar lines and the resulting counted metrical beat positions within the music (where the first beat of each bar is "1", the equivalent of counting in time to the music). It is closely related to the Tempo and Beat Tracker, producing the same results for beat position as that plugin's "New" beat tracking method.

Method

The plugin first calculates an onset detection function using the "Complex Domain" method (see Tempo and Beat Tracker).

The beat tracking method performs two passes over the onset detection function, first to estimate the tempo contour, and then given the tempo, to recover the beat locations.

To identify the tempo, the onset detection function is partitioned into 6-second frames with a 1.5-second increment. The autocorrelation function of each 6-second onset detection function is found and this is then passed through a perceptually weighted comb filterbank (see reference Davies 2007). The successive comb filterbank output signals are grouped together into a matrix of observations of periodicity through time. The best path of periodicity through these observations is found using the Viterbi algorithm, where the transition matrix is defined as a diagonal Gaussian.
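The periodicity estimation stage can be illustrated with a much-simplified sketch that replaces the comb filterbank and Viterbi smoothing with a plain autocorrelation maximum (names and the fixed search range are illustrative, not the plugin's):

```python
def tempo_from_odf(odf, frame_rate, min_bpm=60, max_bpm=200):
    """Crude periodicity estimate: pick the autocorrelation lag of the
    onset detection function with the largest value inside the plausible
    beat-period range. The plugin instead passes the autocorrelation of
    6-second windows through a perceptually weighted comb filterbank and
    smooths the result with the Viterbi algorithm."""
    def acf(lag):
        return sum(odf[i] * odf[i + lag] for i in range(len(odf) - lag))
    min_lag = int(frame_rate * 60.0 / max_bpm)
    max_lag = int(frame_rate * 60.0 / min_bpm)
    best_lag = max(range(max(min_lag, 1), max_lag + 1), key=acf)
    return 60.0 * frame_rate / best_lag
```

For an onset detection function with an impulse every 5 frames at a 10 Hz frame rate, this returns a tempo of 120 BPM.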

Given the estimates of periodicity, the beat locations are recovered by applying the dynamic programming algorithm (see reference Ellis 2007). This process involves the calculation of a recursive cumulative score function and backtrace signal. The cumulative score indicates the likelihood of a beat existing at each sample of the onset detection function input, and the backtrace gives the location of the best previous beat given this point in time. Once the cumulative score and backtrace have been calculated for the whole input signal, the best path through beat locations is found by recursively sampling the backtrace signal from the end of the input signal back to the beginning. See reference Stark et al. 2009 for a description of the real-time implementation of the beat tracking algorithm.
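A minimal sketch of the cumulative-score-and-backtrace idea follows, assuming a fixed beat period rather than the time-varying tempo contour the plugin actually estimates (all names are illustrative):

```python
import math

def track_beats(odf, period):
    """Cumulative score plus backtrace in the spirit of Ellis's dynamic
    programming formulation. `period` is a fixed beat period in ODF
    samples; the real plugin supplies a time-varying tempo estimate."""
    n = len(odf)
    score = list(odf)
    backlink = [-1] * n
    for i in range(n):
        best_s, best_j = 0.0, -1
        # search a window of plausible previous-beat locations
        for j in range(max(0, i - 2 * period), max(0, i - period // 2)):
            # log-scale penalty for deviating from the expected period
            s = score[j] - abs(math.log((i - j) / period))
            if best_j == -1 or s > best_s:
                best_s, best_j = s, j
        if best_j >= 0:
            score[i] += best_s
            backlink[i] = best_j
    # backtrack from the highest-scoring position to the beginning
    beats = [max(range(n), key=lambda i: score[i])]
    while backlink[beats[-1]] >= 0:
        beats.append(backlink[beats[-1]])
    return beats[::-1]
```

Given an onset detection function with impulses every 4 samples and a period of 4, the recovered beat locations fall on those impulses.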

Once the beat locations have been identified, the plugin makes a second pass over the input audio signal, partitioning it into beat synchronous frames. The audio within each beat frame is down-sampled to give a new sampling frequency of 2.8kHz. A beat-synchronous spectral representation is then calculated within each frame, from which a measure of beat spectral difference is calculated using Jensen-Shannon divergence. The bar boundaries are identified as those beat transitions leading to most consistent spectral change given the specified number of beats per bar.

Parameters

Beats per Bar – The number of beats per bar (or measure). The plugin assumes that the number of beats per bar is fixed throughout the music.

Outputs

Beats – The estimated beat locations, returned as a single feature, with timestamp but no value, for each beat, labelled with the number of that beat within the bar (e.g. consecutively 1, 2, 3, 4 for 4 beats to the bar).

Bars – The estimated bar line locations, returned as a single feature, with timestamp but no value, for each bar.

Beat Count – The estimated beat locations, returned as a single feature, with timestamp and a value corresponding to the number of that beat within the bar. This is similar to the Beats output except that it returns a counting function rather than a series of instants.

Beat Spectral Difference – The new-bar likelihood function used in bar line estimation.

References and Credits

Beat tracking method: A. M. Stark, M. E. P. Davies and M. D. Plumbley. Real-time beat-synchronous analysis of musical audio. To appear in Proceedings of 12th International Conference on Digital Audio Effects (DAFx). 2009;
M. E. P. Davies and M. D. Plumbley. Context-dependent beat tracking of musical audio. In IEEE Transactions on Audio, Speech and Language Processing. Vol. 15, No. 3, pp1009-1020, 2007;
D. P. W. Ellis. Beat Tracking by Dynamic Programming. In Journal of New Music Research. Vol. 36, No. 1, pp51-60, 2007.

Bar finding method: M. E. P. Davies and M. D. Plumbley. A spectral difference approach to extracting downbeats in musical audio. In Proceedings of 14th European Signal Processing Conference (EUSIPCO), Italy, 2006.

The Bar and Beat Tracker Vamp plugin was written by Matthew Davies and Adam Stark.

4. Key Detector

System identifier – vamp:qm-vamp-plugins:qm-keydetector
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-keydetector

Key Detector analyses a single channel of audio and continuously estimates the key of the music by comparing the degree to which a block-by-block chromagram correlates to the stored key profiles for each major and minor key.

The key profiles are drawn from analysis of Book I of the Well Tempered Klavier by J S Bach, recorded at A=440 equal temperament.
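The profile-correlation idea can be sketched as follows. The profiles are left to the caller, since the plugin's own Bach-derived profiles are not reproduced here; the toy profiles in the usage example below are illustrative only:

```python
import math

def detect_key(chroma, major_profile, minor_profile):
    """Correlate a 12-bin chroma vector against every rotation of a major
    and a minor key profile and return the best match. The profiles are
    supplied by the caller; the plugin's own are derived from Bach."""
    def corr(a, b):
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        den = math.sqrt(sum((x - ma) ** 2 for x in a)
                        * sum((y - mb) ** 2 for y in b))
        return num / den if den else 0.0
    best = None
    for mode, profile in (("major", major_profile), ("minor", minor_profile)):
        for tonic in range(12):
            # rotate the profile so its tonic lines up with chroma bin `tonic`
            rotated = [profile[(k - tonic) % 12] for k in range(12)]
            c = corr(chroma, rotated)
            if best is None or c > best[0]:
                best = (c, tonic, mode)
    return best[1], best[2]  # (tonic pitch class, 0 = C; mode)
```

Note that this sketch numbers pitch classes from 0 for C, whereas the plugin's outputs count from 1.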

Parameters

Tuning Frequency – The frequency of concert A in the music under analysis.

Window Length – The number of chroma analysis frames taken into account for key estimation. This controls how eager the key detector will be to return short-duration tonal changes as new key changes (the shorter the window, the more likely it is to detect a new key change).

Outputs

Tonic Pitch – The tonic pitch of each estimated key change, returned as a single-valued feature at the point where the key change is detected, with value counted from 1 to 12 where C is 1, C# or Db is 2, and so on up to B which is 12.

Key Mode – The major or minor mode of the estimated key, where major is 0 and minor is 1.

Key – The estimated key for each key change, returned as a single-valued feature at the point where the key change is detected, with value counted from 1 to 24 where 1-12 are the major keys and 13-24 are the minor keys, such that C major is 1, C# major is 2, and so on up to B major which is 12; then C minor is 13, Db minor is 14, and so on up to B minor which is 24.

Key Strength Plot – A grid representing the ongoing key "probability" throughout the music. This is returned as a feature for each chroma frame, containing 25 bins. Bins 1-12 are the major keys from C upwards; bins 14-25 are the minor keys from C upwards. The 13th bin is unused: it just provides space between the first and second halves of the feature if displayed in a single plot.

The outputs are also labelled with pitch or key as text.

References and Credits

Method: see K. Noland and M. Sandler. Signal Processing Parameters for Tonality Estimation. In Proceedings of Audio Engineering Society 122nd Convention, Vienna, 2007.

The Key Detector Vamp plugin was written by Katy Noland and Christian Landone.

5. Tonal Change

System identifier – vamp:qm-vamp-plugins:qm-tonalchange
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tonalchange

Tonal Change analyses a single channel of audio, detecting harmonic changes such as chord boundaries.

Parameters

Gaussian smoothing – The window length for the internal smoothing operation, in chroma analysis frames. This controls how eager the tonal change detector will be to identify very short-term tonal changes. The default value of 5 is quite short, and may lead to more (not always meaningful) results being returned; for many purposes a larger value, closer to the maximum of 20, may be appropriate.

Chromagram minimum pitch – The MIDI pitch value (0-127) of the minimum pitch included in the internal chromagram analysis.

Chromagram maximum pitch – The MIDI pitch value (0-127) of the maximum pitch included in the internal chromagram analysis.

Chromagram tuning frequency – The frequency of concert A in the music under analysis.

Outputs

Transform to 6D Tonal Content Space – A representation of the musical content in a six-dimensional tonal space onto which the algorithm maps 12-bin chroma vectors extracted from the audio.

Tonal Change Detection Function – A function representing the estimated likelihood of a tonal change occurring in each spectral frame.

Tonal Change Positions – The resulting estimated positions of tonal changes.

References and Credits

Method: C. A. Harte, M. Gasser, and M. Sandler. Detecting harmonic change in musical audio. In Proceedings of the 1st ACM workshop on Audio and Music Computing Multimedia, Santa Barbara, 2006.

The Tonal Change Vamp plugin was written by Chris Harte and Martin Gasser.

6. Adaptive Spectrogram

System identifier – vamp:qm-vamp-plugins:qm-adaptivespectrogram
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-adaptivespectrogram

Adaptive Spectrogram produces a composite spectrogram from a set of short-time Fourier transforms at differing resolutions. Values are selected from these spectrograms by repeated subdivision by time and frequency in order to maximise an entropy function across each column.

Parameters

Number of resolutions – The number of distinct resolutions to calculate and use. The resolutions will be consecutive powers of two starting from the smallest resolution specified.

Smallest resolution – The smallest of the set of resolutions to use.

Omit alternate resolutions – Causes the plugin to ignore alternate resolutions (i.e. the smallest resolution multiplied by 2, 8, 32, etc) when composing a spectrogram. The smallest resolution specified, and its multiples by 4, 16, etc as applicable, will be retained. The total number of resolutions actually included in the resulting spectrogram will therefore be N/2 (for even N) or (N+1)/2 (for odd N) where N is the value of the "number of resolutions" parameter. This permits a wider range of resolutions to be included with less processing, at obvious cost in quality.

Multi-threaded processing – Enables multi-threading of the spectrogram calculation. This usually results in somewhat faster processing where multiple CPU cores are available.

As an example of the resolution parameters, if the "number of resolutions" is set to 5, "smallest resolution" to 128, and "omit alternate resolutions" is not used, the composite spectrogram will be calculated using spectrograms from 128, 256, 512, 1024, and 2048 point short-time Fourier transforms (with 50% overlap in each case). With "omit alternate resolutions" set, the same parameters would result in spectrograms from 128, 512, and 2048 point STFTs being used.
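The resolution selection described above is simple enough to express directly (an illustrative sketch, not plugin code):

```python
def resolutions(smallest, count, omit_alternate=False):
    """Enumerate the STFT sizes the composite spectrogram draws on:
    consecutive powers of two from `smallest`, optionally keeping only
    every other one (smallest, x4, x16, ...)."""
    sizes = [smallest * (2 ** i) for i in range(count)]
    return sizes[::2] if omit_alternate else sizes
```

With the parameters from the example above, `resolutions(128, 5)` gives [128, 256, 512, 1024, 2048] and `resolutions(128, 5, omit_alternate=True)` gives [128, 512, 2048].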

References and Credits

Method: X. Wen and M. Sandler. Composite spectrogram using multiple Fourier transforms. IET Signal Processing, 3(1):51-63, 2009.

The Adaptive Spectrogram Vamp plugin was written by Wen Xue and Chris Cannam.

7. Polyphonic Transcription

System identifier – vamp:qm-vamp-plugins:qm-transcription
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-transcription

The Polyphonic Transcription plugin estimates a note transcription using MIDI pitch values from its input audio, returning a feature for each note (with timestamp and duration) whose value is the MIDI pitch number. Velocity is not estimated.

Although the published method is described as real-time, the implementation used in this plugin is non-causal: it buffers its input and does all the real work after the entire input has been received, which makes it very memory intensive. It is nevertheless relatively fast (faster than real-time) compared to other polyphonic transcription methods.

The plugin works best at a 44.1kHz input sample rate, and is tuned for piano and guitar music.

References and Credits

Method: R. Zhou and J. D. Reiss. A Real-Time Polyphonic Music Transcription System. In Proceedings of the Fourth Music Information Retrieval Evaluation eXchange (MIREX), Philadelphia, USA, 2008;
R. Zhou and J. D. Reiss. A Real-Time Frame-Based Multiple Pitch Estimation Method Using the Resonator Time Frequency Image. Third Music Information Retrieval Evaluation eXchange (MIREX), Vienna, Austria, 2007.

The Polyphonic Transcription Vamp plugin was written by Ruohua Zhou.

8. Segmenter

System identifier – vamp:qm-vamp-plugins:qm-segmenter
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-segmenter

Segmenter divides a single channel of music up into structurally consistent segments. It returns a numeric value (the segment type) for each moment at which a new segment starts.

For music with clearly tonally distinguishable sections such as verse, chorus, etc., segments with the same type may be expected to be similar to one another in some structural sense. For example, repetitions of the chorus are likely to share a segment type.

The plugin only attempts to identify similar segments; it does not attempt to label them. For example, it makes no attempt to tell you which segment is the chorus.

Note that this plugin does a substantial amount of processing after receiving all of the input audio data, before it produces any results.

Method

The method relies upon structural/timbral similarity to obtain the high-level song structure. This is based on the assumption that the distributions of timbre features are similar over corresponding structural elements of the music.

The algorithm works by obtaining a frequency-domain representation of the audio signal using a Constant-Q transform, a Chromagram or Mel-Frequency Cepstral Coefficients (MFCC) as underlying features (the particular feature is selectable as a parameter). The extracted features are normalised in accordance with the MPEG-7 standard (NASE descriptor), which means the spectrum is converted to decibel scale and each spectral vector is normalised by the RMS energy envelope. The value of this envelope is stored for each processing block of audio. This is followed by the extraction of 20 principal components per block using PCA, yielding a sequence of 21 dimensional feature vectors where the last element in each vector corresponds to the energy envelope.
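The decibel-plus-RMS normalisation step can be sketched as follows; this is an illustration of the idea, not the exact MPEG-7 NASE computation, and the function name and floor value are invented for the example:

```python
import math

def nase_normalise(spectrum, floor=1e-10):
    """Sketch of the NASE-style normalisation: convert a magnitude
    spectrum to decibels, then scale the vector by its RMS, keeping the
    RMS envelope value to carry alongside the reduced feature vector."""
    db = [20.0 * math.log10(max(v, floor)) for v in spectrum]
    rms = math.sqrt(sum(v * v for v in db) / len(db))
    if rms == 0.0:
        return db, 0.0
    return [v / rms for v in db], rms
```

The returned envelope value plays the role of the 21st element appended to each 20-dimensional PCA-reduced feature vector.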

A 40-state Hidden Markov Model is then trained on the whole sequence of features, with each state of the HMM corresponding to a specific timbre type. This process partitions the timbre-space of a given track into 40 possible types. The important assumption of the model is that the distribution of these features remains consistent over a structural segment. After training and decoding the HMM, the song is assigned a sequence of timbre-features according to specific timbre-type distributions for each possible structural segment.

The segmentation itself is computed by clustering timbre-type histograms. A series of histograms is created over a sliding window and grouped into M clusters by an adapted soft k-means algorithm. Each of these clusters corresponds to a specific segment-type of the analysed song. Reference histograms, iteratively updated during clustering, describe the timbre distribution for each segment. The segmentation arises from the final cluster assignments.

Parameters

Number of segment-types – The maximum number of clusters (segment-types) to be returned. The default is 10. Unlike many clustering algorithms, the constrained clustering used in this plugin does not produce too many clusters or vary significantly even if this is set too high. However, this parameter can be useful for limiting the number of expected segment-types.

Feature Type – The type of spectral feature used for segmentation: a Constant-Q spectrogram, a chromagram, or Mel-Frequency Cepstral Coefficients (see Method above).

Minimum segment duration – The approximate expected minimum duration for a segment, from 1 to 15 seconds. Changing this parameter may help the plugin to find musical sections rather than just following changes in the sound of the music, and also avoid wasting a segment-type cluster for timbrally distinct but too-short segments. The default of 4 seconds usually produces good results.

Outputs

Segmentation – The estimated segment boundaries, returned as a single feature with one value at each segment boundary, with the value representing the segment type number for the segment starting at that boundary.

References and Credits

Method: M. Levy and M. Sandler. Structural segmentation of musical audio by constrained clustering. IEEE Transactions on Audio, Speech, and Language Processing, February 2008.

Note that this plugin does not implement the beat-synchronous aspect of the segmentation method described in the paper.

The Segmenter Vamp plugin was written by Mark Levy. Thanks to George Fazekas for providing much of this documentation.

9. Similarity

System identifier – vamp:qm-vamp-plugins:qm-similarity
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-similarity

Similarity treats each channel of its audio input as a separate "track", and estimates how similar the tracks are to one another using a selectable similarity measure.

The plugin also returns the intermediate data used as a basis of the similarity measure; it can therefore be used on a single channel of input (with the resulting intermediate data then being applied in some other similarity or clustering algorithm, for example) if desired, as well as with multiple inputs.

Because of the way this plugin handles multiple inputs, by assuming that each channel represents a separate piece of music, it may not be appropriate for use directly in a general-purpose host (unless you actually want to do something like compare two stereo channels for timbral similarity, which is unlikely).

Parameters

Feature Type – The underlying audio feature used for the similarity measure: timbral (MFCC-based), chroma, or rhythmic features, singly or in combination (see the Outputs below).

Outputs

Distance Matrix – A matrix of the distance measures between input channels, returned as a series of vector features timestamped at one-second intervals. The distance from channel i to channel j appears as the j'th bin of the feature at time i.

Distance from First Channel – A single vector feature, timestamped at time zero, containing the distances between the first input channel and each of the input channels (including the first channel itself at bin 0, which should have zero distance).

Ordered Distances from First Channel – A pair of vector features, at times 0 and 1 second. The feature at time 0 contains the 1-based indices of the input channels in the order of similarity to the first input channel (so its first bin should always contain 1, as the first channel is most similar to itself). The feature at time 1 contains, in bin n, the distance between the first input channel and the channel with index found at bin n of the feature at time 0.
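The relationship between the two features of this output can be illustrated with a small sketch that derives both from a plain distance vector (a hypothetical helper, not plugin code):

```python
def ordered_distances(dist_from_first):
    """Reproduce the "Ordered Distances from First Channel" layout from a
    distance vector: 1-based channel indices sorted by similarity to the
    first channel, then the distances in that order."""
    order = sorted(range(len(dist_from_first)),
                   key=lambda i: dist_from_first[i])
    return [i + 1 for i in order], [dist_from_first[i] for i in order]
```

Since the first channel has zero distance to itself, the first returned index is always 1, as the output description states.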

Feature Means – A series of vector features containing the mean values of each of the feature bins across the duration of each of the input channels. This output returns one feature for each input channel, timestamped at one-second intervals. The number of bins for each feature depends on the feature type; it will be 20 for MFCC features and 12 for chroma features. No features will be returned on this output if the feature type is purely rhythmic.

Feature Variances – Just as Feature Means, but variances.

Beat Spectra – A series of vector features containing the rhythmic autocorrelation profiles (beat spectra) for each of the input channels. This output returns one 512-bin feature for each input channel, timestamped at one-second intervals. No features will be returned on this output if the feature type contains no rhythm component.

References and Credits

Timbral similarity: M. Levy and M. Sandler. Lightweight measures for timbral similarity of musical audio. In Proceedings of the 1st ACM workshop on Audio and Music Computing Multimedia, Santa Barbara, 2006.

Combined rhythmic and timbral similarity: K. Jacobson. A Multifaceted Approach to Music Similarity. In Proceedings of the Seventh International Conference on Music Information Retrieval (ISMIR), 2006.

The Similarity Vamp plugin was written by Mark Levy, Kurt Jacobson and Chris Cannam.

10. Discrete Wavelet Transform

System identifier – vamp:qm-vamp-plugins:qm-dwt
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-dwt

The Discrete Wavelet Transform plugin performs the forward DWT on the signal. The wavelet coefficients are derived from a fast segmented DWT algorithm without block end effects. The DWT can be performed with various functions from a selection of wavelets up to the 16th scale.

The wavelet coefficients are returned as feature columns at a rate of half the sample rate of the signal to be analysed. To simulate multiresolution in the layer data table, the coefficient values at higher scales are copied multiple times according to the number of the scale: each value appears twice at scale 2, four times at scale 3, eight times at scale 4, and so on, in order to simulate the lower resolution at higher scales.
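The copying scheme can be expressed directly (an illustrative sketch):

```python
def expand_scale(coeffs, scale):
    """Replicate each wavelet coefficient 2**(scale-1) times, mimicking
    how the plugin copies higher-scale values to simulate multiresolution
    in the output grid (scale 2 -> twice, scale 3 -> four times, ...)."""
    reps = 2 ** (scale - 1)
    return [c for c in coeffs for _ in range(reps)]
```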

Parameters

Scales – Adjusts the number of scales of the DWT. The processing block size needs to be set to at least 2^n, where n is the number of scales.

Wavelet – Selects the wavelet function to be used for the transform. Wavelets from the following families are available: Daubechies, Symlets, Coiflets, Biorthogonal, Meyer.

References and Credits

Principles: S. Mallat. A theory for multiresolution signal decomposition: the wavelet representation. In IEEE Transactions on Pattern Analysis and Machine Intelligence, 11 (1989), pp. 674-693;
P. Rajmic and J. Vlach. Real-Time Audio Processing via Segmented Wavelet Transform. In Proceedings of the 10th Int. Conference on Digital Audio Effects (DAFx-07), Bordeaux, France, September 10-15, 2007.

The Discrete Wavelet Transform plugin was written by Thomas Wilmering.

11. Constant-Q Spectrogram

System identifier – vamp:qm-vamp-plugins:qm-constantq
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-constantq

Constant-Q Spectrogram calculates a spectrogram based on a short-time windowed constant Q spectral transform. This is a spectrogram in which the ratio of centre frequency to resolution is constant for each frequency bin. The frequency bins correspond to the frequencies of "musical notes" rather than being linearly spaced in frequency as they are for the conventional DFT spectrogram.

The pitch range and the number of frequency bins per octave may be adjusted using the plugin's parameters. Note that the plugin's preferred step and block sizes are defined by these parameters, and the plugin will not accept any other block size than its preferred value.
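As a sketch of how these parameters determine the bin layout: the centre frequencies are geometrically spaced, with ratio 2^(1/bins per octave), between the frequencies of the minimum and maximum MIDI pitches relative to the tuning frequency (illustrative code, not the plugin's):

```python
import math

def cq_bin_frequencies(min_pitch, max_pitch, bins_per_octave, tuning=440.0):
    """Centre frequencies for constant-Q bins between two MIDI pitches,
    geometrically spaced at 2**(1/bins_per_octave), with concert A
    (MIDI pitch 69) at the given tuning frequency."""
    def midi_to_hz(p):
        return tuning * 2.0 ** ((p - 69) / 12.0)
    fmin, fmax = midi_to_hz(min_pitch), midi_to_hz(max_pitch)
    n = int(math.ceil(bins_per_octave * math.log2(fmax / fmin))) + 1
    return [fmin * 2.0 ** (i / bins_per_octave) for i in range(n)]
```

For example, one octave from A4 (pitch 69) to A5 (pitch 81) at 12 bins per octave yields 13 bin centres from 440 Hz to 880 Hz.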

Parameters

Minimum Pitch – The MIDI pitch value (0-127) corresponding to the lowest frequency to be included in the constant-Q transform.

Maximum Pitch – The MIDI pitch value (0-127) corresponding to the highest frequency to be included in the constant-Q transform.

Tuning Frequency – The frequency of concert A in the music under analysis.

Bins per Octave – The number of constant-Q transform bins to be computed per octave.

Normalized – Whether to normalize each output column to unit maximum.

Outputs

Constant-Q Spectrogram – The calculated spectrogram, as a single feature per process block containing one bin for each pitch included in the spectrogram's range.

References and Credits

Principle: J. Brown. Calculation of a constant Q spectral transform. Journal of the Acoustical Society of America, 89(1): 425-434, 1991.

The Constant-Q Spectrogram Vamp plugin was written by Christian Landone.

12. Chromagram

System identifier – vamp:qm-vamp-plugins:qm-chromagram
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-chromagram

Chromagram calculates a constant Q spectral transform (as in the Constant Q Spectrogram plugin) and then wraps the frequency bin values into a single octave, with each bin containing the sum of the magnitudes from the corresponding bin in all octaves. The number of values in each feature vector returned by the plugin is therefore the same as the number of bins per octave configured for the underlying constant Q transform.
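The octave-wrapping step is straightforward to sketch (illustrative code, not the plugin's):

```python
def wrap_to_chroma(cq_column, bins_per_octave):
    """Fold one constant-Q spectral column into a single octave by
    summing the magnitudes of corresponding bins across octaves."""
    chroma = [0.0] * bins_per_octave
    for i, v in enumerate(cq_column):
        chroma[i % bins_per_octave] += v
    return chroma
```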

The pitch range and the number of frequency bins per octave for the transform may be adjusted using the plugin's parameters. Note that the plugin's preferred step and block sizes depend on these parameters, and the plugin will not accept any other block size than its preferred value.

Parameters

Minimum Pitch – The MIDI pitch value (0-127) corresponding to the lowest frequency to be included in the constant-Q transform used in calculating the chromagram.

Maximum Pitch – The MIDI pitch value (0-127) corresponding to the highest frequency to be included in the constant-Q transform used in calculating the chromagram.

Tuning Frequency – The frequency of concert A in the music under analysis.

Bins per Octave – The number of constant-Q transform bins to be computed per octave, and thus the total number of bins present in the resulting chromagram.

Normalized – Whether to normalize each output column. Normalization may be to unit sum or unit maximum.

Outputs

Chromagram – The calculated chromagram, as a single feature per process block containing the number of bins given in the bins per octave parameter.

References and Credits

The Chromagram Vamp plugin was written by Christian Landone.

13. Mel-Frequency Cepstral Coefficients

System identifier – vamp:qm-vamp-plugins:qm-mfcc
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-mfcc

Mel-Frequency Cepstral Coefficients calculates MFCCs from a single channel of audio. These coefficients, derived from a cosine transform of the mapping of an audio spectrum onto a frequency scale modelled on human auditory response, are widely used in speech recognition, music classification and other tasks.
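The final step of that pipeline, the cosine transform of the log mel-band energies, can be sketched as follows (a DCT-II formulation; an illustration, not the plugin's exact code):

```python
import math

def mfcc_from_mel(log_mel_energies, n_coeffs=20, include_c0=True):
    """Final MFCC step: discrete cosine transform (DCT-II) of the log
    mel-band energies. C0, the k = 0 term, reflects overall power across
    the mel frequency bands."""
    n = len(log_mel_energies)
    start = 0 if include_c0 else 1
    return [
        sum(e * math.cos(math.pi * k * (i + 0.5) / n)
            for i, e in enumerate(log_mel_energies))
        for k in range(start, start + n_coeffs)
    ]
```

For a flat input (equal energy in every band), only C0 is non-zero, which is why it serves as an overall power measure.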

Parameters

Number of Coefficients – The number of MFCCs to return. Commonly used values include 13 or the default 20. This number includes C0 if requested (see Include C0 below).

Power for Mel Amplitude Logs – An optional power value to which the spectral amplitudes should be raised before applying the cosine transform. Values greater than 1 may in principle reduce the contribution of noise to the results. The default is 1.

Include C0 – Whether to include the "zero'th" coefficient, which simply reflects the overall signal power across the Mel frequency bands.

Outputs

Coefficients – The MFCC values, returned as one vector feature per processing block.

Means of Coefficients – The overall means of the MFCC bins, as a single vector feature with time 0 that is returned when processing is complete.

References and Credits

MFCCs in music: See B. Logan. Mel-Frequency Cepstral Coefficients for Music Modeling. In Proceedings of the First International Symposium on Music Information Retrieval (ISMIR), 2000.

The Mel-Frequency Cepstral Coefficients Vamp plugin was written by Nicolas Chetry and Chris Cannam.