Author Topic: Sampling rates, input domain and sonification (Read 13003 times)

justin · « **on:** July 11, 2012, 15:59:22 »

Hello there,

Newbie to the Vamp world, just about to write my first plug-in, and I've run into some questions I was hoping someone could help me out with:

1) Sampling rate:

Imagine my algorithm requires a fixed sampling rate (e.g. fs = 44100), block size (e.g. 2048 @ fs = 44.1k) and hop size (e.g. 1024 @ fs=44.1kHz). I understand that I can specify a preferred block and hop size, and even return false in the initialization function if the host specifies something else. But, what about the sampling rate? Specifying a required block/hop size (in samples) is not really useful if the sampling rate is not known.

I realise I can save the value of inputSampleRate to a parameter, then check it in the initialization function and return false if it's not 44100, but that would be quite annoying for a user analysing audio with different sampling rates. Is there no way to re-sample the audio before it is chopped into blocks? Re-sampling the audio after it is already chopped into blocks means I have no control over the block/hop size (in terms of their actual duration in seconds).

2) Time-domain filtering:

Is there any way to apply a time domain filter first, and then get the input in the frequency domain? Just hoping to avoid having to compute the DFT inside the plug-in itself.

3) Sonification (Sonic Visualiser):

In Sonic Visualiser I see that some output types can be sonified (e.g. clicks at detected onsets). If the output of my plug-in is a continuous per-frame frequency value (in Hz), is there any way to sonify the output in Sonic Visualiser, e.g. with a sinusoid that follows the frequency of the output?

Thanks!

Justin

cannam · « **Reply #1 on:** July 12, 2012, 09:21:36 »

Hello!

Quote from: justin on July 11, 2012, 15:59:22

1) Sampling rate:

Imagine my algorithm requires a fixed sampling rate (e.g. fs = 44100), block size (e.g. 2048 @ fs = 44.1k) and hop size (e.g. 1024 @ fs=44.1kHz). I understand that I can specify a preferred block and hop size, and even return false in the initialization function if the host specifies something else. But, what about the sampling rate? Specifying a required block/hop size (in samples) is not really useful if the sampling rate is not known.

I'm afraid the samplerate is one thing your plugin has no control over. It must accept whatever's supplied in the constructor, and can only then return false on initialise if the rate is unsatisfactory.

Note that the plugin's block and hop size can depend on the samplerate; they don't have to be hardcoded. If the reason you want to fix the samplerate is to have a known block and hop size in physical units (i.e. seconds), you may be able to do it the other way around: calculate the block and hop size on request based on the samplerate.
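A minimal sketch of that "other way around" approach, written as standalone C++ so it compiles without the SDK. The helper names (`sizeForDuration`, `sizesMatch`) and the reference durations are hypothetical. In a real plugin, getPreferredBlockSize() and getPreferredStepSize() would call the helper with the base class's `m_inputSampleRate`, and initialise() would compare the sizes it receives against the same values.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical reference durations: 2048 and 1024 samples at 44.1 kHz,
// i.e. roughly 46.4 ms windows with a 23.2 ms hop.
const double kBlockSeconds = 2048.0 / 44100.0;
const double kStepSeconds = 1024.0 / 44100.0;

// Number of samples closest to `seconds` at the given sample rate.
std::size_t sizeForDuration(float sampleRate, double seconds)
{
    assert(sampleRate > 0.0f && seconds > 0.0);
    return static_cast<std::size_t>(std::lround(sampleRate * seconds));
}

// The kind of check initialise() could make: accept only the sizes that
// getPreferredBlockSize()/getPreferredStepSize() would return for this rate.
bool sizesMatch(float sampleRate, std::size_t stepSize, std::size_t blockSize)
{
    return blockSize == sizeForDuration(sampleRate, kBlockSeconds) &&
           stepSize == sizeForDuration(sampleRate, kStepSeconds);
}
```

At 48 kHz this gives a block of 2229 samples and a step of 1115, so the window keeps (almost exactly) the same duration in seconds. If the algorithm's FFT needs power-of-two blocks, rounding to the nearest power of two instead would satisfy it, at the cost of the duration no longer being exact.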

Quote

I realise I can save the value of inputSampleRate to a parameter

You don't actually need to save it; it's stored for you in the Plugin base class. (This is probably bad form in terms of software practice, but still.)

Quote

Is there no way to re-sample the audio before it is chopped into blocks?

No.

Quote

2) Time-domain filtering:

Is there any way to apply a time domain filter first, and then get the input in the frequency domain? Just hoping to avoid having to compute the DFT inside the plug-in itself.

No, the frequency-domain input is essentially a convenience option for plugins simple enough to be happy to work from STFT data without having to have too much control over it (the host also controls the window shape, for example). More sophisticated plugins will need to work from time-domain data.

In hindsight I wish we had put a generally accessible FFT implementation in the SDK on the plugin side, as well as in PluginInputDomainAdapter on the host side: there are now many, many duplicates of FFT functions in Vamp plugins out there! Perhaps it's not too late to add it, even.
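For a plugin that wants to filter first, the pipeline inside process() would look roughly like the sketch below: filter the time-domain block, window it, then transform it. The pre-emphasis filter, the Hann window and the naive O(N²) DFT are illustrative choices of mine (any FFT library could replace the DFT), and the function names are hypothetical rather than anything the SDK provides.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// First-order pre-emphasis filter: y[n] = x[n] - a * x[n-1].
// The sample before the block is taken as zero. Because consecutive blocks
// overlap, a longer or recursive filter would need its state handled per hop.
std::vector<double> preEmphasis(const float *input, std::size_t n, double a)
{
    std::vector<double> out(n);
    double prev = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = input[i] - a * prev;
        prev = input[i];
    }
    return out;
}

// Apply a periodic Hann window in place.
void hannWindow(std::vector<double> &x)
{
    const double pi = std::acos(-1.0);
    const std::size_t n = x.size();
    for (std::size_t i = 0; i < n; ++i) {
        x[i] *= 0.5 - 0.5 * std::cos(2.0 * pi * i / n);
    }
}

// Naive DFT of a real signal, returning bins 0..n/2 (the non-redundant half).
std::vector<std::complex<double>> realDft(const std::vector<double> &x)
{
    const double pi = std::acos(-1.0);
    const std::size_t n = x.size();
    std::vector<std::complex<double>> bins(n / 2 + 1);
    for (std::size_t k = 0; k <= n / 2; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = -2.0 * pi * double(k) * double(t) / double(n);
            sum += x[t] * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        bins[k] = sum;
    }
    return bins;
}

// What process() might do with inputBuffers[0] in a time-domain plugin.
std::vector<std::complex<double>> filteredSpectrum(const float *block, std::size_t blockSize)
{
    std::vector<double> x = preEmphasis(block, blockSize, 0.97);
    hannWindow(x);
    return realDft(x);
}
```

Doing the transform yourself also gives you control over the window shape, which, as noted above, the host otherwise chooses for frequency-domain plugins.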

Quote

3) Sonification (Sonic Visualiser):

In Sonic Visualiser I see that some output types can be sonified (e.g. clicks at detected onsets). If the output of my plug-in is a continuous per-frame frequency value (in Hz), is there any way to sonify the output in Sonic Visualiser, e.g. with a sinusoid that follows the frequency of the output?

No, Sonic Visualiser only contains a MIDI-note-based sound generator that uses sampled sounds. Again though, I know quite a lot of people would find this useful. Maybe I should look at it...
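Until something like that exists, one workaround is to render the frequency track to audio yourself and listen to it alongside the original. A minimal sketch, assuming one frequency value per hop and treating values ≤ 0 as unvoiced; the function name is hypothetical, and writing the samples to a sound file is left to whatever audio I/O library you already use.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Render a per-frame frequency track (Hz, one value per hop) as a sine wave.
// Phase is accumulated sample by sample, so the tone stays continuous when
// the frequency changes between frames. Frames with a value <= 0 are
// treated as unvoiced and left silent.
std::vector<float> sonifyFrequencyTrack(const std::vector<double> &freqsHz,
                                        std::size_t hopSize,
                                        double sampleRate,
                                        double amplitude = 0.5)
{
    assert(sampleRate > 0.0);
    const double twoPi = 2.0 * std::acos(-1.0);
    std::vector<float> out(freqsHz.size() * hopSize, 0.0f);
    double phase = 0.0;
    for (std::size_t frame = 0; frame < freqsHz.size(); ++frame) {
        const double f = freqsHz[frame];
        if (f <= 0.0) continue; // unvoiced: leave this hop silent
        for (std::size_t i = 0; i < hopSize; ++i) {
            out[frame * hopSize + i] = static_cast<float>(amplitude * std::sin(phase));
            phase = std::fmod(phase + twoPi * f / sampleRate, twoPi);
        }
    }
    return out;
}
```

Accumulating phase rather than computing `sin(2π·f·t)` directly per frame is what avoids clicks at frame boundaries when the pitch changes.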

Sorry to have such a negative list of responses for you. The positive side is that it looks as if you've understood the SDK and its limitations pretty well...

Chris

justin · « **Reply #2 on:** July 12, 2012, 14:10:55 »

Hi Chris,

Thanks for the speedy reply!

Quote from: cannam on July 12, 2012, 09:21:36

Quote from: justin on July 11, 2012, 15:59:22
1) Sampling rate:

Imagine my algorithm requires a fixed sampling rate (e.g. fs = 44100), block size (e.g. 2048 @ fs = 44.1k) and hop size (e.g. 1024 @ fs=44.1kHz). I understand that I can specify a preferred block and hop size, and even return false in the initialization function if the host specifies something else. But, what about the sampling rate? Specifying a required block/hop size (in samples) is not really useful if the sampling rate is not known.

I'm afraid the samplerate is one thing your plugin has no control over. It must accept whatever's supplied in the constructor, and can only then return false on initialise if the rate is unsatisfactory.

Note that the plugin's block and hop size can depend on the samplerate; they don't have to be hardcoded. If the reason you want to fix the samplerate is to have a known block and hop size in physical units (i.e. seconds), you may be able to do it the other way around: calculate the block and hop size on request based on the samplerate.

Yes, I guess I'll have to look into this option. In theory it should be possible, though some algorithmic steps might make it somewhat complicated in my case. In the worst case, the first version of the plugin will only support 44.1 kHz.

It would be an awesome future feature, though, to have the host resample the audio at the plugin's request before passing on the audio blocks.

Extra question 1: from the programmer's guide I take it the first block is not centred on time zero but rather starts at the first sample of the audio, right? (Double-checking, as this could cause alignment issues when checking against ground truths centred on time 0.)

Extra question 2: imagine I want initialisation to fail because I'm not happy with something (e.g. sampling rate). Is there any way of communicating the specific reason for the failure to the user? On a command-line host I could write to cerr, but what about Sonic Visualiser?

Quote

Quote
I realise I can save the value of inputSampleRate to a parameter

You don't actually need to save it, it's stored for you in the Plugin base class. (This is probably bad form in terms of software practice, but still)

Noted, cheers.

Quote

Quote
2) Time-domain filtering:

Is there any way to apply a time domain filter first, and then get the input in the frequency domain? Just hoping to avoid having to compute the DFT inside the plug-in itself.

No, the frequency-domain input is essentially a convenience option for plugins simple enough to be happy to work from STFT data without having to have too much control over it (the host also controls the window shape, for example). More sophisticated plugins will need to work from time-domain data.

In hindsight I wish we had put a generally accessible FFT implementation in the SDK on the plugin side, as well as in PluginInputDomainAdapter on the host side: there are now many, many duplicates of FFT functions in Vamp plugins out there! Perhaps it's not too late to add it, even.

Yes, that would definitely speed up the development process for us MIR folk rewriting our code as Vamp plugins.

Quote

Quote
3) Sonification (Sonic Visualiser):

In Sonic Visualiser I see that some output types can be sonified (e.g. clicks at detected onsets). If the output of my plug-in is a continuous per-frame frequency value (in Hz), is there any way to sonify the output in Sonic Visualiser, e.g. with a sinusoid that follows the frequency of the output?

No, Sonic Visualiser only contains a MIDI-note-based sound generator that uses sampled sounds. Again though, I know quite a lot of people would find this useful. Maybe I should look at it...

That would be great. Hmm, I've only just started and I seem to be making quite a lot of feature requests... sorry! But like you said, there are many plugins (especially pitch related ones) that would be upgraded from "cool" to "awesome" if such a sonification was available.

Quote

Sorry to have such a negative list of responses for you. The positive side is that it looks as if you've understood the SDK and its limitations pretty well...

No worries, I appreciate the prompt reply. And yes, the combination of the programmer's guide and the "From Method to Plugin" tutorial + skeleton code makes it very easy to get started!

Thanks,

Justin

cannam · « **Reply #3 on:** July 12, 2012, 14:27:41 »

Quote from: justin on July 12, 2012, 14:10:55

Extra question 1: from the programmer's guide I take it the first block is not centred on time zero but rather starts at the first sample of the audio, right? (Double-checking, as this could cause alignment issues when checking against ground truths centred on time 0.)

Depends on the host, but you can tell from the timestamp provided.

The docs (http://code.soundsoftware.ac.uk/embedded/vamp-plugin-sdk/classVamp_1_1Plugin.html#ae4aed3bebfe80a2e2fccd3d37af26996) say that "[t]he timestamp will be the real time in seconds of the centre of the FFT input window". Therefore, if the first timestamp is zero, that should mean you are being passed a window centred on the start of the audio rather than starting with the first sample.

Quote

Extra question 2: imagine I want initialisation to fail because I'm not happy with something (e.g. sampling rate). Is there any way of communicating the specific reason for the failure to the user?

Sadly not.

Quote

Quote
In hindsight I wish we had put a generally accessible FFT implementation in the SDK on the plugin side

Yes, that would definitely speed up the development process for us MIR folk rewriting our code as Vamp plugins.

Well, I was working on the SDK today anyway, so I've added it for the 2.4 release. Not everything is updated yet, but the source is at http://code.soundsoftware.ac.uk/projects/vamp-plugin-sdk/files now.

Chris

justin · « **Reply #4 on:** July 12, 2012, 16:42:20 »

Quote from: cannam on July 12, 2012, 14:27:41

Depends on the host, but you can tell from the timestamp provided.

The docs (http://code.soundsoftware.ac.uk/embedded/vamp-plugin-sdk/classVamp_1_1Plugin.html#ae4aed3bebfe80a2e2fccd3d37af26996) say that "[t]he timestamp will be the real time in seconds of the centre of the FFT input window". Therefore, if the first timestamp is zero, that should mean you are being passed a window centred on the start of the audio rather than starting with the first sample.

aha, perfect.

Quote

Sadly not.

OK, I'll try to make it as clear as possible in the accompanying documentation.

Quote

Well, I was working on the SDK today anyway, so I've added it for the 2.4 release. Not everything is updated yet, but the source is at http://code.soundsoftware.ac.uk/projects/vamp-plugin-sdk/files now.

Nice. I've had a look. Any specific reason for using Cross's implementation? I've been researching free FFT libraries (non-GPL, so that plugin authors are not obliged to publish their source code), and after snooping around http://www.fftw.org/benchfft/ I thought perhaps the code by Ooura (http://www.kurims.kyoto-u.ac.jp/~ooura/fft.html) could do the trick (unless you need non-power-of-two blocks). Anyway, just curious.

Thanks again for all the useful feedback,
Justin

cannam · « **Reply #5 on:** July 12, 2012, 16:52:21 »

Quote from: justin on July 12, 2012, 16:42:20

Nice. I've had a look - any specific reason for using Cross's implementation?

Just that it is very, very simple: it's the simplest implementation I know of for the most basic level of support.

Users who really care about it (either because they want the fastest or because they need some specific functionality) will probably want to do something else regardless of whether the provided version is from Cross, Ooura, or KissFFT. This way, at least it doesn't add significant overhead in library complexity.

It's slow, compared to the fastest implementations, but it's not so slow as to be a huge overhead in most real-world methods. It's good enough to be a sensible way to get your algorithm started.

Chris

justin · « **Reply #6 on:** July 12, 2012, 17:09:47 »

Can't argue with that.

Justin

justin · « **Reply #7 on:** July 12, 2012, 19:32:39 »

Quote from: cannam on July 12, 2012, 09:21:36

Note that the plugin's block and hop size can depend on the samplerate; they don't have to be hardcoded. If the reason you want to fix the samplerate is to have a known block and hop size in physical units (i.e. seconds), you may be able to do it the other way around: calculate the block and hop size on request based on the samplerate.

Sorry, last question for the day...

I've been experimenting with setting the step and block size as a function of inputSampleRate, and along the way I've encountered some weird behaviour from SV.

Imagine that in getPreferredBlockSize() I return a value X (X can be a function of inputSampleRate or just a fixed number). To ensure the user hasn't changed the block size, in initialise() I run a simple check: `if (blockSize != X) return false;`. But when I try this with SV, the plugin runs (i.e. there's no initialisation error), regardless of what block size I choose in the graphical interface.

Curiously, if getPreferredBlockSize() returns X but initialise() checks for a *different* value Y, then I get the expected error, and if I change X to Y through the interface, the plugin runs as expected. The problem only seems to arise when the preferred block size and the value I check against in initialise() are the same. To clarify, here's the code of a simple test I ran:

```cpp
// Block size the plugin asks the host for.
size_t
MelodyExtraction::getPreferredBlockSize() const
{
    return 2048;
}

bool
MelodyExtraction::initialise(size_t channels, size_t stepSize, size_t blockSize)
{
    if (channels < getMinChannelCount() ||
        channels > getMaxChannelCount()) return false;

    // Refuse any block size other than the preferred one.
    if (blockSize != 2048) return false;

    // Real initialisation work goes here!

    return true;
}
```

In this example, if through the user interface I change the blockSize from 2048 to something else, the plugin still runs. I'm most probably doing something wrong, but I can't figure out what.

Thanks!

cannam · « **Reply #8 on:** July 12, 2012, 19:44:01 »

Quote from: justin on July 12, 2012, 19:32:39

But when trying this with SV, the plugin runs (i.e. there's no initialisation error), regardless of what blockSize I choose in the graphical interface.

Ah, I see what's happening here... SV is being too clever for its own good.

What's happening is that when initialise() fails, SV then quietly reinitialises the plugin using its preferred settings and runs it with those instead.

That is almost certainly a bad idea. I'm struggling to think of any situation in which it would be a particularly wise thing to do. There might be one, though, and I ought to check the version control logs before I intemperately rip out the code for the forthcoming release!

Chris

justin · « **Reply #9 on:** July 12, 2012, 19:52:58 »

That explains it. I was running different tests for about an hour to make sure this was actually happening before posting on the forum. At least I know I'm not crazy.

j

justin · « **Reply #10 on:** July 26, 2012, 10:48:09 »

Quote from: cannam on July 12, 2012, 14:27:41

Quote from: justin on July 12, 2012, 14:10:55
Extra question 1: from the programmer's guide I take it the first block is not centred on time zero but rather starts at the first sample of the audio, right? (Double-checking, as this could cause alignment issues when checking against ground truths centred on time 0.)

Depends on the host, but you can tell from the timestamp provided.

The docs (http://code.soundsoftware.ac.uk/embedded/vamp-plugin-sdk/classVamp_1_1Plugin.html#ae4aed3bebfe80a2e2fccd3d37af26996) say that "[t]he timestamp will be the real time in seconds of the centre of the FFT input window". Therefore, if the first timestamp is zero, that should mean you are being passed a window centred on the start of the audio rather than starting with the first sample.

Just in case anyone is following this thread: I just realised this is not the case if you use time-domain input. In that case the timestamp will be that of the beginning of the block! Sorry Chris, I realise that's stated quite clearly in the link you posted; I should've read it more carefully.

Anyway, for those of us working with time-domain input: if you want "correct" (centred) timestamps, you should add windowDuration / 2 to your timestamps, where windowDuration is expressed in seconds, i.e. m_blockSize / m_inputSampleRate.
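The arithmetic, as a standalone C++ sketch with hypothetical function names. It assumes a host that delivers the first time-domain block starting at sample 0 and advances by stepSize samples per process() call, so check it against the timestamps your host actually passes. Inside a plugin, the SDK's `Vamp::RealTime::frame2RealTime()` should be able to do the same conversion from a half-block sample count, which avoids floating-point seconds altogether.

```cpp
#include <cassert>
#include <cstddef>

// Start time (seconds) of block number `blockIndex`, assuming the first
// block starts at sample 0 and each block advances by `stepSize` samples.
double blockStartSeconds(std::size_t blockIndex, std::size_t stepSize, double sampleRate)
{
    assert(sampleRate > 0.0);
    return static_cast<double>(blockIndex * stepSize) / sampleRate;
}

// Move a block-start timestamp to the centre of the block by adding half
// the window duration, (blockSize / sampleRate) / 2.
double centredSeconds(double startSeconds, std::size_t blockSize, double sampleRate)
{
    assert(sampleRate > 0.0);
    return startSeconds + (static_cast<double>(blockSize) / sampleRate) / 2.0;
}
```

With a 2048-sample block and a 1024-sample step at 44.1 kHz, the first block's centre is at about 0.0232 s and the second's at about 0.0464 s.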

Cheers
