Topic: Inconsistencies in QM-plugin output (when invoked using simple-host or SV app)

Hannes

I used the feature extraction functionality of the Sonic Visualiser Queen Mary plugin, which provides some high-level features that are very valuable to me. Thanks to the developers for that!

The simple host application (vamp-simple-host.exe) included in the Vamp SDK 1.3 also lets me extract these features on the command line. However, I noticed some inconsistencies in the plugin's output depending on whether it is invoked by the simple host or by the Sonic Visualiser main application (with the output exported to a file). For example, the beat tracker detected a few more beats when run in the simple host, although most of the beats are (nearly) identical. I didn't expect that, but it is not a serious issue for me. (Perhaps the plugin received different input from two different FFT implementations, but I don't know...)
What worries me more is the difference in the chromagram and constant-Q output. When I use the main application, the output has four more frames. That shouldn't be the case: I used the same sound file and the default parameters, so the number of frames should be equal. (My plugin version is 1.4, but the same problems occur with version 1.0; my Sonic Visualiser version is 1.3; my OS is Windows.)
Looking at the results more closely, I found that they do correspond, but the simple host's output is missing the first four frames, as if the sound file started 0.74 seconds later.
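A minimal sketch of that kind of comparison, assuming both exports have been loaded into NumPy arrays with one row per frame. `find_frame_offset` is a hypothetical helper (not part of Sonic Visualiser or the Vamp SDK), and the random data stands in for real plugin output.

```python
import numpy as np


def find_frame_offset(longer, shorter, max_shift):
    """Return the shift s in [0, max_shift] for which longer[s:s + len(shorter)]
    best matches `shorter` by mean absolute difference, or None if no shift fits.

    Both inputs are 2-D arrays of shape (frames, bins), e.g. chromagram rows.
    """
    best_shift, best_err = None, np.inf
    for s in range(max_shift + 1):
        window = longer[s:s + len(shorter)]
        if len(window) < len(shorter):
            break  # not enough frames left in `longer` for this shift
        err = np.mean(np.abs(window - shorter))
        if err < best_err:
            best_shift, best_err = s, err
    return best_shift


# Synthetic check: drop the first four frames of a random "chromagram".
rng = np.random.default_rng(0)
sv_output = rng.random((20, 12))
host_output = sv_output[4:]
print(find_frame_offset(sv_output, host_output, max_shift=8))  # prints 4
```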

Is it a bug? I don't have an explanation for this.
Thanks for any comments,

Hannes

cannam

Hannes,

Sorry to take so long to reply to this. It's a subtle one, and I wanted to check the figures first; I'm afraid it took a little while to get around to that.

This behaviour is essentially a bug in the result printing part of the simple host in the Vamp SDK, although there is also a shortage of documentation for the (legitimate) behaviour of the SDK code that leads to this bug.  Sonic Visualiser appears to have the correct output here.  I will aim to get the bug fixed and the behaviour properly documented in the 1.4 SDK release.

The cause is to do with the handling of frame timestamps when using plugins that have frequency-domain input.  Sonic Visualiser feeds these plugins frames starting from the frame that is centred on the first input audio sample, which is timestamped at time zero.
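A minimal sketch of this centred framing, in Python with NumPy. The function name `centred_frames` and its stopping rule (stop once the frame centre passes the last input sample) are assumptions for illustration, not Sonic Visualiser's actual code; samples outside the file are zero-padded.

```python
import numpy as np


def centred_frames(samples, frame_size, step, sample_rate):
    """Yield (timestamp_seconds, frame) with frame k centred on sample k * step.

    Frame 0 is centred on the first input sample and timestamped at zero;
    samples before the start or after the end of the input are zero-padded.
    Stopping once the centre passes the last sample is an assumption.
    """
    half = frame_size // 2
    # Original sample i sits at padded index i + half.
    padded = np.concatenate([np.zeros(half), samples, np.zeros(frame_size)])
    centre = 0
    while centre < len(samples):
        # Frame covers original samples [centre - half, centre - half + frame_size),
        # i.e. padded indices [centre, centre + frame_size).
        yield centre / sample_rate, padded[centre:centre + frame_size]
        centre += step


# Tiny example: 10 samples, 4-sample frames, 2-sample step, 1 Hz "sample rate".
for t, frame in centred_frames(np.arange(10.0), frame_size=4, step=2, sample_rate=1):
    print(t, frame)
```

Note that the first frame is half zero-padding: this is how the samples near the start of the file get represented.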

The simple host, in contrast, begins with a frame that starts at the first input audio sample (not such a good thing to do, because it means that samples earlier in the file than half the frame size are not properly represented, but technically legitimate). This host uses a PluginInputDomainAdapter to handle the conversion to the frequency domain if the plugin requests it; for the first frame, it feeds the adapter the time-domain samples starting at the first input audio sample, with a timestamp of zero. The adapter recognises that the frequency-domain input timestamp should be adjusted to the centre of the frame, and makes that adjustment. However, the host is not aware that the adjustment has happened, and so prints the results with the unadjusted timestamps. That's the bug.
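The timestamp mismatch can be sketched as follows. `frame_timestamps` is a hypothetical helper, not part of the Vamp SDK; it only models the arithmetic described above, assuming the adapter's adjustment is exactly half the frame size. The frame and step sizes in the usage loop are illustrative values.

```python
def frame_timestamps(frame_index, step, frame_size, sample_rate):
    """Return (printed, adjusted) timestamps in seconds for one frame.

    printed:  what the simple host prints, i.e. the time of the frame's
              first sample (the frame starts at frame_index * step).
    adjusted: the timestamp after the adapter moves it to the centre of
              the frame, i.e. start time plus half a frame.
    """
    start = frame_index * step
    printed = start / sample_rate
    adjusted = (start + frame_size // 2) / sample_rate
    return printed, adjusted


# Illustrative values: 1024-sample frames, 512-sample step, 44.1 kHz.
for j in range(3):
    printed, adjusted = frame_timestamps(j, step=512, frame_size=1024, sample_rate=44100)
    print(f"frame {j}: printed {printed:.4f} s, should be {adjusted:.4f} s")
```

Every printed timestamp is early by the same amount, half a frame, so a host that knows about the adjustment could correct its output simply by adding that constant.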

The results shown in SV and those returned by the simple host will match for plugins that use time-domain input. They should also match for plugins with frequency-domain input whose outputs are timestamped explicitly by the plugin using "adjusted" timestamps, for example the Onsets output of the Simple Percussion Onset Detector example plugin. They differ (with SV correct and the simple host wrong) for plugins with frequency-domain input whose outputs are timestamped implicitly by the host, for example the chromagram. The discrepancy is particularly marked for the chromagram because of its long frame size: the difference of 0.74 seconds you mention is half of that frame size.
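As a rough check on the figures, the offset can be computed from the frame and step sizes. The values below are assumptions: a 44.1 kHz file, a 65536-sample frame and an 8192-sample step, chosen because they reproduce the reported 0.74 seconds and four frames, not read from the chromagram plugin itself. `half_frame_offset` is a hypothetical helper.

```python
def half_frame_offset(frame_size, step, sample_rate):
    """Offset between start-aligned and centred framing.

    Returns (seconds, frames): half the frame size in seconds, and that half
    frame measured in steps. When the half frame is a whole number of steps,
    the second value is the number of extra leading frames that centred
    framing produces.
    """
    half = frame_size / 2
    return half / sample_rate, half / step


# Hypothetical values chosen to match the reported 0.74 s and four frames.
seconds, frames = half_frame_offset(65536, 8192, 44100)
print(f"half frame = {seconds:.2f} s = {frames:g} frames")
```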


Chris
« Last Edit: September 17, 2008, 11:05:36 by cannam »