Power is not everything: two frameworks to overcome limitations of power domain modeling

Published 6 September 2016, 18:10
Many audio signal processing techniques operate in the time-frequency power or magnitude domain. They have been developed for a wide range of applications, such as denoising, source separation, and time-scale or pitch-scale modification. However, discarding the phase information raises important issues. First, if a time-domain signal must be resynthesized, the phase needs to be estimated so that it is coherent with the magnitude. Second, signals are no longer additive: the power of a sum is in general not the sum of the powers, because the cross-terms do not vanish. Third, modeling the phase is often considered an intricate problem, yet the phase may still contain relevant information to exploit. This is the case, for example, in electronic music and, to some extent, for instruments such as the piano and percussion, where the waveform of the same note or sound played several times is perfectly or nearly perfectly reproducible from one occurrence to the next.

In this talk, I will present two frameworks that avoid or overcome these issues, one focusing on the complex time-frequency domain and the other on the time domain. The first framework relies on deriving general consistency constraints for complex short-time Fourier transform (STFT) spectrograms. The consistency criterion we deduce from them can be used as a cost function in audio signal processing algorithms working in the complex time-frequency domain. It can also serve as an objective function to estimate the phase that best corresponds to a given magnitude spectrogram.

The second framework, called shift-invariant semi-non-negative matrix factorization, aims to solve the problem of template matching with unknown templates. It is a general model defined in the time domain. It assumes that the observed waveform is the superposition of a limited number of elementary patterns, added with variable latencies and variable but positive amplitudes. The elementary patterns are learnt from the data together with the timing and amplitude of their activations. I will show preliminary results on audio data and extracellular recordings.
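To make the first framework's idea of consistency concrete, one common formalization is the following. A complex array Y is a consistent STFT spectrogram when re-analysing its resynthesis gives back Y itself, so the squared distance ||Y - STFT(ISTFT(Y))||^2 can act as an inconsistency measure. The sketch below is illustrative only. It assumes SciPy's `scipy.signal.stft`/`istft` with their default Hann window and 50% overlap, plus 256-sample frames; these settings are our choice, not the talk's. For phase estimation it swaps in the classic Griffin-Lim iteration, a well-known method that alternates between resynthesis/re-analysis and imposing the target magnitude. It is not necessarily the algorithm presented in the talk. The helper names `analysis`, `synthesis`, `inconsistency` and `estimate_phase` are hypothetical.

```python
import numpy as np
from scipy.signal import istft, stft

# Assumed analysis settings (not specified in the talk): SciPy's default Hann
# window and 50% overlap (noverlap = nperseg // 2), with 256-sample frames.
NPERSEG = 256


def analysis(x):
    """Complex STFT spectrogram of a real signal."""
    return stft(x, nperseg=NPERSEG)[2]


def synthesis(Y):
    """Real signal resynthesized from a complex spectrogram by SciPy's inverse STFT."""
    return istft(Y, nperseg=NPERSEG)[1]


def inconsistency(Y):
    """Squared distance between Y and the STFT of its own resynthesis (about 0 if Y is consistent)."""
    return float(np.sum(np.abs(Y - analysis(synthesis(Y))) ** 2))


def estimate_phase(magnitude, n_iter=50, seed=0):
    """Griffin-Lim iteration: a stand-in phase estimator, not necessarily the talk's method."""
    rng = np.random.default_rng(seed)
    Y = magnitude * np.exp(2j * np.pi * rng.random(magnitude.shape))  # random initial phase
    history = [inconsistency(Y)]
    for _ in range(n_iter):
        Z = analysis(synthesis(Y))                # consistent spectrogram obtained from Y
        Y = magnitude * np.exp(1j * np.angle(Z))  # keep its phase, restore the target magnitude
        history.append(inconsistency(Y))
    return Y, history


# Usage on a synthetic two-tone signal (illustrative data only).
n = np.arange(4096)
x = np.sin(2 * np.pi * 0.01 * n) + 0.5 * np.sin(2 * np.pi * 0.037 * n)
X = analysis(x)
Y_est, history = estimate_phase(np.abs(X))
print(f"inconsistency: {history[0]:.3g} -> {history[-1]:.3g}")
```

The spectrogram of an actual signal scores essentially zero on this measure. A random phase paired with the same magnitude scores high, and the iterations lower that score while keeping the magnitude fixed.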
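The time-domain model of the second framework can be sketched in NumPy under our own assumptions, since the abstract does not give the update rules. The observed waveform is approximated by the sum, over K templates of fixed length, of each template convolved with a non-negative activation sequence. Templates may take either sign, which is the "semi" part, and activations are constrained to be non-negative at every shift. The cost is the squared error. It is minimized by alternating projected-gradient steps with step size 1 over an upper bound on the Lipschitz constant, which keeps the cost from increasing. Templates are then rescaled to unit norm to remove the scale ambiguity. Plain projected gradient descent is swapped in for illustration and is not necessarily the algorithm presented in the talk. All function names are hypothetical.

```python
import numpy as np


def reconstruct(W, A):
    """Model waveform: sum over patterns k of activations A[k] convolved with template W[k]."""
    return sum(np.convolve(a, w, mode="full") for a, w in zip(A, W))


def cost(x, W, A):
    """Half the squared reconstruction error."""
    return 0.5 * float(np.sum((x - reconstruct(W, A)) ** 2))


def shift_invariant_semi_nmf(x, n_patterns, pattern_len, n_iter=100, seed=0):
    """Hypothetical sketch: alternating projected-gradient updates (not the talk's algorithm)."""
    rng = np.random.default_rng(seed)
    n_shifts = len(x) - pattern_len + 1                 # full convolution then has len(x) samples
    W = rng.standard_normal((n_patterns, pattern_len))  # templates: any sign ("semi")
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    A = rng.random((n_patterns, n_shifts))              # activations: non-negative
    errors = [cost(x, W, A)]
    for _ in range(n_iter):
        # Activation step: the gradient is minus the cross-correlation of the residual
        # with each template; the step is followed by projection onto A >= 0.
        r = x - reconstruct(W, A)
        grad_A = -np.array([np.correlate(r, w, mode="valid") for w in W])
        lip_A = max(float(np.sum(np.abs(W).sum(axis=1) ** 2)), 1e-12)  # sum of squared L1 norms
        A = np.maximum(A - grad_A / lip_A, 0.0)
        # Template step: unconstrained gradient step (templates may take either sign).
        r = x - reconstruct(W, A)
        grad_W = -np.array([np.correlate(r, a, mode="valid") for a in A])
        lip_W = max(float(np.sum(A.sum(axis=1) ** 2)), 1e-12)  # A >= 0, so row sums are L1 norms
        W = W - grad_W / lip_W
        # Remove the scale ambiguity: unit-norm templates, compensating activations.
        norms = np.maximum(np.linalg.norm(W, axis=1, keepdims=True), 1e-12)
        W, A = W / norms, A * norms
        errors.append(cost(x, W, A))
    return W, A, errors


# Usage on synthetic data: two random templates placed at a few latencies (illustrative only).
rng = np.random.default_rng(1)
true_W = rng.standard_normal((2, 16))
true_A = np.zeros((2, 200 - 16 + 1))
true_A[0, [10, 90]] = [1.0, 0.5]
true_A[1, [40, 150]] = [0.8, 1.2]
x = reconstruct(true_W, true_A)
W, A, errors = shift_invariant_semi_nmf(x, n_patterns=2, pattern_len=16, n_iter=100)
print(f"cost: {errors[0]:.3f} -> {errors[-1]:.3f}")
```

The conservative step sizes trade speed for a guaranteed non-increasing cost. Multiplicative or accelerated updates would usually converge faster, but their details depend on choices the abstract leaves open.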