Demonstration of Differentiable Digital Signal Processing Mixture Model for Synthesis Parameter Extraction from Mixture of Harmonic Sounds [1]

Masaya Kawamura1, Tomohiko Nakamura1, Daichi Kitamura2, Hiroshi Saruwatari1, Yu Takahashi3, Kazunobu Kondo3

1The University of Tokyo, Tokyo, Japan
2National Institute of Technology, Kagawa College, Kagawa, Japan
3Yamaha Corporation, Shizuoka, Japan

This page accompanies the paper [1] and presents examples of signals synthesized with the proposed and conventional methods. The mixture and ground-truth signals are taken from the University of Rochester Multimodal Music Performance (URMP) dataset [2] and the PHENICX-Anechoic dataset [3, 4]. None of these audio signals were included in the training data.


[Audio examples: for each mixture, the page provides the mixture signal and, per instrument, the ground-truth signal and the signals synthesized by SISS+DDSP, SISS+Proposed, and SI-Proposed.]

- URMP dataset
  - Viola/Flute mixture: Viola, Flute
  - Flute 1/Flute 2 mixture: Flute 1, Flute 2
- PHENICX-Anechoic dataset
  - Cello/Double bass mixture: Cello, Double bass
  - Flute 1/Flute 2 mixture: Flute 1, Flute 2

References

[1] M. Kawamura, T. Nakamura, D. Kitamura, H. Saruwatari, Y. Takahashi, and K. Kondo, "Differentiable digital signal processing mixture model for synthesis parameter extraction from mixture of harmonic sounds," in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2022, to appear. arXiv preprint: http://arxiv.org/abs/2202.00200
[2] B. Li, X. Liu, K. Dinesh, Z. Duan, and G. Sharma, "Creating a multitrack classical music performance dataset for multimodal music analysis: Challenges, insights, and applications," IEEE Transactions on Multimedia, vol. 21, no. 2, pp. 522–535, 2019.
[3] M. Miron, J. J. Carabias-Orti, J. J. Bosch, E. Gómez, and J. Janer, "Score-informed source separation for multichannel orchestral recordings," Journal of Electrical and Computer Engineering, vol. 2016, 2016.
[4] J. Pätynen, V. Pulkki, and T. Lokki, "Anechoic recording system for symphony orchestra," Acta Acustica united with Acustica, vol. 94, no. 6, pp. 856–865, 2008.