This webpage presents listening examples for our proposed Atss-Net and the baseline VoiceFilter[1]. Compared with the CNN-LSTM architecture, Atss-Net lets the network compute the correlations between features in parallel and uses shallower layers to extract more features. We also provide samples showing that Atss-Net performs promisingly in speech enhancement.
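To make the idea of computing correlations between all features in parallel concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration under assumptions, not the Atss-Net implementation: the function names `softmax` and `self_attention`, the toy input, and the choice to use the input frames directly as queries, keys, and values (no learned projections) are all hypothetical and are not taken from this page.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Toy self-attention over T feature frames of dimension d.

    Assumption: queries, keys, and values are the frames themselves;
    a real model would apply learned projections first.
    """
    d = len(frames[0])
    scale = math.sqrt(d)
    # Correlation (scaled dot product) of every frame with every other frame.
    # No row depends on another row, so all rows can be computed in parallel,
    # unlike a recurrent layer that must step through frames in order.
    scores = [
        [sum(a * b for a, b in zip(q, k)) / scale for k in frames]
        for q in frames
    ]
    weights = [softmax(row) for row in scores]
    # Each output frame is a weighted sum of all input frames.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, frames)) for j in range(d)]
        for row in weights
    ]
    return weights, outputs


# Hypothetical toy input: three 2-dimensional feature frames.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, outputs = self_attention(frames)
```

Each row of `weights` sums to 1 and describes how strongly one frame attends to every other frame; because the rows are independent, the whole matrix can be evaluated in a single batched operation on real hardware.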
| Type | Mixture Input | VoiceFilter[1] | Proposed Atss-Net | Target Speech |
|---|---|---|---|---|
| F-F | | | | |
| M-M | | | | |
| M-F | | | | |
| M-F | | | | |
| M-F | | | | |
| Type | Female Input | Proposed Atss-Net | Male Input | Proposed Atss-Net |
|---|---|---|---|---|
| C-W | | | | |
| F-M | | | | |
| P-M | | | | |
| R-M | | | | |
| F-T | | | | |
[1] Q. Wang, H. Muckenhirn, K. Wilson, et al., "VoiceFilter: Targeted voice separation by speaker-conditioned spectrogram masking," in Interspeech, 2019, pp. 2728-2732.