This webpage provides listening examples for our proposed Atss-Net and the VoiceFilter baseline [1]. Compared with the CNN-LSTM architecture of VoiceFilter, Atss-Net computes the correlations between features in parallel and can extract richer features with shallower layers. We also include sample audio demonstrating Atss-Net's promising performance in speech enhancement.
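As a rough illustration of the parallel-correlation property mentioned above (not the authors' implementation), the sketch below applies a single self-attention layer over spectrogram frames, so every frame is scored against every other frame in one matrix operation rather than step by step as in an LSTM. The feature dimension, number of heads, and layer choices are assumptions for demonstration only.

```python
# Minimal sketch, assuming magnitude-spectrogram features of size 257;
# these hyperparameters are illustrative, not Atss-Net's actual settings.
import torch
import torch.nn as nn


class AttentionBlock(nn.Module):
    def __init__(self, feat_dim: int = 257, num_heads: int = 1):
        super().__init__()
        # Self-attention computes all-pairs correlations between frames
        # in parallel, unlike an LSTM that processes frames sequentially.
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, feat_dim) spectrogram features
        out, _ = self.attn(x, x, x)   # pairwise frame correlations in one shot
        return self.norm(x + out)     # residual connection + layer norm


if __name__ == "__main__":
    spec = torch.randn(2, 100, 257)          # 2 utterances, 100 frames, 257 bins
    print(AttentionBlock()(spec).shape)      # torch.Size([2, 100, 257])
```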
| Type | Mixture Input | VoiceFilter [1] | Proposed Atss-Net | Target Speech |
|---|---|---|---|---|
| F-F | (audio) | (audio) | (audio) | (audio) |
| M-M | (audio) | (audio) | (audio) | (audio) |
| M-F | (audio) | (audio) | (audio) | (audio) |
| M-F | (audio) | (audio) | (audio) | (audio) |
| M-F | (audio) | (audio) | (audio) | (audio) |

| Type | Female Input | Proposed Atss-Net | Male Input | Proposed Atss-Net |
|---|---|---|---|---|
| C-W | (audio) | (audio) | (audio) | (audio) |
| F-M | (audio) | (audio) | (audio) | (audio) |
| P-M | (audio) | (audio) | (audio) | (audio) |
| R-M | (audio) | (audio) | (audio) | (audio) |
| F-T | (audio) | (audio) | (audio) | (audio) |
[1] Q. Wang, H. Muckenhirn, K. Wilson, et al., "VoiceFilter: Targeted voice separation by speaker-conditioned spectrogram masking," in Interspeech, 2019, pp. 2728-2732.