Perceptual Feature-Guided Denoising for Speech Enhancement
Bachelor Thesis Project (BTP) under Prof. P Balamurugan (IIT Bombay)
[Figure: the Denoising network maps Noisy Audio to Clean Audio]
In this report, we introduce an end-to-end deep learning approach for speech enhancement, using a fully convolutional neural network (FCN) to generate clean audio from noisy inputs. Training relies on a separate Feature Loss network that learns feature embeddings for audio signals: the Denoising network is guided by perceptual losses computed as differences between the feature embeddings of its output and those of the clean reference, a formulation we term "Perceptual Feature Losses." Perceptual experiments with human listeners indicate that this approach outperforms existing techniques, suggesting a promising avenue for speech quality enhancement and noise reduction.
At the heart of our approach lies the training of the Denoising network: given noisy audio inputs, it learns by minimizing perceptual feature losses computed against the corresponding clean audio. This encourages the network to produce representations that closely mirror clean audio signals at multiple processing stages.
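To make the loss concrete, the following is a minimal NumPy sketch of a perceptual feature loss: both the denoised output and the clean reference are passed through a fixed feature extractor, and the loss is the sum over layers of the mean absolute difference between their activations. The three-layer convolutional extractor, random weights, kernel size, and the use of an L1 distance are illustrative assumptions for this sketch, not the actual networks trained in this work.

```python
import numpy as np

def conv1d_relu(x, w):
    """Valid-mode 1-D convolution followed by a ReLU nonlinearity."""
    k = len(w)
    out = np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])
    return np.maximum(out, 0.0)

def feature_embeddings(x, layer_weights):
    """Run x through stacked conv layers, collecting every layer's activations."""
    feats = []
    for w in layer_weights:
        x = conv1d_relu(x, w)
        feats.append(x)
    return feats

def perceptual_feature_loss(denoised, clean, layer_weights):
    """Sum over layers of the mean absolute difference between the
    feature embeddings of the denoised and the clean signals."""
    f_d = feature_embeddings(denoised, layer_weights)
    f_c = feature_embeddings(clean, layer_weights)
    return sum(np.mean(np.abs(a - b)) for a, b in zip(f_d, f_c))

rng = np.random.default_rng(0)
layer_weights = [rng.standard_normal(5) for _ in range(3)]  # 3 hypothetical conv layers
clean = rng.standard_normal(64)
noisy = clean + 0.3 * rng.standard_normal(64)

# Identical signals give zero loss; a noisy signal gives a positive loss.
zero_loss = perceptual_feature_loss(clean, clean, layer_weights)
noisy_loss = perceptual_feature_loss(noisy, clean, layer_weights)
```

In training, the Denoising network's parameters would be updated to reduce this loss, while the feature extractor stays fixed, so that closeness is measured in the learned embedding space rather than sample-by-sample.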