Title

Phase-aware speech super resolution using U-net architecture with lattice topology

Abstract

Speech super resolution (SSR) is one of the main research areas of audio signal processing. Its aim is to increase the bandwidth of speech signals sampled at low frequencies by estimating the missing high-frequency content. A speech signal with increased bandwidth and accurately predicted high frequencies generally provides the listener with better speech quality. Traditional signal processing methods such as interpolation do not give satisfactory results for this problem. With the introduction of generative models into the speech domain, synthetic speech generation and the optimization of models with loss functions based on generative models have become some of the most active research topics. Reconstructing magnitude and phase information together is critical for synthesizing high-quality speech. In the literature, reconstructing speech phase information is one of the main problems: current methods either ignore phase information or try to estimate it from magnitude information within the network. This thesis proposes a method that combines a U-net based architecture with a lattice filter network and processes magnitude and phase information together. In addition, a phase loss function is used to optimize the phase information accurately. Because upsampling is performed entirely in the frequency domain, the entire spectrum is estimated, which avoids the artifacts that occur when upsampling is done in the time domain. The experiments show that the proposed method gives better results than state-of-the-art methods on the ViSQOL evaluation metric and comparable results on the LSD metric, with fewer model parameters.
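For readers unfamiliar with the LSD metric mentioned above, the following is a minimal sketch of how it is commonly computed. It assumes that LSD stands for log-spectral distance, as is usual in SSR work, and the frame length, hop size, window, and function names are illustrative choices, not the settings or code used in the thesis.

```python
import numpy as np

def stft_magnitude(signal, frame_len=512, hop=256):
    """Magnitude spectrogram (frames x bins) from a Hann-windowed short-time FFT.

    frame_len and hop are assumed values chosen for illustration only.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))

def log_spectral_distance(reference, estimate, eps=1e-8):
    """Mean over frames of the RMS difference between log power spectra."""
    ref_log = np.log10(stft_magnitude(reference) ** 2 + eps)
    est_log = np.log10(stft_magnitude(estimate) ** 2 + eps)
    # Root-mean-square over frequency bins, then average over time frames.
    per_frame = np.sqrt(np.mean((ref_log - est_log) ** 2, axis=1))
    return float(np.mean(per_frame))

# Toy signals: a "wideband" tone pair and a copy with the high tone missing.
t = np.arange(16000) / 16000.0
wideband = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 6000 * t)
narrowband = np.sin(2 * np.pi * 440 * t)
lsd_value = log_spectral_distance(wideband, narrowband)
```

Lower values indicate that the estimated spectrum is closer to the reference. Identical signals give a distance of zero, while the toy narrowband signal, which lacks the 6 kHz component, gives a positive distance.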

Supervisor(s)

YALCIN CENIK

Date and Location

2024-01-25 13:00:00

Category

MSc_Thesis