Cognitive modeling of accented speech in Malayalam: exploring the impact of acoustic signal processing and deep learning techniques.
Abstract
Accented Automatic Speech Recognition (AASR) is the ability of a system to
recognize accented speech inputs. It poses a unique challenge, particularly for
languages with limited available datasets. In this research, a comprehensive
exploration of machine learning and deep learning along with feature engineering
techniques was conducted to advance the understanding of accented speech
recognition.
The research was completed in several phases of experimental study. It began with
an extensive literature review that identified the dominant gap in the
domain of AASR for Malayalam. The unavailability of a benchmark dataset for
accented Malayalam and the scarcity of previous studies in the literature hindered this
research. To address the scarcity of relevant datasets, eight distinct sets of accented
data were carefully constructed. Additionally, a spectrogram dataset was developed
to facilitate a comprehensive study. The research investigates various feature
extraction techniques and model architectures, exploring the impact of different
feature combinations on accented speech recognition.
Each dataset is characterized by a diverse range of key properties essential for robust
speech recognition systems. The datasets exhibit a wide spectrum of accents from
varied regions and demographic groups. Efforts were made to maintain balanced
representation across genders, ages, and socio-economic backgrounds, thereby
reducing potential biases. The recordings for some of the datasets were made in
natural settings to authentically capture variations in accent and pronunciation.
These datasets are annotated with word- and sentence-level transcriptions
(depending on the type of audio signal) and with the district of the specific accent,
providing valuable insights into speaker details and recording conditions. To
evaluate system robustness, recordings were obtained under various noise
conditions, spanning from quiet environments to bustling public spaces.