    •   Institutional Repository @University of Calicut
    • Computer Science
    • Doctoral Theses

    Cognitive modeling of accented speech in Malayalam: exploring the impact of acoustic signal processing and deep learning techniques.

    Final.pdf (12.85Mb)
    Date
    2024-06-12
    Author
    Kallooravi Thandil, Rizwana
    Abstract
    Accented Automatic Speech Recognition (AASR) is the ability of a system to recognize accented speech inputs. It poses a unique challenge, particularly for languages with limited available datasets. In this research, machine learning and deep learning methods, together with feature engineering techniques, were explored to advance the understanding of accented speech recognition. The research was completed in several phases of experimental study. It began with an extensive literature review that identified the main gap in the domain of AASR for Malayalam: the lack of a benchmark dataset for accented Malayalam and the scarcity of prior studies in the literature, both of which hindered this research. To address the scarcity of relevant datasets, eight distinct sets of accented data were carefully constructed. Additionally, a spectrogram dataset was developed to facilitate a comprehensive study. The research investigates various feature extraction techniques and model architectures, exploring the impact of different feature combinations on accented speech recognition. Each dataset is characterized by a diverse range of key properties essential for robust speech recognition systems. The datasets cover a wide spectrum of accents from varied regions and demographic groups. Efforts were made to maintain balanced representation across genders, ages, and socio-economic backgrounds, thereby reducing potential biases. The recordings for some of the datasets were made in natural settings to authentically capture variations in accent and pronunciation. The datasets are annotated with word-level or sentence-level transcriptions (depending on the type of audio signal) and with the district of the specific accent, providing valuable insights into speaker details and recording conditions. To evaluate system robustness, recordings were obtained under various noise conditions, ranging from quiet environments to bustling public spaces.
    URI
    https://hdl.handle.net/20.500.12818/1612
    Collections
    • Doctoral Theses [8]

    DSpace software copyright © 2002-2016  DuraSpace
    Contact Us | Send Feedback
    Theme by 
    Atmire NV
     

     
