Applicability of End-to-End Deep Neural Architecture to Sinhala Speech Recognition

Buddhi Gamage
Randil Pushpananda
Thilini Nadungodage
Ruvan Weerasinghe

Abstract

This research presents a study on the application of end-to-end (e2e) deep learning models to Automatic Speech Recognition (ASR) for Sinhala, a highly inflected, low-resource language. We explore two e2e architectures, the e2e Lattice-Free Maximum Mutual Information (LF-MMI) model and the Recurrent Neural Network (RNN) model, using a restricted dataset. Statistical models trained on 40 hours of data are established as baselines for evaluation. Our pretrained end-to-end ASR models achieved a Word Error Rate (WER) of 23.38%, by far the best WER achieved so far for the low-resource Sinhala language. Our models demonstrate greater contextual independence and faster processing, making them more suitable for general-purpose speech-to-text translation in Sinhala.
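
For readers unfamiliar with the metric, Word Error Rate is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. The sketch below illustrates that standard definition only. It is not the authors' scoring pipeline: the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, and the abstract does not say how the Sinhala text was normalized before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Hypothetical helper for illustration; tokenization is plain
    whitespace splitting, which is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a 4-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Under this definition, a WER of 23.38% corresponds to roughly 23 word-level errors per 100 reference words. Note that WER can exceed 100% when the hypothesis contains many inserted words.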
