Applicability of End-to-End Deep Neural Architecture to Sinhala Speech Recognition

Buddhi Gamage; Randil Pushpananda; Thilini Nadungodage; Ruvan Weerasinghe

Published May 31, 2024

Buddhi Gamage

University of Sri Jayewardenepura

Randil Pushpananda

UCSC

Thilini Nadungodage

UCSC

Ruvan Weerasinghe

UCSC

Abstract

This research presents a study on the application of end-to-end (e2e) deep learning models to Automatic Speech Recognition (ASR) for Sinhala, a highly inflected, low-resource language. We explore two e2e architectures, the e2e Lattice-Free Maximum Mutual Information (LF-MMI) model and the Recurrent Neural Network (RNN) model, using a restricted dataset. Statistical models trained on 40 hours of data serve as baselines for evaluation. Our pretrained end-to-end ASR models achieved a Word Error Rate (WER) of 23.38%, by far the best WER achieved for the low-resourced Sinhala language. Our models demonstrate greater contextual independence and faster processing, making them more suitable for general-purpose speech-to-text translation in Sinhala.
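
For readers unfamiliar with the metric, Word Error Rate is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal Python sketch of that standard definition, not the authors' evaluation script; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / number of reference words.

    Tokenization by whitespace is an assumption for this sketch; a real
    evaluation may normalize text differently.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a WER of 23.38% corresponds to roughly 23 word-level errors per 100 reference words. Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.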

Issue

Vol 17 No 1 (2024): 2024 Special Issue (March 2024)
