Enhancing Neural Machine Translation for the Sinhala-Tamil language pair with limited resources

Ashmari Pramodya
K T Y Mahima
Randil Pushpananda
Ruvan Weerasinghe

Abstract

Neural Machine Translation has emerged as a promising approach to language translation, and Transformer-based deep learning architectures have significantly improved translation performance across many language pairs. However, language pairs with limited resources struggle to adopt Neural Machine Translation because of its substantial data requirements. This study investigates methods for expanding the parallel corpus to improve translation quality. We establish a set of effective guidelines for improving Tamil-to-Sinhala machine translation based on cutting-edge Neural Machine Translation techniques, such as hyperparameter fine-tuning and data augmentation through both forward and backward translation, and we validate our methods empirically using standard evaluation metrics. Our experiments show that Neural Machine Translation models trained on larger sets of back-translated data outperform other methods of synthetic data generation in Transformer-based training settings. We further investigated whether the Transformer architecture can be used effectively in the limited-resource context of translating Tamil to Sinhala. Our results demonstrate that Transformer models can surpass the best Statistical Machine Translation models even for language pairs with limited resources, achieving an improvement of 3.43 BLEU points in translation quality over the statistical translation models.
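
To make the back-translation step concrete, the sketch below illustrates the general technique referenced in the abstract: a reverse-direction (Sinhala-to-Tamil) model translates monolingual Sinhala text into synthetic Tamil sources, and the resulting synthetic pairs are concatenated with the genuine parallel corpus before training the Tamil-to-Sinhala model. This is a minimal sketch, not the authors' implementation; it assumes the Hugging Face Transformers library, and the checkpoint name `path/to/si-ta-reverse-model` is a placeholder, not a real published model.

```python
# Minimal back-translation sketch for augmenting a Tamil->Sinhala corpus.
# Assumption: a trained Sinhala->Tamil (reverse-direction) model exists;
# the checkpoint path below is a hypothetical placeholder.
from transformers import pipeline

reverse_translator = pipeline(
    "translation", model="path/to/si-ta-reverse-model"  # placeholder checkpoint
)

def back_translate(monolingual_sinhala, batch_size=32):
    """Generate (synthetic Tamil source, genuine Sinhala target) pairs.

    The returned pairs can be concatenated with the original parallel
    corpus before training the forward (Tamil->Sinhala) model.
    """
    pairs = []
    for i in range(0, len(monolingual_sinhala), batch_size):
        batch = monolingual_sinhala[i:i + batch_size]
        outputs = reverse_translator(batch)
        for target, out in zip(batch, outputs):
            pairs.append((out["translation_text"], target))
    return pairs

# Usage: augment the parallel training data with synthetic pairs.
# parallel_corpus += back_translate(sinhala_monolingual_sentences)
```

Forward translation works symmetrically, using a Tamil-to-Sinhala model to generate synthetic Sinhala targets for monolingual Tamil sources; per the abstract, training on larger sets of back-translated data proved the more effective of the two in the Transformer-based setting.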
