An Open-Source Data Driven Spell Checker for Sinhala

Ruwan Asanka Wasala, Ruwan Weerasinghe, Randil Pushpananda, Chamila Liyanage, Eranga Jayalatharachchi

Abstract


In this paper, we describe a strategy for constructing spell checkers for languages where linguistic resources are scarce or non-existent. It is particularly relevant to languages whose rich morphology makes them difficult to enumerate completely in a lexicon. The approach is based on character n-gram statistics and requires no deep linguistic knowledge, which makes it relatively inexpensive to construct. The technique is applied to Sinhala, the majority language of Sri Lanka, where it is shown to detect and correct many of the language's common spelling errors. Results show accuracy above 90% on publicly available Sinhala text.
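The abstract states only that the approach rests on character n-gram statistics. As a rough illustration of that general idea, and not the authors' actual algorithm (which the abstract does not detail), the Python sketch below counts character trigrams in a word list. It flags a word as a likely misspelling when the word contains a trigram never seen in training, and it proposes single-edit corrections whose trigrams are all attested. The trigram order, the `#` padding symbol, the `min_count` threshold, the mean-log-frequency ranking, and all function names are illustrative assumptions. Toy English words are used for readability. Because Python strings are sequences of Unicode code points, the same code runs unchanged on Sinhala text, although a practical system would probably need to handle vowel signs and conjuncts more carefully.

```python
import math
from collections import Counter

# Hypothetical boundary marker, assumed never to occur inside real words.
PAD = "#"


def char_ngrams(word: str, n: int = 3) -> list[str]:
    """Return the character n-grams of `word`, padded at both ends."""
    padded = PAD * (n - 1) + word + PAD * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(words, n: int = 3) -> Counter:
    """Count every character n-gram occurring in the training words."""
    counts = Counter()
    for w in words:
        counts.update(char_ngrams(w, n))
    return counts


def is_suspicious(word: str, counts: Counter, n: int = 3, min_count: int = 1) -> bool:
    """Flag a word if any of its n-grams was seen fewer than `min_count` times."""
    return any(counts[g] < min_count for g in char_ngrams(word, n))


def score(word: str, counts: Counter, n: int = 3) -> float:
    """Mean log-frequency of the word's n-grams (higher means more 'typical')."""
    grams = char_ngrams(word, n)
    return sum(math.log(counts[g] + 1) for g in grams) / len(grams)


def edits1(word: str, alphabet: str) -> set[str]:
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    inserts = {a + c + b for a, b in splits for c in alphabet}
    return deletes | transposes | replaces | inserts


def suggest(word: str, counts: Counter, alphabet: str, n: int = 3, min_count: int = 1) -> list[str]:
    """Single-edit candidates whose n-grams all pass the threshold, best first."""
    candidates = {
        c for c in edits1(word, alphabet)
        if c and c != word and not is_suspicious(c, counts, n, min_count)
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(candidates, key=lambda c: (-score(c, counts, n), c))


if __name__ == "__main__":
    corpus = ["hello", "help", "yellow", "fellow", "spell", "check"]
    counts = train(corpus)
    alphabet = "".join(sorted(set("".join(corpus))))
    print(is_suspicious("hello", counts))  # False
    print(is_suspicious("hellp", counts))  # True ("llp" never occurs in the corpus)
    print(suggest("hellp", counts, alphabet))
```

Ranking by the mean log-frequency of a candidate's n-grams, rather than by their sum, keeps longer candidates from winning simply because they contain more n-grams.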

Keywords


spell checking; Sinhala; data driven; n-gram



Printing sponsor: CodeGen
Managed & published by: University of Colombo School of Computing

Creative Commons License
This journal is published under a Creative Commons Attribution 4.0 International License.