An Open-Source Data Driven Spell Checker for Sinhala
In this paper, we describe a strategy for constructing spell checkers for languages in which linguistic resources are scarce or non-existent. It is particularly relevant to morphologically rich languages, whose word forms are difficult to enumerate completely in a lexicon. The approach is based on character n-gram statistics and is relatively inexpensive to build, requiring no deep linguistic knowledge. We apply the technique to Sinhala, the majority language of Sri Lanka, and show that it can detect and correct many of the language's common spelling errors. Results show accuracy above 90% on a publicly available Sinhala text.
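The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a character n-gram spell checker of this general kind might work, not the paper's actual method. It assumes trigram counts collected from a word list, flags a word when it contains a trigram never seen in training, and proposes single-edit candidates whose trigrams are all attested. The function names (`char_ngrams`, `train`, `is_suspect`, `suggest`) and the toy English word list are hypothetical placeholders; a real system would be trained on a large Sinhala corpus.

```python
from collections import Counter


def char_ngrams(word, n=3):
    """Return the character n-grams of a word, padded with boundary markers."""
    padded = "^" * (n - 1) + word + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(words, n=3):
    """Count character n-grams over a list of (assumed correct) words."""
    counts = Counter()
    for w in words:
        counts.update(char_ngrams(w, n))
    return counts


def is_suspect(word, counts, n=3):
    """Flag a word if any of its n-grams was never seen in training."""
    return any(counts[g] == 0 for g in char_ngrams(word, n))


def edits1(word, alphabet):
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + swaps + replaces + inserts) - {word}


def suggest(word, counts, alphabet, n=3, k=3):
    """Rank single-edit candidates whose n-grams are all attested."""
    candidates = [c for c in edits1(word, alphabet)
                  if c and not is_suspect(c, counts, n)]

    def score(c):
        grams = char_ngrams(c, n)
        return sum(counts[g] for g in grams) / len(grams)

    # Highest average n-gram count first; ties broken alphabetically.
    return sorted(candidates, key=lambda c: (-score(c), c))[:k]


# Toy training data (placeholder; not taken from the paper).
corpus = ["spell", "spelling", "speller", "check", "checker", "checking"]
counts = train(corpus)
alphabet = sorted(set("".join(corpus)))

print(is_suspect("spleling", counts))
print(suggest("spleling", counts, alphabet))
```

Because the model stores n-gram statistics rather than a full word list, it can accept inflected forms it has never seen as whole words, which is the property that makes this family of approaches attractive for morphologically rich languages. The trade-off is that a misspelling composed entirely of attested n-grams goes undetected.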
spell checking; Sinhala; data driven; n-gram
University of Colombo
School of Computing
Managed & Published
This journal is published under a Creative Commons Attribution 4.0 International License.