An Open-Source Data Driven Spell Checker for Sinhala
Abstract
In this paper, we describe a strategy for constructing spell checkers for languages whose linguistic resources are scarce or non-existent. The strategy is particularly relevant to languages with rich morphology, whose word forms are therefore difficult to enumerate completely in a lexicon. The approach is based on character n-gram statistics and is relatively inexpensive to construct without deep linguistic knowledge. We apply the technique to Sinhala, the majority language of Sri Lanka, and show that it detects and corrects many of the language's common spelling errors. Results show accuracy above 90% on a publicly available Sinhala text.
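To make the idea of character n-gram statistics concrete, the following is a minimal illustrative sketch and not the implementation described in the paper. It assumes trigrams, add-one smoothing, a rule that flags any word containing an unseen trigram, and correction by single-character substitution only; the function names (`char_ngrams`, `train`, `is_suspect`, `suggest`) and the toy Latin-script word list are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(word, n=3):
    """Return the character n-grams of a word, padded with boundary markers."""
    padded = "^" * (n - 1) + word + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(words, n=3):
    """Count character n-grams over a list of correctly spelled words."""
    counts = Counter()
    for w in words:
        counts.update(char_ngrams(w, n))
    return counts


def score(word, counts, n=3):
    """Average add-one-smoothed log-probability of the word's n-grams."""
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen n-grams
    grams = char_ngrams(word, n)
    logp = sum(math.log((counts[g] + 1) / (total + vocab)) for g in grams)
    return logp / len(grams)


def is_suspect(word, counts, n=3):
    """Flag a word if any of its n-grams never occurred in training."""
    return any(counts[g] == 0 for g in char_ngrams(word, n))


def suggest(word, counts, alphabet, n=3):
    """Try every single-character substitution; return the best-scoring
    candidate whose n-grams were all seen in training, or None."""
    candidates = set()
    for i in range(len(word)):
        for ch in alphabet:
            candidates.add(word[:i] + ch + word[i + 1:])
    candidates.discard(word)
    plausible = [c for c in candidates if not is_suspect(c, counts, n)]
    return max(plausible, key=lambda c: score(c, counts, n), default=None)


# Toy example (assumed data, Latin script for readability).
counts = train(["banana", "bandana", "cabana"])
print(is_suspect("banqna", counts))
print(suggest("banqna", counts, "abcdn"))
```

For Sinhala, the training words would come from a corpus of Sinhala text and the `alphabet` argument would hold code points from the Unicode Sinhala block (U+0D80 to U+0DFF). A practical checker would likely also consider insertions, deletions, and transpositions, which this sketch omits.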