An Open-Source Data Driven Spell Checker for Sinhala

Ruwan Asanka Wasala
Ruwan Weerasinghe
Randil Pushpananda
Chamila Liyanage
Eranga Jayalatharachchi

Abstract

In this paper, we describe a strategy for constructing spell checkers for languages in which linguistic resources are scarce or non-existent. The strategy is particularly relevant to languages with rich morphology, whose word forms are therefore difficult to enumerate completely in a lexicon. The approach is based on character n-gram statistics and is relatively inexpensive to construct, requiring no deep linguistic knowledge. The technique is applied to Sinhala, the majority language of Sri Lanka, and is shown to detect and correct many of the common spelling errors of the language. Results show accuracy above 90% on a publicly available Sinhala text.
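
To make the idea of character n-gram statistics concrete, the following is a minimal sketch and not the authors' implementation. It assumes a trigram model with add-one smoothing, a toy Latin-script word list in place of a Sinhala corpus, and a hypothetical threshold for flagging words; the names `CharNgramModel`, `is_suspect`, and `rank` are illustrative only. It also treats each Unicode code point as one character, which a real Sinhala system might handle differently.

```python
import math
from collections import Counter


def char_ngrams(word, n=3):
    """Return the character n-grams of a word, padded with boundary markers.

    Assumption: '^' and '$' never occur inside a word.
    """
    padded = "^" * (n - 1) + word + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramModel:
    """Character n-gram model trained on a plain list of words (no lexicon lookup)."""

    def __init__(self, words, n=3):
        self.n = n
        self.ngram_counts = Counter()
        self.context_counts = Counter()
        alphabet = set()
        for word in words:
            alphabet.update(word)
            for gram in char_ngrams(word, n):
                self.ngram_counts[gram] += 1
                self.context_counts[gram[:-1]] += 1
        # Possible next symbols: every seen character plus the end marker.
        self.vocab_size = len(alphabet) + 1

    def score(self, word):
        """Average add-one-smoothed log-probability per n-gram of the word."""
        grams = char_ngrams(word, self.n)
        total = 0.0
        for gram in grams:
            numerator = self.ngram_counts[gram] + 1
            denominator = self.context_counts[gram[:-1]] + self.vocab_size
            total += math.log(numerator / denominator)
        return total / len(grams)

    def is_suspect(self, word, threshold):
        """Flag a word whose score falls below a (hypothetical) tuned threshold."""
        return self.score(word) < threshold

    def rank(self, candidates):
        """Order candidate corrections from most to least plausible."""
        return sorted(candidates, key=self.score, reverse=True)


# Toy training data standing in for a real corpus.
model = CharNgramModel(["banana", "bandana", "cabana", "canal", "banal"])
well_formed = model.score("banana")
garbled = model.score("bnaaan")
best_first = model.rank(["bnaaan", "banana"])
```

In this sketch a word built from frequently seen character sequences scores higher than a garbled one, so detection reduces to comparing the score with a threshold and correction to ranking candidate strings by score. How candidates are generated and how the threshold is chosen are left open here.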
