Vachana Sanchaya: Bringing Access to 11th century Kannada Literature
This blog post describes Vachana Sanchaya, a project that provides access to eleventh-century Kannada literature.
The article was published in Open Access Week on April 3, 2014.
During the early 11th century, a form of spiritual Kannada-language poetry called Vachana sahitya became quite popular in what is now the Indian state of Karnataka. It flourished in the 12th century through a religious movement called the Lingayatha movement. More than 259 Vachana writers, called Vachanakaru, compiled over 11,000 vachanas (verses). 21,000 of these verses in 15 volumes were published by the Government of Karnataka on an online portal called Samagra Vachana Samputa. Two Wikimedians along with two linguists brought these verses to a standalone project called Vachana Sanchaya. Kannada Wikimedians Pavithra Hanchagaiah and Omshivaprakash HI, along with Kannada linguist O. L. Nagabhushana Swamy, converted the text from its original font encoding to Unicode to make the verses searchable on this project. The entire collection is now ready to enrich the Kannada Wikisource.
The text in Samagra Vachana Samputa was typed using fonts based on ISCII, an Indian character encoding standard. In such fonts, Indic glyphs generally take the place of Latin characters, so the text becomes unreadable when the particular font is not installed on the reader's computer. This is a typical problem with non-Latin fonts, especially Indic typefaces. In the case of this particular publication, more than five ISCII standards were in use, which made searching and reusing the content impossible. Hanchagaiah and Omshivaprakash started writing scripts to make the vachanas searchable through an index. This work in turn called for a user-friendly platform for linguistic researchers, students, and members of the public interested in accessing this literature.
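To make the font problem and the index idea concrete, here is a minimal Python sketch. It is not the project's actual code. The glyph table `LEGACY_TO_UNICODE` is entirely hypothetical and does not correspond to any real ISCII font, and the function names `legacy_to_unicode` and `build_index` are made up for illustration. The sketch assumes that conversion can be done by table lookup with the longest glyph sequence matched first, and that a simple word-to-verse index is enough for search.

```python
import unicodedata
from collections import defaultdict

# HYPOTHETICAL glyph table: Latin code points that a legacy font would
# draw as Kannada glyphs. This is not the mapping of any real font.
LEGACY_TO_UNICODE = {
    "A": "\u0c95",               # KA
    "B": "\u0ca8",               # NA
    "C": "\u0ca1",               # DA
    "Bx": "\u0ca8\u0ccd\u0ca8",  # conjunct NNA = NA + virama + NA
}


def legacy_to_unicode(text, table=LEGACY_TO_UNICODE):
    """Convert legacy-font text to Unicode, longest glyph sequence first."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # Unmapped characters (spaces, punctuation) pass through unchanged.
            out.append(text[i])
            i += 1
    return unicodedata.normalize("NFC", "".join(out))


def build_index(verses):
    """Map each whitespace-separated word to the ids of verses containing it."""
    index = defaultdict(set)
    for verse_id, text in verses.items():
        for word in text.split():
            index[word].add(verse_id)
    return index


# Two made-up "verses" as they would be stored in the legacy encoding.
legacy_verses = {1: "ABxC A", 2: "C"}
unicode_verses = {vid: legacy_to_unicode(t) for vid, t in legacy_verses.items()}
index = build_index(unicode_verses)

kannada = "\u0c95\u0ca8\u0ccd\u0ca8\u0ca1"  # the word "Kannada" in Kannada script
print(sorted(index[kannada]))  # ids of the verses that contain the word
print(len(index))              # number of unique words across all verses
```

Without the conversion step, a search for the Unicode word would never match the stored string `ABxC`, which is the problem described above. A real converter would need one table per legacy encoding and would also have to handle glyph reordering, which this sketch ignores.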
Omshivaprakash designed the architecture for this platform using open source software tools. Hanchagaiah provided critical hacks for digitization, along with valuable input in the form of suggestions, feedback, and quality assurance.
At present, the Vachana Sanchaya project has around 200,000 unique words derived from these verses. The public has been using the repository and accessing vachanas through Facebook, Twitter, and Google+ profiles. Thousands of people now read a vachana as part of their daily routine. Vachana Sanchaya is not only a gateway for reading the literature, but also a research platform for Kannada language and literature. It offers options for researchers to help review the content, which in turn will help add references from research papers.
All of the content is currently available to the public through the OpenData API, and once the review work is complete, it will be distributed in the public domain through Wikisource. This will open up the system for students, developers, researchers, and anyone interested in building linguistic tools for Kannada and other Indic languages. Users will be able to use our code to digitize any book available in the public domain. Early literature in any language is well-respected, so making it available via an open platform allows the content to be reused for research, publication, and other documentation work.
Similar projects could draw on this one and reuse any part of its process.
Plans going forward:
- To initiate Natural Language Processing (NLP) projects if more researchers help to tag words and grow the glossary.
- To continue work on subsequent, similar projects for Sarvagnana Vachanagalu and Dāsa Sanchaya (work has begun) and Vyāsa and Muddann (work to be started).
- To extend this platform to other contemporary literary works available in the public domain.
Authored by Pavithra Hanchagaiah, Omshivaprakash HI and Subhashish Panigrahi. Draws inspiration from another article published on Opensource.com under CC-BY-SA 4.0.