Vachana Sanchaya: 11th century Kannada literature to enrich Wikisource
Kannada Wikipedians Omshivaprakash and Pavithra and I co-authored this article on digitizing Vachana Sahitya, a body of 11th-century Kannada literature, on Wikisource.
Pavithra Hanchagaiah and Omshivaprakash HI, Wikimedians from India, co-authored this article with Subhashish Panigrahi. It was originally posted on the Wikimedia blog and published by GlobalVoices on March 18, 2014.
In Kannada (an Indic language) poetry, Vachana sahitya is a form of rhythmic writing that evolved in the 11th century C.E. and flourished in the 12th century as part of the “Lingayatha” movement. More than 259 Vachanakaras (Vachana writers) have compiled over 11,000 vachanas. The 21,000 verses published by the government of Karnataka in the 15-volume “Samagra Vachana Samputa” have been digitized. Two Wikimedians, along with Kannada linguist and author O. L. Nagabhushana Swamy, are involved in the Unicode conversion, the corrections and the writing of a preface for these verses. The entire work is now available as a standalone project called “Vachana Sanchaya” and is ready to enrich Kannada Wikisource.
This project started a year ago, when Kannada Wikimedian Omshivaprakash was trying to help Professor O. L. Nagabhushana Swamy and Kannada author and publisher Vasudhendra access the vachana (verses) of Vachana Sanchaya. Swamy had trouble using the publicly available content on Vachanas because the data was in a legacy ASCII encoding, which made searching the text a huge problem. I (Pavithra Hanchagaiah) started to help gather information about vachanas and document it in Unicode by writing scripts for open source software. Further discussions followed on how to get thousands of vachanas into a database, so that they could be easily searched through an index.
This required us to build a platform supporting all these activities, one that would help linguistic researchers, students and members of the general public with an interest in reading and studying Vachana literature. With this idea, Omshivaprakash started designing the model, and his colleague Devaraju started building it. In the meantime I was running various scripts to fix errors in the conversion of ASCII text to Unicode, confirming that the data was ready to be consumed by the modules developed for concordance.
We spent weekends and holidays working on this project from home. With constant feedback and guidance from Mr. Swamy and Vasudhendra, we learned how researchers use a concordance of text and what would make it easier for them to research Vachana Sahitya. Omshivaprakash worked on the architecture of the platform and decided the infrastructure requirements while managing the entire project; free and open source software technologies were used to keep the platform running. I provided critical hacks for digitization and gave feedback through suggestions.
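The ASCII-to-Unicode conversion described above can be pictured as a table lookup. The sketch below is only an illustration in Python and not the scripts we actually ran: the function name `convert_legacy` and the table `LEGACY_TO_UNICODE` are hypothetical, and the table entries are placeholders rather than the real mapping of any Kannada font. It tries the longest legacy sequence first and passes unknown characters through unchanged, which is where a later error-fixing pass would pick them up.

```python
# Toy legacy-to-Unicode table. The entries are illustrative placeholders
# (an assumption for this sketch), not the real mapping of any font.
LEGACY_TO_UNICODE = {
    "PÀ": "ಕ",
    "gÀ": "ರ",
    "ªÀÄ": "ಮ",
    "£À": "ನ",
    "¢": "ದಿ",
}


def convert_legacy(text, table=LEGACY_TO_UNICODE):
    """Convert legacy-encoded text using greedy longest-match lookup.

    Characters with no table entry are copied through unchanged.
    """
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so multi-character codes win
        # over any shorter code that happens to share a prefix.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No mapping found: keep the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)
```

A real converter would also have to reorder vowel signs and handle conjuncts, which this sketch does not attempt.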
Currently, the system has around 200,000 unique words in its repository. Vachana Sanchaya is meant for research rather than as just a repository of text on the web. When you search for a word on our system, you can see which Vachanakaras have used it across all the Vachanas. To make the results easier to read, we highlight the search term in each Vachana displayed. To repeat the search for a specific Vachanakara (poet), you just need to click on the poet's name in the graph on the results page. We have used MediaWiki’s jquery.ime input tool architecture, which let us provide a feature for entering Kannada text directly in Unicode for searches. So just type, and get results!
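To give a feel for what such a concordance search involves, here is a minimal Python sketch. It is not the code running on Vachana Sanchaya: the sample records, the poet names and the functions `build_index` and `search` are all hypothetical, and words are split on whitespace only. It builds a word index, counts matching Vachanas per Vachanakara (the figures a results graph could plot), marks the search term with square brackets, and optionally narrows the search to one poet.

```python
from collections import Counter, defaultdict

# Hypothetical sample records: (vachanakara, vachana text).
VACHANAS = [
    ("Poet A", "ಸತ್ಯ ನದಿ ಸತ್ಯ"),
    ("Poet B", "ನದಿ ಕರ್ಮ"),
    ("Poet A", "ಕರ್ಮ"),
]


def build_index(vachanas):
    """Map each unique word to the ids of the vachanas containing it."""
    index = defaultdict(set)
    for vid, (_, text) in enumerate(vachanas):
        for word in text.split():
            index[word].add(vid)
    return index


def search(word, vachanas, index, author=None):
    """Return (hits, per-poet counts) for an exact word match.

    Each hit is (poet, text with the search term wrapped in brackets).
    Passing `author` restricts the results to that poet.
    """
    hits = []
    by_author = Counter()
    for vid in sorted(index.get(word, ())):
        poet, text = vachanas[vid]
        if author is not None and poet != author:
            continue
        by_author[poet] += 1
        highlighted = " ".join(
            f"[{w}]" if w == word else w for w in text.split()
        )
        hits.append((poet, highlighted))
    return hits, by_author
```

Calling `search` again with `author` set to one of the returned poets mirrors clicking a name in the results graph.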
Vachana Sanchaya Website Screenshot
We are glad to see people reaching the vachanas through our Facebook, Twitter and Google+ channels. Our site received approximately 500,000 pageviews in the first few months after the platform’s public launch. Interestingly, searches for common Kannada words like “ಕರ್ಮ” (Karma; en: work/deed), “ಸತ್ಯ” (Sathya; en: truthfulness) and “ನದಿ” (Nadi; en: river) return results quickly and easily.
Plans for the future
ಆಂಗೀರಸ, ಪುಲಸ್ತ್ಯ, ಪುಲಹ, ಶಾಂತ,ದಕ್ಷ, ವಸಿಷ್ಠ, ವಾಮದೇವ, ನವಬ್ರಹ್ಮ, ಕೌಶಿಕ, ಶೌನಕ, ಸ್ವಯಂಭು, ಸ್ವಾರೋಚಿಷ, ಉತ್ತಮ, ತಾಮಸ, ರೈವತ, ಚಾಕ್ಷಷ, ವೈವಸ್ವತ, ಸೂರ್ಯಸಾವರ್ಣಿ, ಚಂದ್ರಸಾವರ್ಣಿ, ಬ್ರಹ್ಮಸಾವರ್ಣಿ, ಇಂದ್ರ ಸಾವರ್ಣಿ ಇವರು ಇಪ್ಪತ್ತು ಮಂದಿ ಪ್ರಪಂಚ ನಿರ್ಮಾಣ ಸಹಾಯ[ದ]ವರು. ಹತ್ತೊಂಬತ್ತು ಎಂದರೆ ಪುಣ್ಯನದಿಗಳು. ಅದು ಎಂತೆಂದಡೆ: ಗ್ರಂಥ
Our system is extensible when it comes to adding new features. We have a review desk where researchers help us review the content. Later we will also add the required references to Vachanas from the various research works that have been done on this literature. The content is available to the public through an OpenData API and will be distributed as public domain through Wikisource once the review work is complete. This will open up the system to students, developers, researchers and anyone interested in building linguistic tools for Kannada and other Indic languages. The system is meant to evolve to accommodate other works, rather than having to be changed and reinvented for each such project. Vachana Sahitya will further help us initiate Natural Language Processing (NLP) projects if more researchers get together in the coming days to tag the words, build glossaries and so on. We can also meet the need for language tools such as spelling and grammar checkers by crowd-sourcing their development. The next projects under “Kannada Sanchaya” are Sarvagnana Vachanagalu and Dāsa Sanchaya, which are in the pipeline with the initial phases of work underway. Our idea is to extend this platform from Vyasa to Muddanna, and possibly to contemporary literature available in the public domain.