Centre for Internet & Society

Goa University has entered into a three-year memorandum of understanding (MoU) with the Centre for Internet and Society (CIS) to build the Konkani Wikipedia, reports The Navhind Times.


This blog post by Apurva Chaudhary was published by Medianama on September 30, 2013.


As part of this partnership, Goa University will upload the four volumes of the Konkani encyclopedia, and the Konkani Wikipedia is expected to be available in six months. The partnership with CIS India appears to include scanning, digitization, and creating articles that meet Wikipedia's requirements. CIS India has also called for volunteers to help digitize the Konkani encyclopedia over a period of three months.

This is notable because most efforts to digitize documents in Indian languages have not included Konkani, which is spoken along the western coast of India. Moreover, since the university will upload the Konkani encyclopedia to Wikipedia as text, users will be able to search it. Most digitization projects simply scan books and upload the pages as images, which makes it difficult to search the text or run any kind of data-related query.

In May last year, Wikipedians were digitizing out-of-copyright Indian-language texts, trying to address the comparative paucity of Indic-language texts online. Wikisource is a repository of documents and archived material that serves as a reference source for Wikipedia and as a means of improving access to information sources. Of the 64 languages Wikisource is available in, 8 are Indian: Tamil, Malayalam, Telugu, Kannada, Sanskrit, Marathi, Bengali and Gujarati. Note that most of these Indian-language Wikipedias receive active contributions from CIS India under its CIS-A2K program.

Wikipedia recorded 43.5 million pageviews across its Indian-language wikis as of October 2011. In June 2013, the Hindi-language Wikipedia received 7.8 million monthly pageviews and the Tamil-language Wikipedia received 5.2 million, among others. So there is clearly a demand for access to information in local languages.

That said, while more languages are being added to the list, care should be taken that articles in these languages contain more than just one or two sentences. According to an analysis by Shijualex.in, many articles in Indian-language Wikipedias were under 2 KB, indicating that they contained only a couple of sentences.