Centre for Internet & Society

Content Includes Classics In Malayalam, Bengali.

The article by Sandhya Soman was published in the Times of India on January 10, 2016.

It was a hunt that took Shiju Alex to many places. His quest finally ended at the Dharmaram College library in Bengaluru, where Alex got hold of a copy of the first-ever printed book in Malayalam. He scanned it promptly, and volunteers uploaded the text to Malayalam Wikisource, one of the free online libraries run by the Wikimedia Foundation, which also runs Wikipedia. Nasim Ali returned to Wikipedia editing only because fellow Odias were reaching out on social media for help uploading the 13-volume Bhagavata Mahapuranam. Now, the entire work is available for free at Odia Wikisource.

Actions speak louder than words when it comes to preserving books in regional languages. Indian versions of Wikisource have more than 1 lakh pages of classic epics, philosophical tracts, novels and poems in 10 languages, and the numbers are growing. “These are the books that we grow up with and connect with emotionally. Most of us would like to see them online,” said Subhashish Panigrahi, Wikipedian and programme officer at the Centre for Internet and Society.

As Wikipedians come together in Bengaluru on Sunday to celebrate 15 years of editing and curating the encyclopedia in India, more such stories will be told. Indian-language content creation has grown tremendously, especially in setting up Wikisources, said A Ravishankar, programme director at the Wikimedia India chapter. Malayalam has 26,332 pages, including around 200 of the seminal books in the language. Telugu has 29,039 pages and Bengali around 11,000. The Sanskrit, Tamil, Kannada, Odia, Marathi, Gujarati and Assamese libraries are also getting bigger. The content ranges from religious texts such as the Ramayan and the Bible to first-ever printed literary works.

“Most of these are books in the public domain or ones relicensed under Creative Commons licences. This allows anyone to edit or make a copy of the work, making it reusable,” said Panigrahi. The relicensed works include the Kannada Vishwakosha brought out by the University of Mysore.

It isn't easy to get works online. Alex finds it difficult to procure the original texts from which to create PDF versions. “Every time I go to Kerala, I look for old books,” said Alex, who posts the PDFs publicly so that other volunteers can upload their text to Wikisource. Editors are also not easy to come by. Panigrahi took to social media to find a new set of editors when he was trying to upload the Bhagavata volumes. “Wiki's volunteer-editors have their hands full. So we appealed on social media and many people signed up,” he said.

But the effort is worth it, said Alex. Every time he unearths an old book and posts the link on his Facebook page, the reactions are full of surprise. “Many from the younger generation don't know that Samkshepa Vedartham (the first printed work in Malayalam) was printed in Rome. Also, researchers write to me saying they are happy to see the old books online,” he said.

Students Pitch In

The Wikimedia Foundation has tied up with various colleges for help with typing and proofreading. Around 120 students of the Kalinga Institute of Social Sciences in Bhubaneswar typed stanzas from the Bhagavata, while Christ University students in Bengaluru uploaded chunks of the Kannada Vishwakosha as part of their curriculum.

Tech Hurdle

Though the project started in 2006 with Malayalam Wikisource, it spread to other Indian languages only around five years ago. Technology remains the biggest hurdle, as open-source optical character recognition (OCR) software doesn't support many Indian languages. “Google's OCR, launched last year, is much better as it works with most Indian languages,” said Ravishankar. The new software “extracts text from images of any printed text, and sometimes even handwriting, which opens the door to old texts, manuscripts, and more,” reads Panigrahi's blog post.