Centre for Internet & Society
8 Challenges for Improving Indian Language Wikipedias

Image credits: Subhashish Panigrahi. CC BY-SA 4.0.

After more than 10 years in existence, the Indian-language Wikipedias are still unknown to many Indian language speakers. Wikipedia became the largest encyclopedia in history as a result of thousands of volunteer editors.

The article was originally published in the Wire on March 17, 2016 and later mirrored on Opensource.com on March 28, 2016.


Whereas native-language Wikipedias are becoming game changers in other corners of the world, the scenario in India is skewed. While speaking at the "BHASHA: Indian Languages Digital Festival," a day-long discourse in New Delhi on Indian languages and their state in the new media (especially on digital platforms), I shared the challenges that Indian-language Wikipedias are facing.

1. Language communities

Many native Indian language speakers do not know how to search online by typing in their own script. Because Google's home page does not display their language as an option, people often think that their language does not exist on the Internet. Google's home page now offers nine Indian languages, but speakers of other languages are not locked out: a Santali or Manipuri speaker can still search in Unicode Ol Chiki (the script for Santali) or in Unicode Meitei Mayek (the script for Manipuri). Google and other search engines will display content in any script found on the Internet, but not knowing this keeps many people off the Internet, which also means off Wikipedia.

2. Wikipedia's editor community

Wikipedia is created by people like you and me. From writing to editing, everything happens voluntarily. Many people do not understand that they can correct mistakes and help improve Wikipedia articles. The Wikipedia editor communities for several Indian languages are really small. Although these languages are spoken by millions of people, only a handful of editors contribute to the Wikipedias in these languages. As of January 2016, the Hindi Wikipedia had only 89 editors, even though Hindi ranks right behind English on the list of top languages by number of native speakers.

3. Language input on computers

A majority of people in India do not know how to type in their own language, and there is little documentation to help users learn about language input. Even though many government-run schools in India now have more computers and Internet access, native language input and several other basic computer skills are not widely taught in schools in all states. Free software for language input is available, and the challenges of typing in Indian languages that existed in the past are mostly resolved.

4. Language input on mobile devices

With more than 1 billion people in India owning mobile phones, the 15% Internet penetration rate will soon grow at a faster pace. This growth and tough competition are compelling telecom service providers to drop data charges, which will help more Indians get access to the Internet. If these people are not educated about native language input, they will be stuck inside an English-centric Internet rather than being able to navigate in their own languages. Many Indians who have smartphones need full Indian language support, and especially built-in input methods, to contribute to Wikipedia in their own language.

5. Low availability of Indian-language content on the Internet

Lack of native language content on the Internet is a barrier to accessing knowledge. For example, let's look at my home state, Odisha. The Kerala (Indian state) government's official tourism portal is available in Odia and other Indian languages, but the Odisha government's own tourism portal has no information in the Odia language today. Our languages are largely neglected in our own states.

6. Mismatch of conventional and new media

Many conventional media houses still use non-standard variants of ASCII/ISCII script encoding systems instead of adopting the Unicode standard. As a global standard, Unicode can help unify the world, and it has been available for Indian languages for almost 25 years. But many of our print media have failed to adopt it, and many popular Indian-language newspapers still aren't available in Unicode.
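To make the point about a shared standard concrete, here is a minimal Python sketch. It assumes nothing about any newspaper's actual encoding; it only shows that, under Unicode, each character of an Indian script has one fixed code point that any software can recognise and store as UTF-8. The helper name `script_of` and the small block table are my own illustrative choices, and the table covers only a handful of scripts.

```python
# Illustrative sketch: identify the script of a character from its
# Unicode code point. The ranges are Unicode block boundaries; the
# table is deliberately incomplete and the helper name is hypothetical.
UNICODE_BLOCKS = {
    "Devanagari": (0x0900, 0x097F),
    "Odia": (0x0B00, 0x0B7F),
    "Ol Chiki": (0x1C50, 0x1C7F),
    "Meetei Mayek": (0xABC0, 0xABFF),
}


def script_of(char):
    """Return the name of the block containing `char`, or "other"."""
    code_point = ord(char)
    for name, (start, end) in UNICODE_BLOCKS.items():
        if start <= code_point <= end:
            return name
    return "other"


# U+0B13 is a letter in the Odia block. Encoding it as UTF-8 and
# decoding it again gives back exactly the same character.
odia_letter = "\u0b13"
utf8_bytes = odia_letter.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

print(script_of(odia_letter))  # Odia
print(script_of("\u1c5a"))     # Ol Chiki
print(round_trip == odia_letter)
```

Text stored in a font-specific legacy encoding offers no such guarantee: the same byte can stand for different letters depending on the font, which is why search engines and Wikipedia cannot reliably read it.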

7. Lack of open access

Much information online, including content created by the government, is under restrictive copyright. Paywalls, walled gardens, and copyright restrictions keep information closed and prevent people from sharing content. Wikipedia content, on the other hand, is available under Creative Commons Attribution-ShareAlike licensing, which allows anyone to use the content (and even distribute commercial copies of it). The idea of opening up content under free licenses can help information reach countless additional people.

8. Accessibility

India has more than 60 million people with hearing impairments. Many people with physical disabilities need good text-to-speech and speech-to-text engines. These software solutions must be free, so that anyone, regardless of their finances, can contribute to Wikipedia in their own language.
