Centre for Internet & Society

While speaking at BHASHA: Indian Languages Digital Festival, a day-long discourse in New Delhi on Indian languages and their state in new media, especially on digital platforms, I touched upon Wikipedia in Indian languages. Most people, in fact, do not even know that Wikipedia exists in many Indian languages.

The article was first published in Huffington Post on March 19, 2016. It was cross-posted on Medianama under the title "Multiple key factors preventing Indic Wikipedia growth" on March 21, 2016.


I personally did not know about the Odia Wikipedia until 2011 when a friend told me about its existence. Back then the project was completely inactive. And then a couple of friends and I started contributing, and the project grew to what it is today. The site now has more than 300,000 visitors every month and it is the most-visited Odia language site on the internet. Other languages, I believe, could follow a similar trajectory on Wikipedia, but there are several challenges along the way.


1. Ignorance of language communities

Many people do not even know how to search for information online in their own language, typed in its script. Some even assume that because Google's home page does not appear in their script, their language does not exist on the internet. This ignorance perpetuates the gap.

2. Wikipedia's editor community

Wikipedia, as you all know, is written and edited by people like you and me who volunteer their efforts. Many people probably do not know, or do not try to learn, that they themselves can become editors and correct the mistakes and inaccuracies that exist in many Wikipedia articles. The Wikipedia editor communities for several Indian languages are really small. Although these languages are spoken by millions of people, only a handful of editors contribute to the Wikipedias in these languages.

3. Language input in computers

A vast majority of people in this country do not know how to type in their own language. There is also little documentation for users to learn about language input. Even though many government-run schools in India are seeing more computers and internet access, native language input is not widely taught. However, there is a lot of free software for language input, and the challenges of typing in Indian languages that existed a few years back have almost gone. You just have to look for the right tools.

4. Language input in mobile devices

With over 1 billion people owning mobile phones, the 15% internet penetration rate will soon grow. This in turn will help many more Indians get access to the internet. If these people are not educated about native language input, they will be unnecessarily constricted by the English-centric internet. Many Indians who have smartphones need inbuilt input methods to be able to contribute to Wikipedia in their own language.


5. Low availability of Indian-language content on the Internet

Lack of native language content on the Internet bars many from accessing knowledge. According to a survey conducted by the Internet and Mobile Association of India in 2012, over 6% of the population is deterred from going online because of the lack of content in their languages. Take the example of my state. While the Kerala government's official tourism portal is available in Odia and other Indian languages, the Odisha government's tourism portal has no information in the Odia language today. Our languages are neglected in our own states.

6. Mismatch of conventional and new media

Many conventional media houses still use non-standard variants of ASCII/ISCII script encoding systems instead of adopting the Unicode standard. Unicode, a global standard that brings the world's scripts under a single encoding, has been available for Indian languages for about 25 years now. Yet many of our traditional media houses have failed to adopt it, and many popular Indian-language newspapers are yet to become available in Unicode.
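To make the difference concrete, here is a minimal Python sketch of why Unicode text is machine-recognisable as Odia while legacy font-encoded text is not. It relies only on the fact that Odia letters occupy the Unicode block U+0B00 to U+0B7F. The function name `odia_share` and the `legacy_text` sample are my own illustrative assumptions; the legacy string is a made-up placeholder for the kind of Latin characters a non-Unicode font scheme stores, not the output of any real newspaper font.

```python
# Unicode assigns Odia (named "Oriya" in the standard) the block U+0B00..U+0B7F.
ODIA_BLOCK = range(0x0B00, 0x0B80)


def odia_share(text: str) -> float:
    """Return the fraction of non-whitespace characters that lie in the Odia block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    in_block = sum(1 for ch in chars if ord(ch) in ODIA_BLOCK)
    return in_block / len(chars)


# The word "Odia" written in Odia script, spelled out as code points
# so the example does not depend on the reader's fonts.
unicode_text = "\u0b13\u0b21\u0b3c\u0b3f\u0b06"

# Hypothetical placeholder: a legacy font scheme stores ordinary Latin
# characters and only *draws* them as Odia glyphs when its font is installed.
legacy_text = "JWXA"

print(odia_share(unicode_text))
print(odia_share(legacy_text))
```

A search engine or a screen reader sees the first string as Odia and the second as meaningless Latin letters, which is why content locked in legacy encodings is effectively invisible online.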

7. Lack of open access

Most of the information produced on the internet in general, and by the government in particular, is copyrighted. The walled garden of copyright restrictions keeps the information closed and stops people from sharing and learning more. In contrast, Wikipedia is available under a Creative Commons Attribution-ShareAlike license, which allows anyone to make use of the content and even distribute commercial copies of it. Opening up information for the masses under a free license could make it reach millions of people.


8. People with disabilities

Many people cannot read, speak or write. India has over 60 million people with hearing impairment. There is a need for good-quality text-to-speech and speech-to-text engines for people with physical disabilities. These software products also have to be free, so that ordinary people who cannot afford expensive proprietary software like JAWS can contribute to Wikipedia in their language. Many of the text-to-speech engines available today for Indian languages sound so mechanical that ordinary speakers find them hard to use.

The views and opinions expressed on this page are those of their individual authors. Unless the opposite is explicitly stated, or unless the opposite may be reasonably inferred, CIS does not subscribe to these views and opinions which belong to their individual authors. CIS does not accept any responsibility, legal or otherwise, for the views and opinions of these individual authors. For an official statement from CIS on a particular issue, please contact us directly.