Centre for Internet & Society
Eight Challenges That Indian-Language Wikipedias Need to Overcome

From language input to Unicode standards, Indian-language Wikipedias need a sustained effort from the community. Credit: Johann Dréo CC BY 2.0/Flickr

Even after a decade of existence, Indian-language Wikipedias are not yet known to many Indian-language speakers. Wikipedia, the largest encyclopedia ever made in human history, is what it is today because of the hundreds and thousands of volunteer editors. But while native-language Wikipedias are becoming game-changers in other corners of the world, the scenario in India is skewed. In my experience, these are the challenges that Indian-language Wikipedias currently face.

The article was published in the Wire on March 17, 2016. A version of the article was also mirrored by Opensource.com on March 28, 2016.


1. Language communities:

Many speakers of Indian languages do not know how to search for information online in their own language, typed in its own script. Some of these communities even believe that because Google's home page is not available in their script, their language does not exist on the Internet. Having started with five Indian languages for its interface, Google now supports nine. But this does not stop a Santali or Manipuri user from searching in Unicode Ol Chiki (the script for Santali) or in Unicode Meetei Mayek (the script for Manipuri). Google, or any search engine for that matter, will display anything available in any script on the Internet. But the lack of this awareness is keeping many people away from the Internet in general and Wikipedia in particular.

2. Wikipedia’s editor community:

Wikipedia is written by people like you and me. From writing to editing, everything happens voluntarily. Many people probably do not know, or do not try to learn, that anybody can correct the mistakes and inaccuracies that exist in many Wikipedia articles. The Wikipedia editor communities for several Indian languages are really small. While these languages are spoken by millions of people, only a handful of editors contribute to the Wikipedias in these languages. In January this year, the Hindi Wikipedia, for instance, had only 89 editors, while the total number of Hindi speakers would be over 550 million.

3. Language input in computers:

A vast majority of people in this country do not know how to type in their own language. There is also little documentation for users to learn about language input. Even though many government-run schools in India are seeing a proliferation of computers and Internet access, native-language input and several other essential basic computing skills are not widely taught in schools in all states. What is sad is that this gap persists even though a wide variety of free software for native-language input is available and the challenges of typing in Indian languages that existed a few years back have almost disappeared.

4. Language input in mobile devices:

With over 1 billion people owning mobile phones, the 15% Internet penetration rate will soon grow at a faster pace. This, along with tough competition that compels telecom service providers (TSPs) to drop data charges, will help many Indians get access to the Internet. If these people are not educated about native-language input, they will be victims of the English-centric Internet rather than being able to enjoy its benefits. Many Indians who have smartphones need full Indian-language support, and especially inbuilt input methods, to be able to contribute to Wikipedia in their own language.

5. Low availability of Indian-language content on the Internet:

Lack of native-language content on the Internet is another major factor in the low adoption of Indian-language Wikipedias. As per an Internet and Mobile Association of India survey conducted in 2012, over 6% of the population is kept from joining the online sphere simply because of the lack of content in their languages. Take, for instance, my state, Odisha. While the Kerala government's official tourism portal is available in Odia and other Indian languages, the Odisha government's own tourism portal has no information in Odia today. Our languages are largely neglected in our own states.

6. Mismatch of conventional and new media:

Many conventional media houses still continue to use non-standard variants of the ASCII/ISCII encoding systems instead of adopting the Unicode standard. Unicode, a global standard that has the advantage of unifying the world, has been available for Indian languages for almost 25 years now. But much of our vernacular print media has failed to adopt it. Consequently, many popular Indian-language newspapers are yet to become available in Unicode on the open Internet.

7. Lack of Open Access:

The majority of the information produced on the Internet in general, and by the government in particular, is copyrighted. The paywalled garden of copyright restrictions keeps the information closed and stops people from sharing and learning more. In contrast, Wikipedia is available under a Creative Commons Attribution-ShareAlike license, which allows anyone to make use of the content and even distribute commercial copies of it. Opening up information to the masses under a free license could help it reach millions of people.

8. People with disabilities:

Many people cannot read, speak or write. India has over 60 million people with some form of hearing impairment. There is a desperate need for high-quality text-to-speech and speech-to-text engines for people with physical disabilities. Also, these software products have to be free software so that common people, who cannot afford to buy expensive proprietary software like JAWS, can contribute to Wikipedia in their language. Many text-to-speech engines that are available today for Indian languages sound so mechanical that it is difficult for common speakers to use them.

Subhashish Panigrahi is an educator and free knowledge evangelist, and currently works for Communications, Program Capacity & Learning at the Wikimedia Foundation, and Access to Knowledge at the Centre for Internet and Society. Portions of this article came from a speech that Panigrahi gave at BHASHA: Indian Languages Digital Festival in New Delhi.
