BHASHA - Indian Languages Digital Festival
Subhashish Panigrahi gave a talk at Bhasha - Indian Languages Digital Festival, a conference organized by the news media company YourStory, in New Delhi on March 11, 2016. In the panel "The challenges of making regional language content available on the Web and on mobiles", Panigrahi spoke about some of the challenges in growing the Indian-language Wikipedia projects and their communities.
The notes for the talk are below:
Wikipedia, which exists in many Indian languages, is not known to the masses. I personally did not know about the Odia Wikipedia until 2011, when a friend told me about its existence. Back then the project was completely inactive. Then a couple of friends and I started contributing, more contributors joined, and the project grew into what it is today. The site now has over 300,000 visitors every month and is the most visited site in the Odia language on the Internet. The same could probably be true for your language.
It has been quite challenging to grow the Indian-language Wikipedia projects. There are many challenges, and I will talk about eight of them:
- 1. Language communities
In many Indian-language communities, a large number of people do not even know how to search for information online in their own language, typed in their own script. Some even say that because Google's home page does not show their script, their language does not exist on the Internet. There is a large gap in awareness.
- 2. Wikipedia's editor community
Wikipedia, as you all know, is written by people like you and me. Everything from writing to editing happens voluntarily. Many people probably do not know, or do not try to learn, that they themselves can correct the mistakes and inaccuracies that exist in many Wikipedia articles. The Wikipedia editor communities for several Indian languages are really small. Although these languages are spoken by millions of people, only a handful of editors contribute to the Wikipedias in these languages.
- 3. Language input
The vast majority of people in this country do not know how to type in their own language.
- 4. Low availability of Indian-language content on the Internet
There are two stages to the lack of Open Access to information. First, the lack of native-language content on the Internet bars many people from accessing knowledge. Take the example of my state. While the Kerala government's official tourism portal is available in Odia and other Indian languages, the tourism portal of my own state's government, Odisha, has no information in the Odia language. Our languages are largely neglected in our own states.
- 5. Mismatch of conventional and new media
Many conventional media outlets are still using non-standard variants of ASCII/ISCII script encoding systems instead of adopting the Unicode standard. Unicode, a global standard with the advantage of unifying the world's scripts, has been available for Indian languages for about 25 years now.[1] But many of our traditional media outlets have failed to adopt it. Malayala Manorama, one of the most widely circulated dailies in the Malayalam language and one of the oldest Indian newspapers, still has not started using Unicode on its website. The same is the case for many other newspapers in India.
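To illustrate why this matters, here is a minimal Python sketch, not taken from the talk, of how Unicode text can be recognized as Odia by its code points alone. The function name `odia_share` and the legacy-style sample string are hypothetical. The sketch assumes only that the Unicode block for Odia (named "Oriya" in the standard) spans U+0B00 to U+0B7F.

```python
# The Unicode block for Odia (called "Oriya" in the standard): U+0B00..U+0B7F.
ODIA_BLOCK = range(0x0B00, 0x0B80)


def odia_share(text: str) -> float:
    """Return the fraction of non-space characters that lie in the Odia block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(ord(c) in ODIA_BLOCK for c in chars) / len(chars)


# The word "Odia" written in Odia script, stored as Unicode code points.
unicode_text = "\u0B13\u0B21\u0B3C\u0B3F\u0B06"

# Hypothetical example of text from a legacy font-based encoding: the bytes
# are Latin-range characters that only look like Odia when a special font
# is installed, so software cannot tell which language it is.
legacy_text = "Hì$ûeú"

print(odia_share(unicode_text))
print(odia_share(legacy_text))
```

Text stored in Unicode identifies its own script, so search engines and other software can index it. Text in a font-based legacy encoding looks like unrelated Latin characters to any program that does not have the matching font.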
- 6. Lack of Open Access
The information produced on the Internet in general, and by the government in particular, is mostly copyrighted. The walled garden of copyright restrictions keeps the information closed and stops people from sharing and learning more. In contrast, Wikipedia is available under a Creative Commons Attribution-ShareAlike license, which allows anyone to make use of the content and even distribute commercial copies of it. Opening up information to the masses under a free license could make it reach millions of people.
- 7. Mobile input
With over 1 billion people owning mobile phones, the 15% Internet penetration rate will soon grow, meaning many more Indians will have access to the Internet. If these people are not educated about native-language input, they will be victims of the English-centric Internet rather than being able to enjoy its benefits. Many Indians who have smartphones need pre-built input methods to be able to contribute to the Wikipedia in their own language.
- 8. People with disabilities
Many people cannot read, speak or write. India has over 60 million people with hearing impairment.[2] There is a need for good-quality text-to-speech and speech-to-text engines for these languages. These software products also have to be free software, so that common people who cannot afford to buy expensive proprietary software like JAWS can contribute to Wikipedia in their language. Many text-to-speech engines that are available today for Indian languages sound so mechanical that it is very hard for common speakers to use them.