Centre for Internet & Society

Representing human language in computing environments has always been a challenge: a given language's script and alphabet need to be mapped to a coding system that a computer can process digitally. This is done through an encoding system that maps each character to a unique numeric code.
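
As a small illustration of that character-to-number mapping, the Python sketch below prints the numeric code of each character in the word "Odia" written in Odia script. It uses Unicode code points, which is an assumption made for the example; the legacy encodings discussed in this article assigned different numbers to the same letters. Note that the Unicode standard names this script "ORIYA".

```python
import unicodedata

# The word "Odia" in Odia script, written with escape sequences so the
# exact characters are unambiguous regardless of the font in use.
word = "\u0B13\u0B21\u0B3C\u0B3F\u0B06"

# ord() returns the numeric code (Unicode code point) of one character.
code_points = [ord(ch) for ch in word]

# unicodedata.name() returns the official Unicode name of each character.
names = [unicodedata.name(ch) for ch in word]

for ch, cp, name in zip(word, code_points, names):
    print(f"U+{cp:04X}  {name}")
```

Each of the five characters falls in the range U+0B00 to U+0B7F, the block Unicode reserves for this script.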

The article was published by DNA on February 3, 2016.


This was a standard approach for dealing with languages in the computing context. However, over time, many such encoding systems mushroomed. In 2012, Odisha-based non-profit Srujanika, with help from colleagues, created two text encoding converters that could convert text from two different legacy non-Unicode script encoding systems to the universally accepted Unicode. I tested them myself and found a lot of typos in the output. It seemed to me that converting and proofreading a text would take more time than simply typing it afresh.

Unicode is a computing industry standard that provides a unique number for every character, irrespective of the platform, program or script. Before the onset of Unicode there existed several other standards, such as the American Standard Code for Information Interchange (ASCII) and the Indian Script Code for Information Interchange (ISCII), that defined the manner in which letters of a particular language were depicted on a computer. Text encoding converters are generally used to convert text from one encoding system to another.
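
To give a rough idea of how such a converter can work, here is a minimal Python sketch based on a lookup table. The table entries are invented for illustration and do not describe any real Odia font encoding, and `LEGACY_TO_UNICODE` and `convert` are hypothetical names; this is not the code of the converters discussed in this article. The sketch tries the longest legacy sequence first, because one Unicode character may correspond to more than one legacy glyph.

```python
# Hypothetical legacy-font table: the keys are made-up legacy sequences,
# the values are real Unicode characters from the Odia (Oriya) block.
LEGACY_TO_UNICODE = {
    "A": "\u0B05",   # ORIYA LETTER A
    "Aa": "\u0B06",  # ORIYA LETTER AA (two legacy glyphs, one character)
    "k": "\u0B15",   # ORIYA LETTER KA
    "i": "\u0B3F",   # ORIYA VOWEL SIGN I
}


def convert(legacy_text, table=LEGACY_TO_UNICODE):
    """Convert legacy-encoded text to Unicode using longest-match lookup."""
    # Try longer keys first so "Aa" is not split into "A" followed by "a".
    keys = sorted(table, key=len, reverse=True)
    out = []
    pos = 0
    while pos < len(legacy_text):
        for key in keys:
            if legacy_text.startswith(key, pos):
                out.append(table[key])
                pos += len(key)
                break
        else:
            # No table entry matched: keep the character unchanged so
            # spaces, digits and punctuation survive the conversion.
            out.append(legacy_text[pos])
            pos += 1
    return "".join(out)


converted = convert("Aak ki")
```

A real converter needs a much larger table and extra rules, for instance for vowel signs that a legacy font stores before the consonant but Unicode stores after it. Gaps in the table or in those rules are one plausible source of the kind of typos mentioned earlier.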

However, as proprietary and legacy encoding systems were so popular among desktop publishing (DTP) operators, most Indian language media houses remained tied to their existing encoding systems even after Unicode was introduced. This left editors, journalists, writers and many native language users with no reliable and intuitive way to type in their own language. For example, the slow uptake of Unicode for Odia left a huge gap in Odia-language content online, as users continued to depend on earlier, disjointed standards.

The converters I described earlier could solve this problem only partially, as they handled just two encoding systems, with about 80% linguistic accuracy. While I was seeking help to enhance and scale up these converters, three Wikimedian-developers came forward to work on them and create more foolproof ones. We worked together for many hours over a few months to make the converters better. When I asked my writer and journalist friends to test them, the outcome thrilled me: they had all started writing in Odia on Facebook the very next day.

More blogs started appearing in Odia, along with more social media interaction in the language. Interestingly, the popular newspaper Sarbasadharana.com and the online portal Odisha.com used the converters. Many people even started contributing to blogs and online portals. It became much easier for Wikimedians to use existing resources from portals, newspapers and magazines to enrich Wikipedia. Some of the available soft copies of public domain books that had been acquired, and books that were relicensed under CC licenses, could easily be used on Wikisource.

Though it is difficult to measure the exact percentage of growth in Odia-language content on the Internet, a significant change is visible today compared to the state of the Odia language on the Internet six months ago. Almost all the federal entities that were stuck with two non-Unicode encoding systems finally moved to Unicode, with the official portal odia.odisha.gov.in including the adoption of Unicode in its core policy. As a gesture of support for this development, the federal department has included Odia Wikipedia at the top of its resources page.

Recently, Jnanaranjan Sahu, one of the core contributors to the project, combined all the converters into a standalone on-wiki converter that is available on both Wikipedia and Wikisource. Many members of the larger Odia language community have contributed by finding errors, which were then fixed. Jnanaranjan has also made available a free, responsive online converter that works not just from a computer but also seamlessly from any smartphone. The converter has indeed helped Odia to be used more widely on the Internet. The bigger dream of an Odia version of Google is closer to becoming real.
