Centre for Internet & Society

The following is a compilation of the statistical update of the Indic language Wikipedias from January to June 2012. The author provides perspectives on the health of various Indic language communities as well as the state of various Indic language Wikipedias during the period.

This report covers editor contributions between January 1, 2012 and June 30, 2012. (Read last year’s report here.) The data and analysis are based on the statistics published at http://stats.wikimedia.org. Thanks to Erik Zachte for compiling all this information.

Some of the important points from this report are:

  • As always, the Indic Wikipedia communities that focus on community building have done well. Progress is slow, but the results are steady and sustainable.
  • The communities that have made substantial progress in community building are Urdu, Oriya, Assamese, and Malayalam (for Urdu Wikipedia, most of the activity is from Pakistan). The most recent entrant to this club is Punjabi, which will show up in the statistics over the next few months.
  • Newbies need adequate support after each outreach event, but many communities are failing here. This is hurting the conversion rate even though many outreach activities are happening across the country.
  • As seen in the past, the readership of Indic language Wikipedias is still growing.

This report is presented in the following sequence.

  1. Community
  2. Content
  3. Readership

Community

As the community is the backbone of every Indic language Wikipedia, it is important that each language wiki community gives adequate importance to community building. Many language communities still do not understand its importance. To achieve the goal of building a free knowledge database in each language, we need participation from the maximum number of that language's speakers. The following table gives information on two important parameters of the community in each language Wikipedia:

  • Number of users who made 100 or more edits in a month (high active Wikipedians)
  • Number of users who made 5 or more edits in a month (active Wikipedians)
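The two thresholds above can be expressed as a small counting routine. This is only a minimal sketch: the function name, the constant names, and the sample edit counts are hypothetical and are not taken from stats.wikimedia.org. It also assumes that a high active Wikipedian is counted among the active Wikipedians as well, since 100 or more edits is also 5 or more edits.

```python
# Thresholds as defined in the list above.
HIGH_ACTIVE_MIN_EDITS = 100  # "high active": 100 or more edits in a month
ACTIVE_MIN_EDITS = 5         # "active": 5 or more edits in a month


def count_wikipedians(monthly_edits):
    """Count active and high active users from a {user: edits} mapping.

    Assumption: high active users are included in the active count too.
    """
    counts = {"active": 0, "high active": 0}
    for edits in monthly_edits.values():
        if edits >= ACTIVE_MIN_EDITS:
            counts["active"] += 1
        if edits >= HIGH_ACTIVE_MIN_EDITS:
            counts["high active"] += 1
    return counts


# Hypothetical edit counts for one month (illustrative only, not real data).
sample = {"UserA": 150, "UserB": 5, "UserC": 4, "UserD": 100}
result = count_wikipedians(sample)
print(result)
```

In this made-up sample, UserC falls below both thresholds, so the month has three active users, two of whom are high active.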

User growth in Indic language Wikipedias during 2012 January-June

Some of the important points that we can make out from this table are:

  • The high active editors (editors with 100 or more edits per month) are the backbone of each language Wikipedia. Apart from normal article editing, they are the users who maintain the wiki. Tamil and Malayalam continue to hold the top spot with almost 24 high active users. Marathi, Gujarati, Oriya, Punjabi, and Urdu also showed growth in the number of high active users.
  • Assamese Wikipedia shows a reduction in the number of high active users even though its number of active users increased. This means that Assamese Wikipedia needs more of its currently active users to take up wiki adminship and similar leadership roles.
  • The number of active users (editors with 5 or more edits per month) gives an overview of the overall activity in a Wikipedia. Here too, Malayalam and Tamil continue to be on top. Some of the languages that showed notable growth in the number of active users are Urdu, Oriya, and Assamese. As we know, a lot of community building activity is happening in both Oriya and Assamese. Along with these activities, both communities are making sure that they provide sufficient support to newbies through various options, and their efforts are showing up in the form of community strength.
  • The number of active members in Odia has increased to 25, which means the community has grown threefold over the past 6 months.
  • The number of wiki editors per million speakers is still below 1 for most Indic languages. This shows that awareness of Indic language wiki projects is still an issue for most Indic Wikipedias. From these statistics (http://stats.wikimedia.org/EN/Sitemap.htm), we can see that for Sanskrit the number of editors per million speakers has reached 280, which is one of the highest in the world. No other Indic language Wikipedia comes near Sanskrit on this parameter. Malayalam comes second with 3 editors per million, and Assamese and Bishnupriya Manipuri come third with 2 editors per million. Tamil is in fourth place with 1 editor per million. For all other Indic languages, the number of editors per million speakers is below 1. This shows that the penetration of each language's Wikipedia among its speakers is still very low. We need more outreach programs to reach the speakers of each language.
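The editors-per-million figure discussed above is a simple ratio of editors to speakers. The sketch below shows the arithmetic only. The function name and the input figures are hypothetical, and it is an assumption that the editor count and speaker population are the inputs stats.wikimedia.org uses; the report does not say which editor count goes into the published figure.

```python
def editors_per_million(editors: int, speakers: int) -> float:
    """Return the number of editors per one million speakers."""
    if speakers <= 0:
        raise ValueError("speakers must be a positive number")
    return editors / (speakers / 1_000_000)


# Hypothetical inputs (illustrative only, not figures from the report):
# 30 editors in a language with 1 crore (10 million) speakers.
example_a = editors_per_million(30, 10_000_000)

# 25 editors in a language with 5 crore (50 million) speakers.
example_b = editors_per_million(25, 50_000_000)

print(example_a, example_b)
```

The second made-up case comes out below 1, which is the situation the report describes for most Indic languages: even a few dozen editors amount to less than one editor per million when the speaker base runs into crores.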

Content

The number of articles is an important parameter, but it has misguided some wiki communities. It is nevertheless a very important parameter when communities increase the number of articles in a way that is helpful to the readers of the wiki.

  • Hindi continues to hold the top spot with 1,02,902 articles. During the past 6 months, almost 2000 articles were added to Hindi Wikipedia.
  • One of the major accomplishments during this period is Telugu Wikipedia crossing the 50,000 article milestone. I remember reading the news about Telugu Wikipedia crossing the 30,000 article milestone in June 2007, which shows that it took almost 5 years to get from there to 50,000 articles. As pointed out by User:Veeven in his blog post (about Telugu Wikipedia crossing 50,000 articles), Telugu Wikipedia needs more support from the Telugu-speaking population to build the free knowledge project in Telugu. The current number of active users in Telugu Wikipedia does not do justice to the huge speaker base of Telugu (more than 8 crore).
  • Another major milestone was Assamese Wikipedia crossing the 1,000 article mark.
  • Tamil and Malayalam are the two language Wikipedias that added the most articles during this period. Each of them added close to 3000 articles.
  • The Sindhi, Newari (Nepal Bhasha), and Bishnupriya language Wikipedias showed a reduction in the number of articles. There are 2 reasons for this:
  1. There is no active community to add new articles (see the first table for the number of active users)
  2. Spam/vandalism pages were deleted by stewards/global sysops.

Readers (Pageview)

The number of visits continues to increase for all Indic language Wikipedias, and the total visits for all Indic language Wikipedias combined is now close to 4 crore.

Please note that the information in the table below is the total visits (page views) for a language Wikipedia in a month, from all platforms combined. It includes visits by both readers and editors. This is NOT the number of unique visitors to the website.

(The number of readers shown in the table below is in lakhs)

Growth of Readers during January 2012 - June 2012

(The Number of Readers shown in the above table is in lakhs)

  • For most of the Indic languages, readership has gone up. For Assamese and Odia it almost doubled.
  • Among the big languages, readership, unlike the number of active users, does justice to the size of the speaking population for most Indic languages. So even though many of our speakers are not editing their language Wikipedia, they are reading it. Bengali and Telugu are two languages that behave differently here, which shows that awareness is very low for both languages.
  • As Indic language support in smartphones and different operating systems is in a better position now, I am sure the readership is going to increase further in future.

Still, a major percentage of our speakers (I mean speakers who have access to the internet) do not know that a Wikipedia exists in their own mother tongue, and the fact that they are not using it is a big issue. If our reader base does not increase, it will affect community growth as well. I hope things will improve, as at least a few language communities are involved in various awareness and outreach programs.


Originally posted at http://shijualex.wordpress.com/2012/09/24/indic-language-wikipedias-statistical-report-2012-january-2012-june/

https://www.google.com/accounts/o8/id?id=AItOawm7QgsoyjnbkCgiU1kkCMx3rIJDYDIoJgY says:
Sep 25, 2012 11:43 AM

Shiju,

You seem to have confused page views with the number of readers (in the earlier report as well). As per the comScore report at http://reportcard.wmflabs.org/, the number of unique visitors is around 2 crore as of June 2012. You may like to check.

Shiju Alex says:
Sep 26, 2012 08:53 AM

Yes, I am aware of this. That is why I haven't used the term "Unique visitors" anywhere. To make things clear, I have added an explanation in the section about Readers. From the next report onwards I will present this section differently. Thanks for pointing out this possible confusion.

https://www.google.com/accounts/o8/id?id=AItOawm7QgsoyjnbkCgiU1kkCMx3rIJDYDIoJgY says:
Sep 28, 2012 03:41 PM

Thanks for the updates.

The views and opinions expressed on this page are those of their individual authors. Unless the opposite is explicitly stated, or unless the opposite may be reasonably inferred, CIS does not subscribe to these views and opinions which belong to their individual authors. CIS does not accept any responsibility, legal or otherwise, for the views and opinions of these individual authors. For an official statement from CIS on a particular issue, please contact us directly.