Centre for Internet & Society

This research seeks to understand the most effective way of researching big data in the Global South. Towards this goal, the project planned the development of a Global South Big Data Research Network that identifies the potential opportunities and harms of big data in the Global South, as well as possible policy solutions and interventions.

This work has been made possible by a grant from the John D. and Catherine T. MacArthur Foundation. The conclusions, opinions, or points of view expressed in the report are those of the authors and do not necessarily represent the views of the John D. and Catherine T. MacArthur Foundation.


Introduction

The research ran for a duration of 12 months and took the form of an exploratory study which sought to understand the potential opportunities and harms of big data, as well as to identify best practices and relevant policy recommendations. Each case study was chosen based on the use of big data in that area and the opportunity it presents for policy recommendation and reform. Each case study seeks to answer a similar set of questions to allow for analysis across case studies.

What is Big Data

Big data has been ascribed a number of definitions and characteristics. Any study of big data must therefore begin by conceptualizing and defining what big data is. Over the past few years, the term has become a buzzword, used to refer to any number of characteristics of a dataset, ranging from its size to its rate of accumulation to the technology in use.[1]

Many commentators have critiqued the term “big data” as a misnomer, misleading in its emphasis on size. We have surveyed various definitions and understandings of big data and document the significant ones below.

Computational Challenges

Data sets so large that they tax the capacities of main memory, local disk, and remote disk have been seen as the defining problem that big data technologies are built to address. While this understanding of big data focusses on only one of its features, namely size, other characteristics posing a computational challenge to existing technologies have also been examined. The (US) National Institute of Standards and Technology has defined big data as data which “exceed(s) the capacity or capability of current or conventional methods and systems.”[2]

These challenges are not merely a function of size. Thomas Davenport provides a cohesive definition of big data in this context: according to him, big data is “data that is too big to fit on a single server, too unstructured to fit into a row-and-column database, or too continuously flowing to fit into a static data warehouse.”[3]
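To make the computational challenge concrete, the short Python sketch below illustrates one common way of coping with data that will not fit in main memory: processing records as a stream and keeping only running aggregates. The record source and field names are hypothetical and purely illustrative; they are not drawn from any of the works cited here.

# Illustrative sketch: aggregating a data stream that is too large to hold
# in main memory at once. Records are consumed one at a time, so memory use
# stays roughly constant regardless of how many records flow through.

from collections import defaultdict
from typing import Dict, Iterable


def record_stream(n: int) -> Iterable[dict]:
    """Simulate a continuously flowing source of simple transaction records.

    In practice this could be a log file read line by line, a message queue,
    or a network socket; only one record exists in memory at a time.
    """
    for i in range(n):
        yield {"region": "south" if i % 2 else "north", "amount": i % 100}


def running_totals(records: Iterable[dict]) -> Dict[str, int]:
    """Maintain per-region totals without ever materialising the full dataset."""
    totals: Dict[str, int] = defaultdict(int)
    for record in records:
        totals[record["region"]] += record["amount"]
    return dict(totals)


if __name__ == "__main__":
    # A million records flow through, but never exist in memory simultaneously.
    print(running_totals(record_stream(1_000_000)))

This is the intuition behind Davenport's point: once data no longer fits on a single machine or in a static warehouse, systems shift from loading whole datasets to processing them incrementally or in parallel across machines.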

Data Characteristics

The most popular definition of big data was put forth in a 2001 report by the META Group (since acquired by Gartner), which looks at it in terms of the three Vs: volume,[4] velocity and variety. Gartner defines big data as “high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.”[5]

Aside from volume, velocity and variety, other defining characteristics of big data articulated by different commentators are exhaustiveness,[6] granularity (fine-grained and uniquely indexical),[7] scalability,[8] veracity,[9] value[10] and variability.[11] It is highly unlikely that any data set satisfies all of the above characteristics. Therefore, it is important to determine what permutation and combination of this gamut of attributes leads us to classify something as big data.

Qualitative Attributes

Prof. Rob Kitchin has argued that big data is qualitatively different from traditional, small data. Small data rely on sampling techniques for collection, are limited in scope, temporality and size, and are “inflexible in their administration and generation.”[12]

In this respect there are two qualitative attributes of big data which distinguish it from traditional data. First, big data technologies can accommodate unstructured and diverse datasets which hitherto were of no use to data processors; this allows the inclusion of many new forms of data from new and data-heavy sources such as social media and digital footprints. The second attribute is the relationality of big data.[13]

Relationality relies on the presence of common fields across datasets, which allow different databases to be conjoined. This attribute is usually a feature not of the size but of the complexity of the data, enabling a high degree of permutations and interactions within and across data sets, as the sketch below illustrates.
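The following minimal Python sketch illustrates relationality: two independently collected datasets share a common field, which makes it possible to conjoin them. The dataset names and fields ("user_id", call and payment records) are hypothetical examples chosen for illustration, not taken from the text.

# Illustrative sketch of relationality: two separately collected datasets
# share a common field ("user_id"), which allows them to be conjoined and
# yields combinations that neither dataset supports on its own.

# Hypothetical telecom-style records keyed by user_id.
call_records = [
    {"user_id": "u1", "calls_per_day": 12},
    {"user_id": "u2", "calls_per_day": 3},
]

# Hypothetical payment records from a separate system, sharing user_id.
payment_records = [
    {"user_id": "u1", "monthly_spend": 450},
    {"user_id": "u2", "monthly_spend": 120},
]


def join_on(key, left, right):
    """Join two lists of records on a common field, like a database inner join."""
    right_index = {row[key]: row for row in right}
    return [
        {**row, **right_index[row[key]]}
        for row in left
        if row[key] in right_index
    ]


if __name__ == "__main__":
    for combined in join_on("user_id", call_records, payment_records):
        print(combined)

The analytical value, and the privacy concern, comes from exactly this kind of linkage: each dataset is unremarkable on its own, but conjoining them on a shared identifier produces a richer profile than either source intended to capture.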

Patterns and Inferences

Instead of focussing on the ontological attributes or computational challenges of big data, Kenneth Cukier and Viktor Mayer-Schönberger define big data in terms of what it can achieve.[14]

They define big data as the ability to harness information in novel ways to produce useful insights, or goods and services of significant value. Building on this definition, Rohan Samarajiva has categorised big data into non-behavioral and behavioral big data; the latter leads to insights about human behavior.[15]

Samarajiva believes that transaction-generated data (commercial as well as non-commercial) in a networked infrastructure is what constitutes behavioral big data.

Scope of Research

The initial scope arrived at for this case study on the role of big data in governance in India focussed on the UID Project, the Digital India Programme and the Smart Cities Mission. Digital India is a programme launched by the Government of India to ensure that government services are made available to citizens electronically, by improving online infrastructure, by increasing Internet connectivity, and by making the country digitally empowered in the field of technology.[16]

The Programme has nine components, two of which focus on e-governance schemes.


[1]. Thomas Davenport, Big Data at Work: Dispelling the Myths, Uncovering the Opportunities, Harvard Business Review Press, Boston, 2014.

[2]. MIT Technology Review, The Big Data Conundrum: How to Define It?, available at https://www.technologyreview.com/s/519851/the-big-data-conundrum-how-to-define-it/

[3]. Supra note 1.

[4]. What constitutes high volume remains an unresolved matter. Intel, for instance, has associated big data with organizations generating a median of 300 terabytes of data a week.

[5]. Gartner IT Glossary, Big Data, available at http://www.gartner.com/it-glossary/big-data/

[6]. Viktor Mayer-Schönberger and Kenneth Cukier, Big Data: A Revolution That Will Transform How We Live, Work and Think, John Murray, London, 2013.

[7]. Rob Kitchin, The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences, Sage, London, 2014.

[8]. Nathan Marz and James Warren, Big Data: Principles and Best Practices of Scalable Realtime Data Systems, Manning Publications, New York, 2015.

[9]. Bernard Marr, Big Data: The 5 Vs Everyone Must Know, available at https://www.linkedin.com/pulse/20140306073407-64875646-big-data-the-5-vs-everyone-must-know.

[10]. Id.

[11]. Eileen McNulty, Understanding Big Data: The 7 Vs, available at http://dataconomy.com/sevenvs-big-data/.

[12]. Supra note 7.

[13]. Danah Boyd and Kate Crawford, Critical Questions for Big Data, Information, Communication & Society 15(5): 662–679, available at https://www.researchgate.net/publication/281748849_Critical_questions_for_big_data_Provocations_for_a_cultural_technological_and_scholarly_phenomenon

[14]. Supra note 6.

[15]. Rohan Samarajiva, What is Big Data, available at http://lirneasia.net/2015/11/what-is-bigdata/.

[16]. Digital India, About the Programme, available at http://www.digitalindia.gov.in/content/about-programme
