Big Data in the Global South - An Analysis
I. Introduction
"The period that we have embarked upon is unprecedented in history in terms of our ability to learn about human behavior." [1]
The world we live in today is facing a slow but deliberate metamorphosis of decisive information; from the erstwhile monopoly of world leaders and the captains of industry obtained through regulated means, it has transformed into a relatively undervalued currency of knowledge collected from individual digital expressions over a vast network of interconnected electrical impulses.[2] This seemingly random deluge of binary numbers, when interpreted, represents an intricately woven tapestry of the choices that define everyday life, made over virtual platforms. The machines we once employed for menial tasks have become sensorial observers of our desires, wants and needs, so much so that they might now predict the course of our future choices and decisions.[3] The patterns of human behaviour that are reflected within this data inform policy makers, in both a public and private context. The collective data obtained from our digital shadows thus forms a rapidly expanding storehouse of memory, upon which interested parties can draw to resolve problems and enable a more efficient functioning of foundational institutions, such as the markets, the regulators and the government.[4]
A large volume of collected data, in structured as well as unstructured form, is called Big Data. Processing this data requires niche technology, outside of traditional software databases, simply because of its exponential growth over a relatively short period of time. Big Data is usually identified using a "three V" characterization - larger volume, greater variety and distinguishably high velocity. [5] This is exemplified in the diverse sources from which this data is obtained: mobile phone records, climate sensors, social media content, GPS satellite identifications and patterns of employment, to name a few. Big data analytics refers to the tools and methodologies that aim to transform large quantities of raw data into "interpretable data", so that it can be studied and causal relationships between events can be conclusively established.[6] Such analysis could allow the positive effects of such data to be encouraged and its negative outcomes to be mitigated in a concentrated manner.
This paper seeks to map out the practices of different governments, civil society, and the private sector with respect to the collection, interpretation and analysis of big data in the global south, illustrated across a background of significant events surrounding the use of big data in relevant contexts. This will be combined with an articulation of potential opportunities to use big data analytics within both the public and private spheres and an identification of the contextual challenges that may obstruct the efficient use of this data. The objective of this study is to deliberate upon how significant obstructions to the achievement of developmental goals within the global south can be overcome through an accurate recognition, interpretation and analysis of big data collected from diverse sources.
II. Uses of Big Data in Global Development
Big Data for development is the process through which raw, unstructured and imperfect data is analyzed, interpreted and transformed into information that can be acted upon by governments and policy makers in various capacities. The amount of digital data available in the world grew from 150 exabytes in 2005 to 1200 exabytes in 2010.[7] It is predicted that this figure will increase by 40% annually in the next few years[8], which is close to 40 times the growth rate of the world's population. [9] The implication is essentially that the share of available data in the world today that is less than a minute old is increasing at an exponential rate. Moreover, an increasing percentage of this data is produced in real time.
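The growth figures quoted here can be checked with a short calculation. The sketch below is an illustration only and is not drawn from the cited sources: it derives the compound annual growth rate implied by the 2005 and 2010 figures and projects the predicted 40% annual rate five years forward. The function names are hypothetical.

```python
def implied_annual_growth(start, end, years):
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


def project(value, annual_rate, years):
    """Value after compounding `annual_rate` once a year for `years` years."""
    return value * (1 + annual_rate) ** years


# Figures quoted in the text: 150 exabytes (2005) and 1200 exabytes (2010).
rate_2005_2010 = implied_annual_growth(150, 1200, 5)

# Assumption for illustration: the 40% annual rate holds for five years after 2010.
projected = project(1200, 0.40, 5)

print(f"Implied annual growth 2005-2010: {rate_2005_2010:.1%}")
print(f"Projected volume five years on at 40% a year: {projected:.0f} exabytes")
```

An eightfold increase over five years corresponds to roughly 52% growth a year, so the predicted 40% annual rate is a modest slowdown relative to the 2005-2010 period rather than an acceleration.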
The data revolution that is upon us is characterized by a rapidly accumulating and continuously evolving stock of data prevalent in both industrialized and developing countries. This data is extracted from technological services that act as sensors and reflect the behaviour of individuals in relation to their socio-economic circumstances.
For many global south countries, this data is generated through mobile phone technology. This trend is evident in Sub-Saharan Africa, where mobile phone technology has been used as an effective substitute for often weak and unstructured State mechanisms such as faulty infrastructure, underdeveloped systems of banking and inferior telecommunication networks.[10]
For example, a recent study presented at the Data for Development session at the NetMob Conference at MIT used mobile phone data to analyze the impact of opening a new toll highway in Dakar, Senegal on human mobility, particularly how people commute to work in the metropolitan area. [11] A huge investment, the improved infrastructure is expected to result in a significant increase in the movement of people in and out of Dakar, along with the transport of essential goods. This would initiate rural development in the areas outside of Dakar and boost the value of land within the region.[12] The impact of the newly constructed highway can however only be analyzed effectively and accurately through the collection of this mobile phone data from actual commuters, in real time.
Mobile phone technology is no longer used just for personal communication but has been transformed into an effective tool to secure employment opportunities, transfer money, determine stock options and assess the prices of various commodities.[13] This generates vast amounts of data about individuals and their interactions with the government and private sector companies. Internet traffic is predicted to grow by between 25% and 30% in the next few years in North America, Western Europe and Japan, but in Latin America, the Middle East and Africa this figure is expected to touch close to 50%.[14] The bulk of this internet traffic can be traced back to mobile devices.
The potential applicability of Big Data for development at the most general level is the ability to provide an overview of the well-being of a given population at a particular period of time.[15] This overcomes the relatively longer time lag that is prevalent with most other traditional forms of data collection. The analysis of this data has helped, to a large extent, uncover "digital smoke signals" - or inherent changes in the usage patterns of technological services by individuals within communities.[16] This may act as an indicator of changes in the underlying well-being of the community as a whole. This information about the well-being of a community, derived from their usage of technology, provides significantly relevant feedback to policy makers on the success or failure of particular schemes and can pinpoint changes that need to be made to the status quo. [17] The hope is that this feedback, delivered in real time, would in turn lead to a more flexible and accessible system of international development, thus securing more measurable and sustained outcomes. [18]
The analysis of big data involves the use of advanced computational technology that can aid in the determination of trends, patterns and correlations within unstructured data so as to transform it into actionable information. It is hoped that this, in addition to the human perspective and experience afforded to the process, could enable decision makers to rely upon information that is both reliable and up to date when formulating durable and self-sustaining development policies.
The availability of raw data has to be adequately complemented with intent and a capacity to use it effectively. To this effect, there is an emerging volume of literature that seeks to characterize the primary sources of this Big Data as sharing certain easily distinguishable features. Firstly, it is digitally generated and can be stored in a binary format, thus making it susceptible to requisite manipulation by computers attempting to engage in its interpretation. It is passively produced as a by-product of digital interaction and can be automatically extracted for the purpose of continuous analysis. It is also geographically traceable within a predetermined time period. It is however important to note that "real time" does not necessarily refer to information occurring instantly but is reflective of the relatively short time in which the information is produced and made available thus making it relevant within the requisite timeframe. This allows efficient responsive action to be taken in a short span of time thus creating a feedback loop. [19]
In most cases the data is preferably aggregated over a larger spatial unit, such as a village or a community, rather than analyzed at the level of the individual, because this affords adequate recognition of privacy concerns and of the lack of definitive consent from individuals to the extraction of this data. To ease the identification of this data, the UN Global Pulse has developed a taxonomy of sorts to assess the types of data sources that are relevant to utilizing this information for development purposes.[20] These include the following sources:
Data Exhaust or the digital footprint left behind by individuals' use of technology for service oriented tasks such as web purchases, mobile phone transactions and real time information collected by UN agencies to monitor their projects such as levels of food grains in storage units, attendance in schools etc.
Online Information which includes user generated content on the internet such as news, blog entries and social media interactions which may be used to identify trends in human desires, perceptions and needs.
Physical sensors such as satellite or infrared imagery of infrastructural development, traffic patterns, light emissions and topographical changes, thus enabling the remote sensing of changes in human activity over a period of time.
Citizen reporting or crowd sourced data, which includes information produced on hotlines, mobile based surveys, customer generated maps etc. Although a passive source of data collection, this is a key instrument in assessing the efficacy of action oriented plans taken by decision makers.
The capacity to analyze this big data hinges upon technologically advanced processes such as powerful algorithms, which can synthesize the abundance of raw data and break down the information, enabling the identification of patterns and correlations. This process would rely on advanced visualization techniques such as "sense-making tools".[21]
The identification of patterns within this data is carried out through a process of instituting a common framework for the analysis of this data. This requires the creation of a specific lexicon that would help tag and sort the collected data. This lexicon would specify what type of information is collected and who it is interpreted and collected by, the observer or the reporter. It would also aid in the determination of how the data is acquired and the qualitative and quantitative nature of the data. Finally, the spatial context of the data and the time frame within which it was collected constituting the aspects of where and when would be taken into consideration. The data would then be analyzed through a process of Filtering, Summarizing and Categorizing the data by transforming it into an appropriate collection of relevant indicators of a particular population demographic. [22]
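The lexicon and the Filtering, Summarizing and Categorizing steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the framework used in the cited literature: the `Observation` fields, the function names, the sample records and the threshold are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """One tagged record; the fields mirror the lexicon described in the text."""
    what: str     # type of information, e.g. "food_price"
    who: str      # "observer" or "reporter"
    how: str      # acquisition channel, e.g. "sms_survey"
    where: str    # spatial context, e.g. a district name
    when: date    # date of collection
    value: float  # quantitative content of the record


def filter_records(records, what, start, end):
    """Filtering: keep records of one type collected inside a time window."""
    return [r for r in records if r.what == what and start <= r.when <= end]


def summarize(records):
    """Summarizing: mean value per spatial unit."""
    totals, counts = defaultdict(float), defaultdict(int)
    for r in records:
        totals[r.where] += r.value
        counts[r.where] += 1
    return {place: totals[place] / counts[place] for place in counts}


def categorize(summary, threshold):
    """Categorizing: turn each mean into a coarse indicator."""
    return {place: "high" if avg >= threshold else "normal"
            for place, avg in summary.items()}


# Invented sample data, for illustration only.
records = [
    Observation("food_price", "reporter", "sms_survey", "District A", date(2014, 3, 1), 10.0),
    Observation("food_price", "reporter", "sms_survey", "District A", date(2014, 3, 8), 14.0),
    Observation("food_price", "observer", "market_visit", "District B", date(2014, 3, 2), 8.0),
    Observation("school_attendance", "observer", "register", "District A", date(2014, 3, 2), 0.9),
]

window = filter_records(records, "food_price", date(2014, 3, 1), date(2014, 3, 31))
indicators = categorize(summarize(window), threshold=11.0)
print(indicators)
```

Keeping the "who", "how", "where" and "when" tags on every record is what allows the same collection to be re-filtered later for a different indicator or a different population demographic.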
The intensive mining of predominantly socioeconomic data is known as "reality mining" [23] and this can shed light on the processes and interactions that are reflected within the data. This is carried out via a tested three-fold process. Firstly, the "Continuous Analysis over the streaming of the data", which involves monitoring and analyzing high frequency data streams to extract often uncertain raw data, for example the systematic gathering of the prices of products sold online over a period of time. Secondly, "The Online digestion of semi structured data and unstructured data", which includes news articles, reviews of services and products and opinion polls on social media that aid in the determination of public perception, trends and contemporary events that are generating interest across the globe. Thirdly, a "Real-time Correlation of streaming data with slowly accessible historical data repositories", which refers to the "mechanisms used for correlating and integrating data in real-time with historical records."[24] The purpose of this stage is to derive a contextualized perception of personalized information that seeks to add value to the data by providing a historical context to it. Big Data for development purposes would make use of a combination of these depending on the context and need.
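The first and third stages, continuous analysis of a stream and its correlation with historical records, can be illustrated with a simple rolling comparison. The sketch below is one possible reading under stated assumptions and not the mechanism described in the cited source: the function name, the window size, the tolerance and the price figures are all invented for illustration.

```python
from collections import deque
from statistics import mean


def flag_deviations(stream, historical_mean, window=3, tolerance=0.2):
    """Yield (index, rolling_mean) whenever the rolling mean of the stream
    departs from the historical mean by more than `tolerance` (a fraction)."""
    recent = deque(maxlen=window)  # holds only the most recent `window` values
    for i, price in enumerate(stream):
        recent.append(price)
        if len(recent) < window:
            continue  # not enough observations yet for a rolling mean
        rolling = mean(recent)
        if abs(rolling - historical_mean) / historical_mean > tolerance:
            yield i, rolling


# Invented figures: archived prices (the slow historical repository) and a
# live, high-frequency stream of online prices for the same product.
historical_prices = [100, 102, 98, 101, 99]
live_prices = [100, 101, 103, 120, 135, 140, 138]

alerts = list(flag_deviations(live_prices, mean(historical_prices)))
for index, rolling in alerts:
    print(f"observation {index}: rolling mean {rolling:.1f} departs from the historical baseline")
```

The historical baseline supplies the context that the raw stream lacks: a price of 140 is only informative once it is known that the archived average was close to 100.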
(i) Policy Formulation
The world today has become increasingly volatile in terms of how the decisions of certain countries are beginning to have an impact on vulnerable communities within entirely different nations. Our global economy has become infinitely more susceptible to fluctuating conditions, primarily because of its interconnectivity hinged upon transnational interdependence. The primary instigators of most of these changes, including the nature of harvests, prices of essential commodities, employment structures and capital flows, have been financial and environmental disruptions. [25] According to the OECD, "Disruptive shocks to the global economy are likely to become more frequent and cause greater economic and social hardship. The economic spillover effects of events like the financial crisis or a potential pandemic will grow due to the increasing interconnectivity of the global economy and the speed with which people, goods and data travel."[26]
The local impacts of these fluctuations may not be easily visible or even traceable but could very well be severe and long lasting. A vibrant literature on the vulnerability of communities has highlighted the impacts of these shocks, which often cause children to drop out of school, families to sell their productive assets, and communities to place a greater reliance on state rations.[27] These vulnerabilities cannot be definitively discerned through traditional systems of monitoring and information collection. The evidence of the effects of these shocks often takes too long to reach decision makers, who are unable to formulate effective policies without ascertaining the nature and extent of the hardships suffered by these communities in a given context. The existing early warning systems do help raise flags and draw attention to the problem, but their reach is limited and their veracity compromised by the time it takes to extract and collate this information through traditional means. These traditional systems of information collection are difficult to implement within rural impoverished areas and the data collected is not always reliable due to the significant time gap between its collection and subsequent interpretation. Data collected from surveys does provide an insight into the state of affairs of communities across demographics, but it takes time to be collected, processed, verified and eventually published. Further, the expenses incurred in this process often prove difficult to offset.
The digital revolution therefore provides a significant opportunity to gain a richer and deeper insight into the very nature and evolution of the human experience itself, thus affording a more legitimate platform upon which policy deliberations can be articulated. This data driven decision making, once the monopoly of private institutions such as The World Economic Forum and The McKinsey Institute [28], has now emerged at the forefront of the public policy discourse. Civil society has also expressed an eagerness to be more actively involved in the collection of real-time data after having perceived its benefits. This is evidenced by the emergence of 'crowd sourcing'[29] and other 'participatory sensing' [30] efforts that are founded upon the commonalities shared by like-minded communities of individuals. This is being done on easily accessible platforms such as mobile phone interfaces, hand-held radio devices and geospatial technologies. [31]
The predictive nature of patterns identifiable from big data is extremely relevant for the purpose of developing socio-economic policies that seek to bridge problem-solution gaps and create a conducive environment for growth and development. Mobile phone technology has been able to quantify human behavior on an unprecedented scale.[32] This includes being able to detect changes in standard commuting patterns of individuals based on their employment status[33] and estimating a country's GDP in real-time by measuring the nature and extent of light emissions through remote sensing. [34]
A recent research study has concluded that "due to the relative frequency of certain queries being highly correlated with the percentage of physician visits in which individuals present influenza symptoms, it has been possible to accurately estimate the levels of influenza activity in each region of the United States, with a reporting lag of just a day." Online data has thus been used as a part of syndromic surveillance efforts, also known as infodemiology. [35] The US Centers for Disease Control and Prevention has concluded that mining vast quantities of data through online health related queries can help detect disease outbreaks "before they have been confirmed through a diagnosis or a laboratory confirmation." [36] Google Trends works in a similar way.
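The relationship described in the quoted study, between the frequency of certain queries and the share of physician visits with influenza symptoms, can be illustrated with a plain correlation and a straight-line fit. The sketch below is not the model used in the cited study; the weekly figures are invented, and the helper names `pearson` and `fit_line` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Ordinary least squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


# Invented weekly figures for one region, for illustration only.
query_share = [1.0, 2.0, 3.0, 4.0]   # % of searches matching flu-related terms
visit_share = [1.1, 1.9, 3.2, 3.8]   # % of physician visits with flu symptoms

r = pearson(query_share, visit_share)
slope, intercept = fit_line(query_share, visit_share)

# Estimate this week's visit share from today's query share, without waiting
# for clinical reports to be collated.
estimate = slope * 5.0 + intercept
print(f"correlation {r:.3f}, estimated visit share {estimate:.2f}%")
```

A high correlation on past weeks is what justifies using the query series as a stand-in for the slower clinical series; it does not by itself establish that the relationship will hold in future seasons.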
Another public health monitoring system known as the Healthmap project compiles seemingly fragmented data from news articles, social media, eye-witness reports and expert discussions based on validated studies to "achieve a unified and comprehensive view of the current global state of infectious diseases" that may be visualized on a map. [37]
Big Data used for development purposes can reduce the reliance on human inputs, thus narrowing the room for error and improving the accuracy of the information upon which policy makers base their decisions.
(ii) Advocacy and Social Change
Due to the ability of Big Data to provide an unprecedented depth of detail on particular issues, it has often been used as a vehicle of advocacy to highlight various issues in great detail. This makes it possible to ensure that citizens are provided with a far more participative experience, capturing their attention and hence better communicating these problems. Numerous websites have been able to use this method of crowd sourcing to broadcast socially relevant issues[38]. Moreover, the massive increase in access to the internet has dramatically improved the scope for activism through the use of volunteered data, due to which advocates can now collect data from volunteers more effectively and present these issues in various forums. Websites like Ushahidi[39] and the Black Monday Movement [40] are prime examples of the same. These platforms have championed various causes, consistently exposing significant social crises that would otherwise go unnoticed.
The Ushahidi application used crowd sourcing mechanisms in the aftermath of the Haiti earthquake to set up a centralized messaging system that allowed mobile phone users to provide information on injured and trapped people.[41] An analysis of the data showed that the concentration of text messages was correlated with the areas where there was an increased concentration of damaged buildings. [42] Patrick Meier of Ushahidi noted "These results were evidence of the system's ability to predict, with surprising accuracy and statistical significance, the location and extent of structural damage post the earthquake." [43]
Another problem that data advocacy hopes to tackle, however, is that of too much exposure, with advocates providing information to various parties to help ensure that there exists no unwarranted digital surveillance and that sensitive advocacy tools and information are not used inappropriately. An interesting illustration of the same is The Tactical Technology Collective[44], which hopes to improve the use of technology by activists and various other political actors. The organization, through various mediums such as films and events, hopes to build data protection and privacy awareness and skills among human rights activists. Additionally, Tactical Technology assists human rights activists in presenting information in an appealing and relevant manner, and works on capacity building for the purposes of data advocacy.
Observed data, such as mobile phone records generated through network operators as well as through the use of social media, are beginning to play a central role in academic research. This is due to the ability of this data to provide microcosms of information both at finer granularity and over larger public spaces. In the wake of natural disasters, this can be extremely useful, as reflected by the work of Flowminder after the 2010 Haiti earthquake.[45] A similar string of interpretive analysis can be carried out in instances of conflict and crises over varying spans of time. Flowminder used the geospatial locations of 1.9 million subscriber identity modules in Haiti, from 42 days before the earthquake to 158 days after it. This information allowed researchers to empirically determine the migration patterns of the population after the earthquake and enabled a subsequent UNFPA household survey.[46] In a similar capacity, the UN Global Pulse is seeking to assist in the process of consultation and deliberation on the specific targets of the Millennium Development Goals through a framework of visual analytics that represent the big data procured on each of the topics proposed for the post-2015 agenda online.[47]
A recent announcement of collaboration between RTI International, a non-profit research organization, and IBM Research looks promising in its initiative to utilize big data analytics in schools within Mombasa County, Kenya.[48] The partnership seeks to develop testing systems that would capture data that would assist governments, non-profit organizations and private enterprises in making more informed decisions regarding the development of education and human resources within the region. As observed by Dr. Kamal Bhattacharya, Vice President of IBM Research, "A significant lack of data on Africa in the past has led to misunderstandings regarding the history, economic performance and potential of the government." The project seeks to improve transparency and accountability within the schooling system in more than 100 institutions across the county. The teachers would be equipped with tablet devices to collate the data about students, classrooms and resources. This would allow an analysis of the correlation between the three aspects, thus enabling better policy formulation and a more focused approach to bettering the school system. [49] This is a part of the United States Agency for International Development's Education Data for Decision Making (EdData II) project. According to Dr Kommy Weldemariam, Research Scientist, IBM Research, "… there has been a significant struggle in making informed decisions as to how to invest in and improve the quality and content of education within Sub-Saharan Africa. The Project would create a school census hub which would enable the collection of accurate data regarding performance, attendance and resources at schools. This would provide valuable insight into the building of childhood development programs that would significantly impact the development of an efficient human capital pool in the near future."[50]
A similar initiative has been undertaken by Apple and IBM in the development of the "Student Achievement App", which seeks to use this data for "content analysis of student learning". The application serves as a teaching tool that analyses the data provided to develop actionable intelligence on a per-student basis. [51] This would give educators a deeper understanding of the outcome of teaching methodologies and subsequently enable better learning. The impact of this would be a significant restructuring of how education is delivered. At an IBM sponsored workshop on education held in India last year, Katharine Frase, IBM CTO of Public Sector, predicted that "classrooms will look significantly different within a decade than they have looked over the last 200 years."[52]
(iii) Access and the Exchange of Information
Big data used for development serves as an important information intermediary that allows for the creation of a unified space within which unstructured heterogeneous data can be efficiently organized to create a collaborative system of information. New interactive platforms enable the process of information exchange through internal vetting and curation that ensures accessibility to reliable and accurate information. This encourages active citizen participation in the articulation of demands from the government, thus enabling the actualization of the role of the electorate in determining specific policy decisions.
The Grameen Foundation's AppLab in Kampala aids in the development of tools that can use the information from micro financing transactions of clients to identify financial plans and instruments that would be more suitable to their needs.[53] Thus, through working within a community, this technology connects its clients in a web of information sharing that they both contribute to and access after the source of the information has been made anonymous. This allows the individual members of the community to benefit from this common pool of knowledge. The AppLab was able to identify the emergence of a new crop pest from an increase in online searches for an unusual string of search terms within a particular region. Using this as an early warning signal, the Grameen bank sent extension officers to the location to check the crops and the pest contamination was dealt with effectively before it could spread any further.[54]
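The early warning signal in the crop pest example, an unusual rise in searches for particular terms within one region, can be approximated by comparing each term's current count with its own history. The sketch below is an assumption about how such a signal might be computed, not a description of the AppLab's system; the function name, the threshold and the search counts are invented.

```python
from statistics import mean, stdev


def unusual_terms(baseline_counts, current_counts, z_threshold=3.0):
    """Return terms whose current weekly count lies more than `z_threshold`
    sample standard deviations above their own historical mean."""
    flagged = []
    for term, history in baseline_counts.items():
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            continue  # no variation in the baseline, so a z-score is undefined
        z = (current_counts.get(term, 0) - mu) / sigma
        if z > z_threshold:
            flagged.append(term)
    return flagged


# Invented weekly search counts for one region, for illustration only.
baseline = {
    "maize price": [40, 42, 38, 41, 39],
    "leaf spots yellow": [2, 3, 1, 2, 2],
}
this_week = {"maize price": 43, "leaf spots yellow": 19}

flagged = unusual_terms(baseline, this_week)
print(flagged)
```

Measuring each term against its own baseline matters here: a rarely searched phrase that jumps from two queries a week to nineteen is a stronger signal than a popular phrase that rises by a few queries.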
(iv) Accountability and Transparency
Big data enables participatory contributions from the electorate in existing functions such as budgeting and communication, thus enabling connections between the citizens, the power brokers and elites. The extraction of information and increasing transparency around data networks is also integral to building a self-sustaining system of data collection and analysis. However, it is important to note that the information collected must be analyzed in a responsible manner. Checking the veracity of the information collected and facilitating individual accountability would encourage more enthusiastic responses from the general populace, thus creating a conducive environment to elicit the requisite information. The effectiveness of the policies formulated by relying on this information would rest on the accuracy of such information.
An example of this is Chequeado, a non-profit Argentinean media outlet that specializes in fact-checking. It works on a model of crowd sourcing information, on the basis of which it has fact checked everything from live presidential speeches to congressional debates that have been made open to the public. [55] It established a user friendly public database, DatoCHQ, in 2014, which allowed its followers to participate in live fact-checks by sending in data, which included references, facts, articles and questions, through Twitter. [56] This allowed citizens to corroborate the promises made by their leaders and instilled a sense of trust in the government.
III. Big Data and Smart Cities in the Global South
Smart cities have become a buzzword in South Asia, especially after the Indian government led by Prime Minister Narendra Modi made a commitment to build 100 smart cities in India[57]. A smart city is essentially designed as a hub where information and communication technologies (ICT) are used to create feedback loops with a minimal time gap. In traditional contexts, surveys carried out through a state sponsored census were the only source of systematic data collection. However, these surveys are long drawn out processes that often result in a drain on State resources. Additionally, the information obtained is not always accurate and policy makers are often hesitant to base their decisions on this information. The collection of data can however be extremely useful in improving the functionality of the city in terms of both the 'hard' or physical aspects of the infrastructural environment as well as the 'soft' services it provides to citizens. One model of enabling this data collection is a centrally structured framework of sensors that may be able to determine movements and behaviors in real time, from which the data obtained can be subsequently analyzed. For example, sensors placed under parking spaces at intersections can relay such information in short spans of time. South Korea has managed to implement a similar structure within its smart city, Songdo.[58]
Another approach to this smart city model is using crowd sourced information through apps, either developed by volunteers or private conglomerates. These allow for the resolving of specific problems by organizing raw data into sets of information that are attuned to the needs of the public in a cohesive manner. However, this system would require a highly structured format of data sets, without which significantly transformational results would be difficult to achieve.[59]
There does however exist a middle ground, which allows the beneficiaries of this network, the citizens, to take on the role of primary sensors of information. This method is cost effective and allows for experimentation, so that the success or failure of the model can be measured in a timely manner. It is especially relevant in fast growing cities that suffer congestion and breakdown of infrastructure due to unprecedented population growth. This population is now afforded the opportunity to become a part of the solution.
The principal challenge associated with extracting this Big Data is its restricted access. Most organizations that are able to collect this big data efficiently are private conglomerates and business enterprises, who use this data to give themselves a competitive edge in the market by being able to efficiently identify the needs and wants of their clientele. These organizations are reluctant to release information and statistics because they fear that doing so would cost them their competitive edge and, consequently, the opportunity to benefit monetarily from the data collected. Data leaks could also significantly damage a company's reputation. Even where individual records are anonymized, ensuring that the data of individual customers is protected often carries significant transaction costs. In addition to this, there is a definite human capital gap resulting from the significant lack of scientists and analysts to interpret raw data transmitted across various channels.
(i) Big Data in Urban Planning
Urban planning would require data that is reflective of the land use patterns of communities, combined with their travel descriptions and housing preferences. The mobility of individuals is dependent on their economic conditions and can be determined through an analysis of their purchases, either via online transactions or from the data accumulated by prominent stores. The primary source of this data is however mobile phones, which seem to have transcended economic barriers. Secondary sources include cards used on public transport such as the Oyster card in London and the similar Octopus card used in Hong Kong. However, in most developing countries these cards are not available for public transport systems and therefore mobile network data forms the backbone of data analytics. An excessive reliance on the data collected through smartphones could however be detrimental, especially in developing countries, simply because the usage itself would most likely be concentrated amongst more economically stable demographics and the findings from this data could potentially marginalize the poor.[60]
Mobile network big data (MNBD) is generated by all phones. It includes Call Detail Records (CDRs), which are generated when calls or texts are sent or received, when the internet is used or when a prepaid value is topped up, and Visitor Location Register (VLR) data, which is generated whenever the phone in question has power. VLR data essentially communicates to the Base Transceiver Stations (BTSs) that the phone is in the coverage area. The CDR includes records of calls made, the duration of each call and information about the device, and is therefore stored for a longer period of time. The VLR data is however larger in volume and can be written over. Both VLR and CDR data can provide invaluable information for urban planning strategies. [61] LIRNEasia, a regional policy and regulation think-tank, has carried out an extensive study demonstrating the value of MNBD in Sri Lanka.[62] This data has been used to understand and sometimes even monitor land use patterns, travel patterns during peak and off seasons and the congregation of communities across regions. The study was however only undertaken after the data had been suitably pseudonymised.[63] It revealed that MNBD was incredibly valuable in generating important information for policy formulators and decision makers, because of two primary characteristics. Firstly, its coverage of the population within developing countries is close to comprehensive, which allows mobile phones to serve as sensors that generate useful data. Secondly, people carrying mobile phones across vast geographic areas reveal important information regarding their patterns of travel and movement. [64]
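The pseudonymisation step mentioned above can be illustrated with a minimal sketch. The source does not describe how the Sri Lankan data was actually pseudonymised; the keyed-hash approach, the field names (caller, callee, tower, duration_s) and the SECRET_KEY value below are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data custodian (for example the operator).
# Without this key, a pseudonym cannot be recomputed from a phone number.
SECRET_KEY = b"replace-with-a-randomly-generated-key"


def pseudonymise(subscriber_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for a subscriber identifier (HMAC-SHA256)."""
    digest = hmac.new(key, subscriber_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def pseudonymise_cdr(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Copy a CDR, replacing the caller and callee numbers with pseudonyms."""
    cleaned = dict(record)  # shallow copy so the raw record is left untouched
    for field in ("caller", "callee"):
        cleaned[field] = pseudonymise(record[field], key)
    return cleaned


# Illustrative record with made-up numbers and a made-up tower name.
raw = {"caller": "94770000001", "callee": "94770000002",
       "tower": "BTS-17", "duration_s": 42}
safe = pseudonymise_cdr(raw)
```

Because the same number always maps to the same pseudonym, movement patterns can still be studied per subscriber, while the phone numbers themselves never reach the researchers. Pseudonymised location traces can still be re-identified in some circumstances, so a step of this kind would complement, not replace, access restrictions and legal agreements.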
MNBD allows for the tracking and mapping of changes in population densities on a daily basis, thus identifying 'home' and 'work' locations and informing policy makers of population congestion so that they may be able to formulate policies to ease it. According to Rohan Samarajiva, founding chair of LIRNEasia, "This allows for real-time insights on the geo-spatial distribution of population, which may be used by urban planners to create more efficient traffic management systems."[65] This can also inform economic development policy. For example, the northern region of Colombo, a region inhabited by low income families, shows a lower population density on weekdays. This reflects the large numbers travelling to southern Colombo for employment. [66] Similarly, patterns of land use can be ascertained by analyzing the various loading patterns of base stations. Building on the success of the mobile data analysis project in Sri Lanka, LIRNEasia plans to collaborate with partners in India and Bangladesh to assimilate real time information about the behavioral tendencies of citizens, which policy makers may use to make informed decisions. When this data is combined with user friendly virtual platforms such as smartphone apps or web portals, it can also help citizens make informed choices about their day to day activities and potentially beneficial long term decisions. [67]
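The identification of 'home' and 'work' locations described above can be sketched as a simple counting heuristic. This is not the method used in the LIRNEasia study, which the text does not detail; the time windows (21:00 to 05:00 for home, 10:00 to 16:00 on weekdays for work), the event layout and the sample data are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def infer_home_and_work(events):
    """Guess each user's home and work tower from pseudonymised events.

    events: iterable of (user, tower, hour, is_weekday) tuples, hour in 0..23.
    Returns {user: {"home": tower_or_None, "work": tower_or_None}}.
    """
    night = defaultdict(Counter)  # tower counts during assumed night hours
    day = defaultdict(Counter)    # tower counts during assumed working hours
    users = set()
    for user, tower, hour, is_weekday in events:
        users.add(user)
        if hour >= 21 or hour < 5:
            night[user][tower] += 1
        elif is_weekday and 10 <= hour < 16:
            day[user][tower] += 1

    def top(counter):
        # Most frequently seen tower, or None if there were no observations.
        return counter.most_common(1)[0][0] if counter else None

    return {u: {"home": top(night[u]), "work": top(day[u])} for u in users}


# Made-up events: u1 is seen in the north at night and in the south at midday.
events = [
    ("u1", "north", 22, True), ("u1", "north", 23, True),
    ("u1", "south", 11, True), ("u1", "south", 14, True),
    ("u1", "north", 12, False),  # weekend daytime, ignored for 'work'
    ("u2", "south", 2, True), ("u2", "south", 13, True),
]
locations = infer_home_and_work(events)
```

Aggregating the inferred home and work towers over many users gives the kind of weekday density shift the Colombo example describes, with residents of one region appearing in another during working hours.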
Challenges of using Mobile Network Data
Mobile networks invest significant sums of money in obtaining information regarding usage patterns of their services. Consequently, they may use this data to develop location based advertising. In this context, there is a greater reluctance to share data for public purposes. Allowing one operator to access another's big data could significantly erode the competitive advantage of the operator whose data is shared. A plausible solution to this conundrum is the accumulation of data from multiple sources without separating or organizing it according to the source it originates from. There is thus a lesser chance of sensitive information of one company being used by another. However, operators still have concerns about how the data would be handled before this "mashing up" occurs and whether it might be leaked by the research organization itself. LIRNEasia used comprehensive non-disclosure agreements to ensure that the researchers who worked with the data were aware of the substantial financial penalties that may be imposed on them for data breaches. Access to the data was also restricted. [68]
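The "mashing up" approach described above can be illustrated with a short sketch in which counts from several operators are summed per zone and hour and the operator label is discarded. The operator names, zone names and numbers are invented for illustration, and the source does not specify how pooling was or would be carried out.

```python
from collections import Counter


def pool_counts(per_operator_counts):
    """Sum subscriber counts across operators, dropping the operator label.

    per_operator_counts: {operator_name: {(zone, hour): subscriber_count}}
    Returns {(zone, hour): total_count} with no trace of the source operator.
    """
    pooled = Counter()
    for counts in per_operator_counts.values():
        pooled.update(counts)  # adds the counts key by key
    return dict(pooled)


# Hypothetical operators and figures.
submitted = {
    "operator_a": {("colombo-north", 9): 120, ("colombo-south", 9): 300},
    "operator_b": {("colombo-north", 9): 80, ("colombo-south", 9): 150},
}
pooled = pool_counts(submitted)
```

Only the pooled totals would be passed on for analysis, so one operator's share of subscribers in a given zone cannot be read off the output. The concern raised in the text remains: whoever performs the pooling still sees the per-operator inputs, which is why handling before the merge matters.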
Another line of argumentation advocates for the open sharing of data. A recent article in The Economist has articulated this in the context of the Ebola outbreak in West Africa. "Releasing the data, though, is not just a matter for firms since people's privacy is involved. It requires governmental action as well. Regulators in each affected country would have to order operators to make their records accessible to selected researchers, who through legal agreements would only be allowed to use the data in a specific manner. For example, Orange, a major mobile phone network operator has made millions of CDRs from Senegal and The Ivory Coast available for researchers for their use under its Data Development Initiative. However the Political will amongst regulators and Network operators to do this seems to be lacking."[69]
It would therefore be beneficial for companies to collaborate with the customers who create the data and the researchers who want to use it to extract important insights. This however would require the creation of, and subsequent adherence to, self-regulatory codes of conduct. [70] In addition to this, cooperation between network operators would facilitate the transfer of their customers' data to research organizations. Sri Lanka is an outstanding example of this model of cooperation, which has enabled various operators to participate in the mobile-money enterprise.[71]
(ii) Big Data and Government Delivery of Services and Functions
The analysis of data procured in real time has proven to be integral to the formulation of policies, plans and executive decisions. Especially in an Asian context, Big Data can be instrumental in urban development, planning and the allocation of resources in a manner that allows the government to keep up with the rapidly growing demands of an empowered population whose numbers are on an exponential rise. Researchers have been able to use data from mobile networks to engage in effective planning and management of infrastructure, services and resources. If, for example, a particular road or highway has been blocked for a period of time, an alternative route can be established before traffic builds up into congestion, simply through an analysis of information collected from traffic lights, mobile networks and GPS systems.[72]
There is also an emerging trend of using big data for state controlled services such as the military. The South Korean Defense Minister Han Min Koo, in his recent briefing to President Park Geun-hye reflected on the importance of innovative technologies such as Big Data solutions. [73]
The Chinese government has expressed concerns regarding data breaches and information leakages, which would be extremely dangerous given the heavy reliance of governments on big data. A security report undertaken by Qihoo 360, China's largest software security provider, established that 2,424 of the 17,875 Web security loopholes were on government websites. Considering the blurring line between government websites and external networks, it has become all the more essential for authorities to boost their cyber security protections.[74]
The Japanese government has considered investing resources in training more data scientists who may be able to analyze the raw data obtained from various sources and utilize requisite techniques to develop an accurate analysis. The Internal Affairs and Communication Ministry planned to launch a free online course on big data, the target of which would be corporate workers as well as government officials.[75]
Data analytics is emerging as an efficient technique of monitoring the public transport management systems within Singapore. A recent collaboration between IBM, StarHub, The Land Transport Authority and SMRT initiated a research study to observe the movement of commuters across regions. [76] This has been instrumental in revamping the data collection systems already in place and has allowed for the procurement of additional systems of monitoring.[77] The idea is essentially to institute a "black box" of information for every operational unit that allows for the relaying of real-time information from sources as varied as power switches, tunnel sensors and the wheels, through assessing patterns of noise and vibration. [78]
In addition to this, there are numerous projects in place that seek to utilize Big Data to improve city life. According to Carlo Ratti, Director of the MIT Senseable City Lab, "We are now able to analyze the pulse of a city from moment to moment. Over the past decade, digital technologies have begun to blanket our cities, forming the backbone of a large, intelligent infrastructure." [79] Gerhard Schmitt, Professor of Information Architecture and Founding Director of the Singapore ETH Centre, has observed that "the local weather has a major impact on the behavior of a population." In this respect the centre is engaged in developing a range of visual platforms to inform citizens on factors such as air quality, which would enable individuals to make everyday choices such as what route to take when planning a walk, or to predict a traffic jam. [80] Schmitt's team has also been able to arrive at a pattern that connects the demand for taxis with the city's climate. Combining taxi location data with rainfall data has helped locals hail taxis during a storm. This form of data can be used in multiple ways, including the visualization of temperature hotspots based on a "heat island" effect where buildings, cars and cooling units cause a rise in temperature. [81]
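A pattern of the kind described above, linking taxi demand to rainfall, is typically a statistical association between two data series. A minimal sketch of how such an association could be measured is given below using the Pearson correlation coefficient. The figures are invented purely for illustration and are not drawn from the Singapore study, whose method the source does not describe.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up hourly observations: rainfall in millimetres and taxi requests.
rainfall_mm = [0, 0, 2, 5, 10, 20]
taxi_requests = [100, 110, 130, 160, 210, 300]
r = pearson(rainfall_mm, taxi_requests)
```

A value of r close to 1 would indicate that taxi requests rise with rainfall in the sample. A strong correlation alone does not establish causation, and a real analysis would also need to account for time of day and location.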
Microsoft has recently entered into a partnership with the Federal University of Minas Gerais, one of the largest universities in Brazil, to undertake a research project that could potentially predict traffic jams up to an hour in advance. [82] The project attempts to analyze information from transport departments, road traffic cameras and drivers' social network profiles to identify patterns that could help predict traffic jams approximately 15 to 60 minutes before they actually happen.[83]
In anticipation of the increasing demand for professionals with requisite training in data sciences, the Malaysian Government has planned to increase the number of local data scientists from the present 80 to 1500 by 2020, through the support of the universities within the country.
IV. Big Data and the Private Sector in the Global South
Essential considerations in the operations of Big Data in the Private sector in the Asia Pacific region have been extracted by a comprehensive survey carried out by the Economist Intelligence Unit.[84] Over 500 executives across the Asia Pacific region were surveyed, from across industries representing a diverse range of functions. 69% of these companies had an annual turnover of over US $500m. The respondents were senior managers responsible for taking key decisions with regard to investment strategies and the utilization of big data for the same.
The results of the survey indicated that firms in the Asia Pacific region have had limited success with implementing big data practices. A third of the respondents claimed to have an advanced knowledge of the utilization of big data, while more than half claimed to have made limited progress in this regard. Only 9% of the firms surveyed cited internal barriers to implementing big data practices. These included significant difficulty in enabling the sharing of information across boundaries. Approximately 40% of the respondents claimed they were unaware of big data strategies, even where these had in fact been in place, simply because the strategies had been poorly communicated to them. Almost half of the firms however believed that big data plays an important role in the success of the firm and that it can contribute to increasing revenue by 25% or more.
Numerous obstacles in the adoption of big data were cited by the respondents. These include the lack of suitable software to interpret the data and the lack of in-house skills to analyze the data appropriately. In addition to this, the unwillingness of various departments to share their data, for fear of a breach or leak, was thought to be a major hindrance. This, combined with a lack of communication between departments and exceedingly complicated reports that cannot be analyzed with the limited resources and qualified human capital available, has resulted in an indefinite postponement of any policy propounding the adoption of big data practices.
Over 59% of the firms surveyed agreed that collaboration is integral to innovation and that information silos are a huge hindrance within a knowledge based economy. There is also a direct correlation between the size of the company and its progress in adopting big data, with larger firms adopting comprehensive strategies more frequently than smaller ones. A major reason for this is that large firms with substantially greater resources are able to actualize the benefits of big data analytics more efficiently than firms with smaller revenues. These businesses which have advanced policies in place outlining their strategies with respect to their reliance on big data are also more likely to communicate these strategies to their employees to ensure greater clarity in the process.
The use of big data was recently voted as the "best management practice" of the past year according to a cumulative ranking published by Chief Executive China Magazine, a Trade journal published by Global Sources on 13th January, 2015 in Beijing. The major benefit cited was the real-time information sourced from customers, which allows for direct feedback from clients when making decisions regarding changes in products or services. [85]
A significant contributor to the lack of adequate usage of data analytics is the belief that a PhD is a prerequisite for entering the field of data science. This misconception was pointed out by Richard Jones, vice president of Cloudera for the Australia, New Zealand and ASEAN region. Cloudera provides businesses with the professional services that they may need to effectively utilize Big Data. This includes a combination of the necessary manpower, technology and consultancy services.[86] Deepak Ramanathan, the chief technology officer of SAS Asia Pacific, believes that this skill gap can be addressed by forming data science teams within both governments and private enterprises. These teams could comprise members with statistical, coding and business skills, working in a collaborative manner to address the problem at hand.[87] SAS is an enterprise software giant that creates tools tailored to help business users interpret big data. Eddie Toh, the planning and marketing manager of Intel's data center platform, believes that businesses do not necessarily need data scientists to be able to use big data analytics to their benefit and can in fact outsource the technical aspects of the interpretation of this data as and when required.[88]
The analytical team at Dell has forged a partnership with Brazilian Public Universities to facilitate the development of a local talent pool in the field of data analytics. The Instituto of Data Science (IDS) will provide training methodologies for in person or web based classes. [89] The project is being undertaken by StatSoft, a subsidiary of Dell that was acquired by the technology giant last year. [90]
V. Conclusion
Numerous challenges have emerged in the analysis and interpretation of Big Data. While it presents an extremely engaging opportunity, with the potential to transform the lives of millions of individuals, inform the private sector and influence government, the actualization of this potential requires the creation of a sustainable foundational framework; one that is able to mitigate the various challenges that present themselves in this context.
A colossal increase in the rate of digitization has resulted in an unprecedented increase in the amount of Big Data available, especially through the rapid diffusion of cellular technology. The importance of mobile phones as a significant source of data, especially in low income demographics, cannot be overstated. This data can be used to understand the needs and behaviors of large populations, providing in-depth insight into the context within which the competence, suitability and feasibility of various policy mechanisms and legal instruments can be assessed. However, this explosion of data does have a lasting impact on how individuals and organizations interact with each other, which might not always be reflected in the interpretation of raw data without a contextual understanding of the demographic. It is therefore vital to employ the appropriate expertise in assessing and interpreting this data. The significant lack of human capital to analyze this information accurately poses a definite challenge to its effective utilization in the Global South.
The legal and technological implications of using Big Data are best conceptualized within the deliberations on protecting the privacy of the contributors to this data. The primary producers of this information, from across platforms, are often unaware that they are in fact consenting to the subsequent use of the data for purposes other than what was intended. For example, people routinely accept the terms and conditions of popular applications without understanding where or how the data that they inadvertently provide will be used.[91] This is especially true of media generated on social networks that are increasingly being made available on more accessible platforms such as mobile phones and tablets. Privacy has always been, and will remain, an integral pillar of democracy. It is therefore essential that policy makers and legislators respond effectively to possible compromises of privacy in the collection and interpretation of this data through the institution of adequate safeguards.
Another challenge that has emerged is the access to and sharing of this data. Private corporations have been reluctant to share this data due to concerns about potential competitors being able to access and utilize the same. In addition to this, legal considerations also prevent the sharing of data collected from their customers or users of their services. The various technical challenges in storing and interpreting this data adequately also prove to be significant impediments in the collection of data. It is therefore important that adequate legal agreements be formulated in order to facilitate reliable access to streams of data, as well as access to data storage facilities that allow for retrospective analysis and interpretation.
In order for the use of Big Data to gain traction, it is important that these challenges are addressed in an efficient manner with durable and self-sustaining mechanisms of resolving significant obstructions. The debates and deliberations shaping the articulation of privacy concerns and access to such data must be supported with adequate tools and mechanisms to ensure a system of "privacy-preserving analysis." The UN Global Pulse has put forth the concept of data philanthropy to attempt to resolve these issues, wherein "corporations [would] take the initiative to anonymize (strip out all personal information) their data sets and provide this data to social innovators to mine the data for insights, patterns and trends in realtime or near realtime."[92]
The concept of data philanthropy highlights particular challenges and avenues that may be considered for future deliberations that may result in specific refinements to the process.
One of the primary uses of Big Data, especially in developing countries, is to address important developmental issues such as the availability of clean water, food security, human health and the conservation of natural resources. Effective disaster management has also emerged as one of the key functions of Big Data. It therefore becomes all the more important for organizations to assess the information supply chains pertaining to specific data sources in order to identify and prioritize the issues of data management. [93] Data emerging from different contexts and sources may appear in varied compositions and would differ significantly across economic demographics. The Big Data generated from certain contexts would be of limited use due to the unavailability of data within certain regions, and the resulting studies affecting policy decisions should take this discrepancy into account. This data unavailability has resulted in a digital divide which is especially prevalent in the global south. [94]
Appropriate analysis of the Big Data generated would provide valuable insight into key areas and inform policy makers with respect to important decisions. However, it is necessary to ensure that the quality of this data meets a specific standard and that appropriate methodological processes have been undertaken to interpret and analyze it. The government is a key actor that can shape the ecosystem surrounding the generation, analysis and interpretation of big data. It is therefore essential that governments of countries across the global south recognize the need to collaborate with civic organizations as well as technical experts in order to create appropriate legal frameworks for the effective utilization of this data.
[1] Onnela, Jukka-Pekka. "Social Networks and Collective Human Behavior." UN Global Pulse. 10 Nov. 2011. <http://www.unglobalpulse.org/node/14539>
[2] http://www.business2community.com/big-data/evaluating-big-data-predictive-analytics-01277835
[3] Ibid
[4] http://unglobalpulse.org/sites/default/files/BigDataforDevelopment-UNGlobalPulseJune2012.pdf
[5] Ibid, p.13, pp.5
[6] Kirkpatrick, Robert. "Digital Smoke Signals." UN Global Pulse. 21 Apr. 2011. <http://www.unglobalpulse.org/blog/digital-smoke-signals>
[7] Helbing, Dirk , and Stefano Balietti. "From Social Data Mining to Forecasting Socio-Economic Crises." Arxiv (2011) 1-66. 26 Jul 2011 http://arxiv.org/pdf/1012.0178v5.pdf.
[8] Manyika, James, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh and Angela H. Byers. "Big data: The next frontier for innovation, competition, and productivity." McKinsey Global Institute (2011): 1-137. May 2011.
[9] "World Population Prospects, the 2010 Revision." United Nations Development Programme. <http://esa.un.org/unpd/wpp/unpp/panel_population.htm>
[10] Mobile phone penetration, measured by Google, from the number of mobile phones per 100 habitants, was 96% in Botswana, 63% in Ghana, 66% in Mauritania, 49% in Kenya, 47% in Nigeria, 44% in Angola, 40% in Tanzania (Source: Google Fusion Tables)
[11] http://www.brookings.edu/blogs/africa-in-focus/posts/2015/04/23-big-data-mobile-phone-highway-sy
[12] Ibid
[13] <http://www.google.com/fusiontables/Home/>
[14] "Global Internet Usage by 2015 [Infographic]." Alltop. <http://holykaw.alltop.com/global-internetusage-by-2015-infographic?tu3=1>
[15] Kirkpatrick, Robert. "Digital Smoke Signals." UN Global Pulse. 21 Apr. 2011 <http://www.unglobalpulse.org/blog/digital-smoke-signals>
[16] Ibid
[17] Ibid
[18] Ibid
[19] Goetz, Thomas. "Harnessing the Power of Feedback Loops." Wired.com. Conde Nast Digital, 19 June 2011. <http://www.wired.com/magazine/2011/06/ff_feedbackloop/all/1>.
[20] Kirkpatrick, Robert. "Digital Smoke Signals." UN Global Pulse. 21 Apr. 2011. <http://www.unglobalpulse.org/blog/digital-smoke-signals>
[21] Bollier, David. The Promise and Peril of Big Data. The Aspen Institute, 2010. <http://www.aspeninstitute.org/publications/promise-peril-big-data>
[22] Ibid
[23] Eagle, Nathan and Alex (Sandy) Pentland. "Reality Mining: Sensing Complex Social Systems", Personal and Ubiquitous Computing, 10.4 (2006): 255-268.
[24] Kirkpatrick, Robert. "Digital Smoke Signals." UN Global Pulse. 21 Apr. 2011. <http://www.unglobalpulse.org/blog/digital-smoke-signals>
[25] OECD, Future Global Shocks, Improving Risk Governance, 2011
[26] "Economy: Global Shocks to Become More Frequent, Says OECD." Organisation for Economic Cooperation and Development. 27 June. 2011.
[27] Friedman, Jed, and Norbert Schady. How Many More Infants Are Likely to Die in Africa as a Result of the Global Financial Crisis? Rep. The World Bank <http://siteresources.worldbank.org/INTAFRICA/Resources/AfricaIMR_FriedmanSchady_060209.pdf>
[28] Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute, June 2011 <http://www.mckinsey.com/mgi/publications/big_data/pdfs/MGI_big_data_full_report.pdf>
[29] The word "crowdsourcing" refers to the use of non-official actors ("the crowd") as (free) sources of information, knowledge and services, in reference and opposition to the commercial practice of outsourcing.
[30] Burke, J., D. Estrin, M. Hansen, A. Parker, N. Ramanthan, S. Reddy and M.B. Srivastava. Participatory Sensing. Rep. Escholarship, University of California, 2006. <http://escholarship.org/uc/item/19h777qd>.
[31] "Crisis Mappers Net-The international Network of Crisis Mappers." <http://crisismappers.net>, http://haiti.ushahidi.com and Goldman et al., 2009
[32] Alex Pentland cited in "When There's No Such Thing As Too Much Information". The New York Times. 23 Apr. 2011 <http://www.nytimes.com/2011/04/24/business/24unboxed.html?_r=1&src=tptw>.
[33] Nathan Eagle also cited in "When There's No Such Thing As Too Much Information". The New York Times. 23 Apr. 2011. <http://www.nytimes.com/2011/04/24/business/24unboxed.html?_r=1&src=tptw>.
[34] Helbing and Balietti. "From Social Data Mining to Forecasting Socio-Economic Crisis."
[35] Eysenbach G. Infodemiology: tracking flu-related searches on the Web for syndromic surveillance. AMIA (2006) <http://yi.com/home/EysenbachGunther/publications/2006/eysenbach2006cinfodemiologyamia proc.pdf>
[36] "Syndromic Surveillance (SS)." Centers for Disease Control and Prevention. 06 Mar. 2012. <http://www.cdc.gov/ehrmeaningfuluse/Syndromic.html>.
[37] Health Map <http://healthmap.org/en/>
[38] see www.detective.io
[39] www.ushahidi.com
[41] Ushahidi is a nonprofit tech company that was developed to map reports of violence in Kenya following the 2007 post-election fallout. Ushahidi specializes in developing "free and open source software for information collection, visualization and interactive mapping." <http://ushahidi.com>
[42] Conducted by the European Commission's Joint Research Center against data on damaged buildings collected by the World Bank and the UN from satellite images through spatial statistical techniques.
[43] www.ushahidi.com
[44] See https://tacticaltech.org/
[45] see www.flowminder.org
[46] Ibid
[48] http://allafrica.com/stories/201507151726.html
[49] Ibid
[50] Ibid
[51] http://www.computerworld.com/article/2948226/big-data/opinion-apple-and-ibm-have-big-data-plans-for-education.html
[52] Ibid
[53] http://www.grameenfoundation.org/where-we-work/sub-saharan-africa/uganda
[54] Ibid
[55] http://chequeado.com/
[56] http://datochq.chequeado.com/
[57] Times of India (2015): "Chandigarh May Become India's First Smart City," 12 January, http://timesofindia.indiatimes.com/india/Chandigarh-may-become-Indias-first-smart-city/articleshow/45857738.cms
[58] http://www.cisco.com/web/strategy/docs/scc/ioe_citizen_svcs_white_paper_idc_2013.pdf
[59] Townsend, Anthony M (2013): Smart Cities: Big Data, Civic Hackers and the Quest for a New Utopia, New York: WW Norton.
[60] See "Street Bump: Help Improve Your Streets" on Boston's mobile app to collect data on road conditions, http://www.cityofboston.gov/DoIT/apps/streetbump.asp
[61] Mayer-Schonberger, V and K Cukier (2013): Big Data: A Revolution That Will Transform How We Live, Work, and Think, London: John Murray.
[62] http://www.epw.in/review-urban-affairs/big-data-improve-urban-planning.html
[63] Ibid
[64] Newman, M E J and M Girvan (2004): "Finding and Evaluating Community Structure in Networks," Physical Review E, American Physical Society, Vol 69, No 2.
[65] http://www.sundaytimes.lk/150412/sunday-times-2/big-data-can-make-south-asian-cities-smarter-144237.html
[66] Ibid
[67] Ibid
[68] http://www.epw.in/review-urban-affairs/big-data-improve-urban-planning.html
[69] GSMA (2014): "GSMA Guidelines on Use of Mobile Data for Responding to Ebola," October, http://www.gsma.com/mobilefordevelopment/wpcontent/uploads/2014/11/GSMA-Guidelineson-protecting-privacy-in-the-use-of-mobilephone-data-for-responding-to-the-Ebola-outbreak-_October-2014.pdf
[70] An example of the early-stage development of a self-regulatory code may be found at http://lirneasia.net/2014/08/what-does-big-data-sayabout-sri-lanka/
[71] See "Sri Lanka's Mobile Money Collaboration Recognized at MWC 2015," http://lirneasia.net/2015/03/sri-lankas-mobile-money-colloboration-recognized-at-mwc-2015/
[72] http://www.thedailystar.net/big-data-for-urban-planning-57593
[73] http://koreaherald.com , 19/01/2015
[74] http://www.news.cn/, 25/11/2014
[75] http://the-japan-news.com , 20/01/2015
[76] http://www.todayonline.com/singapore/can-big-data-help-tackle-mrt-woes
[77] Ibid
[78] Ibid
[79] http://edition.cnn.com/2015/06/24/tech/big-data-urban-life-singapore/
[80] Ibid
[81] Ibid
[82] http://venturebeat.com/2015/04/03/how-microsofts-using-big-data-to-predict-traffic-jams-up-to-an-hour-in-advance/
[83] Ibid
[84] https://www.hds.com/assets/pdf/the-hype-and-the-hope-summary.pdf
[85] http://www.news.cn , 14/01/2015
[86] http://www.techgoondu.com/2015/06/29/plugging-the-big-data-skills-gap/
[87] Ibid
[88] Ibid
[89] http://www.zdnet.com/article/dell-to-create-big-data-skills-in-brazil/
[90] Ibid
[91] Efrati, Amir. "'Like' Button Follows Web Users." The Wall Street Journal. 18 May 2011. <http://online.wsj.com/article/SB10001424052748704281504576329441432995616.html>
[92] Kirkpatrick, Robert. "Data Philanthropy: Public and Private Sector Data Sharing for Global Resilience." UN Global Pulse. 16 Sept. 2011. <http://www.unglobalpulse.org/blog/data-philanthropy-public-privatesector-data-sharing-global-resilience>
[93] Laney D (2001) 3D data management: Controlling data volume, velocity and variety. Available at: http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-DataManagement-Controlling-Data-Volume-Velocity-andVariety.pdf
[94] Boyd D and Crawford K (2012) Critical questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication, & Society 15(5): 662-679.