Big Data and Positive Social Change in the Developing World: A White Paper for Practitioners and Researchers
Over the last six months, I was part of a working group writing a white paper on big data and social change. The paper was produced by a group of activists, researchers and data experts who met at the Rockefeller Foundation’s Bellagio Centre to discuss whether, and how, big data is becoming a resource for positive social change in low- and middle-income countries (LMICs).
Bellagio Big Data Workshop Participants. (2014). “Big data and positive social change in the developing world: A white paper for practitioners and researchers.” Oxford: Oxford Internet Institute. Available online: http://ssrn.com/abstract=2491555.
Our working definition of big data includes, but is not limited to, sources such as social media, mobile phone use, digitally mediated transactions, the online news media, and administrative records. It can be categorised as data that is provided explicitly (e.g. social media feedback); data that is observed (e.g. mobile phone call records); and data that is inferred and derived by algorithms (e.g. social network structure or inflation rates). We defined four main areas where big data has potential for those interested in promoting positive social change: advocating and facilitating; describing and predicting; facilitating information exchange; and promoting accountability and transparency.
In terms of advocating and facilitating, we discussed ways in which volunteered data may help organisations open up new public spaces for discussion and awareness-building; how both aggregating data and working across different databases can be tools for building awareness; and how the digital data commons can also configure new communities and actions (sometimes serendipitously) through data science and aggregation. Finally, we looked at the problem of overexposure and how activists and organisations can protect themselves and hide their digital footprints. The challenges we identified in this area were how to interpret data correctly when supplementary information may be lacking; organisational capacity constraints around processing and storing data; and issues around data dissemination, i.e. the possible negative consequences of inadvertently identifying groups or individuals.
Next, we looked at the way big data can help describe and predict. These functions are particularly important in academic, development and humanitarian work, where researchers can combine data into new dynamic, high-resolution datasets to detect new correlations and surface new questions. With data such as mobile phone data and Twitter analytics, the main challenges are understanding the data’s comprehensiveness, meaning and bias, along with developing new and more comprehensive ethical systems to protect data subjects where data is observed rather than volunteered.
The next group of activities we discussed was facilitating information exchange. We looked at mobile-based information services, where a platform created around a particular aim (e.g. agricultural knowledge-building) can incorporate multiple feedback loops that feed into both research and action. The pitfalls include the technical challenge of developing a platform that is lean yet serves many uses, and in particular of making it reliably available to low-income users. When its data is analysed with big data methods, this kind of platform also offers new insights through data discovery and allows the provider to steer service provision according to users’ revealed needs and priorities.
Our last category for big data use was accountability and transparency, where organisations are using crowdsourcing methods to aggregate and analyse information in real time to establish new spaces for critical discussion, awareness and action. Flows of digital information can be managed to prioritise participation and feedback, provide a safe space to engage with policy decisions and expose abuse. The main challenges are how to keep sensitive information (and informants) safe while also exposing data and making authorities accountable; how to make the work sustainable without selling data, and how to establish feedback loops so that users remain involved in the work beyond an initial posting. In the crowdsourcing context, new challenges are also arising in terms of how to verify and moderate real-time flows of information, and how to make this process itself transparent.
Finally, we discussed the relationship between big and open data. Open data can be seen as a system of governance and a knowledge commons, whereas big data does not by its nature involve the idea of the commons. We therefore leaned toward the term ‘opening data’, i.e. processes that could apply to commercially generated datasets as much as to public-sector ones. It is also important to understand where to prioritise opening, and where this may exclude people who are not using the ‘right’ technologies: for example, analogue methods (e.g. nailing a local authority budget to a town hall door every month) may be more open than ‘open’ digital data that is available online.
Our discussion surfaced many questions to do with representation and meaning: must datasets be interpreted by people with local knowledge? For researchers to get access to data that is fully representative, do we need a data commons? How are data proprietors engaging with the power dynamics and inequalities in the research field, and how can civil society engage with the private sector on its own terms if data access is skewed towards elites? We also looked at issues of privacy and risk: do we need a contextual risk perspective rather than a single set of standards? What is the role of local knowledge in protecting data subjects, and what kinds of institutions and practices are necessary? We concluded that there is a case to be made for building a data commons for private/public data, and for setting up new and more appropriate ethical guidelines to deal with big data, since aggregating, linking and merging data present new kinds of privacy risk. In particular, organisations advocating for opening datasets must acknowledge the limitations of anonymisation, which is currently credited with more power to protect data subjects than it merits in the era of big data.
Our analysis makes a strong case that it is time for civil society groups in particular to become part of the conversation about the power of data. These groups are the connectors between individuals and governments, corporations and governance institutions, and have the potential to promote big data analysis that is locally driven and rooted. Civil society groups are also crucially important but currently underrepresented in debates about privacy and the rights of technology users, and civil society as a whole has a responsibility for building critical awareness of the ways big data is being used to sort, categorise and intervene in LMICs by corporations, governments and other actors. Big data is shaping up to be one of the key battlefields of our era, incorporating many of the issues civil society activists worldwide have been working on for decades. We hope that this paper can inform organisations and individuals as to where their particular interests may gain traction in the debate, and what their contribution may look like.
The full white paper is available to download as a PDF (1.95 MB).