Centre for Internet & Society
Privacy in the Age of Big Data

The sum of these interconnected parts will lead to a complete loss of anonymity, greater surveillance and impact free speech and individual choice.

Personal data is freely accessible, shared and even sold, and those to whom this information belongs have little control over its flow.

The article was published in the Asian Age on April 10, 2017.


In 2011 it was estimated that the quantity of data produced globally surpassed 1.8 zettabyte. By 2013, it had increased to 4 zettabytes. This is a result of digital services which involve constant data trails left behind by human activity. This expansion in the volume, velocity, and variety of data available, together with the development of innovative forms of statistical analytics on the data collected, is generally referred to as “Big Data”. Despite significant (though largely unrealised) promises about Big Data, which range from improved decision-making, increased efficiency and productivity to greater personalisation of services, concerns remain about the impact of such datafication of all human activity on an individual’s privacy. Privacy has evolved into a sweeping concept, including within its scope matters pertaining to control over one’s body, physical space in one’s home, protection from surveillance, and from search and seizure, protection of one’s reputation as well as one’s thoughts. This generalised and vague conception of privacy not only comes with great judicial discretion, it also thwarts a fair understanding of the subject. Robert Post called privacy a concept so complex and “entangled in competing and contradictory dimensions, so engorged with various and distinct meanings”, that he sometimes “despairs whether it can be usefully addressed at all”.

This also leaves the idea of privacy vulnerable to considerable suspicion and ridicule. However, while there is a lack of clarity over the exact contours of what constitutes privacy, there is general agreement over its fundamental importance to our ability to lead whole lives. In order to understand the impact of datafied societies on privacy, it is important to first delve into the manner in which we exercise our privacy. The ideas of privacy and data management that are prevalent can be traced to the Fair Information Practice Principles (FIPP). These principles are the forerunners of most privacy regimes internationally, such as the OECD Privacy Guidelines, APEC Framework, or the nine National Privacy Principles articulated by the Justice A.P. Shah Committee Report. All of these frameworks have rights to notice, consent and correction, and how the data may be used, as their fundamental principles. It makes the data subject to the decision-making agent about where and when her/his personal data may be used, by whom, and in what way. The individual needs to be notified and his consent obtained before his personal data is used. If the scope of usage extends beyond what he has agreed to, his consent will be required for the increased scope.

In theory, this system sounds fair. Privacy is a value tied to the personal liberty and dignity of an individual. It is only appropriate that the individual should be the one holding the reins and taking the large decisions about the use of his personal data. This makes the individual empowered and allows him to weigh his own interests in exercising his consent. The allure of this paradigm is that in one elegant stroke, it seeks to ensure that consent is informed and free and also to implement an acceptable trade-off between privacy and competing concerns. This approach worked well when the number of data collectors were less and the uses of data was narrower and more defined. Today’s infinitely complex and labyrinthine data ecosystem is beyond the comprehension of most ordinary users. Despite a growing willingness to share information online, most people have no understanding of what happens to their data.

The quantity of data being generated is expanding at an exponential rate. From smartphones and televisions, trains and airplanes, sensor-equipped buildings and even the infrastructures of our cities, data now streams constantly from almost every sector and function of daily life, “creating countless new digital puddles, lakes, tributaries and oceans of information”. The inadequacy of the regulatory approaches and the absence of a comprehensive data protection regulation is exacerbated by the emergence of data-driven business models in the private sector and the adoption of data-driven governance approach by the government. The Aadhaar project, with over a billion registrants, is intended to act as a platform for a number of digital services, all of which produce enormous troves of data. The original press release by the Central Government reporting the approval by the Cabinet of Ministers of the Digital India programme, speaks of “cradle to grave” digital identity as one of its vision areas.

While the very idea of the government wanting to track its citizens’ lives from cradle to grave is creepy enough in itself, let us examine for a minute what this form of datafied surveillance will entail. A host of schemes under Digital India shall collect and store information through the life cycle of an individual. The result, as we can see, is building databases on individuals, which when combined, will provide a 360 degree view into the lives of individuals. Alongside the emergence of India Stack, a set of APIs built on top of the Aadhaar, conceptualised by iSPIRT, a consortium of select IT companies from India, to be deployed and managed by several agencies, including the National Payments Corporation of India, promises to provide a platform over which different private players can build their applications.

The sum of these interconnected parts will lead to a complete loss of anonymity, greater surveillance and impact free speech and individual choice. The move towards a cashless economy — with sharp nudges from the government — could lead to lack of financial agencies in case of technological failures as has been the case in experiments with digital payments in Africa. Lack of regulation in emerging data driven sectors such as Fintech can enable predatory practices where right to remotely deny financial services can be granted to private sector companies. An architecture such as IndiaStack enables datafication of financial transactions in a way that enables linked and structured data that allows continued use of the transaction data collected. It is important to recognise that at the stage of giving consent, there are too many unknowns for us to make informed decisions about the future uses of our personal data. Despite blanket approvals allowing any kind of use granted contractually through terms of use and privacy policies, there should be legal obligations overriding this consent for certain kinds of uses that may require renewed consent.

Biometrics-based identification in UK: In 2005, researchers from London School of Economics and Political Science came out with a detailed report on the UK Identity Cards Bill (‘UK Bill’) — the proposed legislation for a national identification system based on biometrics. The project also envisaged a centralised database (like India) that would store personal information along with the entire transaction history of every individual. The report pointed strongly against the centralising storage of information and suggested other alternatives such as a system based on smartcards (where biometrics are stored on the card itself) or offline biometric-reader terminals.

As per the report, the alternatives would also have been cheaper as neither required real-time online connectivity. In India, online authentication is a far greater challenge. According to Network Readiness Index, 2016, India ranks 91, whereas UK is placed eight. Poor Internet connectivity can raise a lot of problems in the future including paralysis of transactions. The UK identification project was subsequently discarded as a result of the privacy and cost considerations raised in this report.

Aadhaar: Privacy concerns

  1. Once the data is collected through National Information Utilities, it will be privatised and controlled by private utilities.
  2. Once an individual’s data is entered in the system, it cannot be deleted. That individual will have no control over it.
  3. Aadhaar Data (Demographic details along with photographs) are shared/transferred with the private entities including telecom companies as per the Aadhaar (Targeted delivery of Financial and other subsidies, benefits and services) Act, 2016 with the consent of Aadhaar number holder to fulfil their e-KYC requirements. The data is shared in encrypted form through secured channel.
  4. Aadhaar Enabled Payment System (AEPS) on which 119 banks are live.
  5. More than 33.87 crore transactions have taken place through AEPS, which was only 46 lakhs in May 2014.
  6. As on 30-9-2016, 78 government schemes were linked to Aadhaar.
  7. The Aadhaar (Targeted Delivery of Financial and Other Subsidies, Benefits and Services) Act, 2016, provides that no core-biometric information (fingerprints, iris scan) shall be shared with anyone for any reason whatsoever (Sec 29) and that the biometric information shall not be used for any purpose other than generation of Aadhaar and authentication.
  8. Access to the data repository of UIDAI, called the Central Identities Data Repository(CIDR), is provided to third parties or private companies.

Central Monitoring System (CMS) is already live in Delhi, New Delhi and Mumbai. Union minister Ravi Shankar Prasad revealed this in one of his replies in the Lok Sabha last year. CMS has been set up to automate the process of Lawful Interception & Monitoring of telecommunications.

Central Monitoring System (CMS) is already live in Delhi, New Delhi and Mumbai. Union minister Ravi Shankar Prasad revealed this in one of his replies in the Lok Sabha last year. CMS has been set up to automate the process of Lawful Interception & Monitoring of telecommunications.

Lawful Intercept and Monitoring (LIM) systems are used by the Indian Government to intercept records of voice, SMSes, GPRS data, details of a subscriber’s application and recharge history and call detail record (CDR) and monitor Internet traffic, emails, web-browsing, Skype and any other Internet activity of Indian users.

The views and opinions expressed on this page are those of their individual authors. Unless the opposite is explicitly stated, or unless the opposite may be reasonably inferred, CIS does not subscribe to these views and opinions which belong to their individual authors. CIS does not accept any responsibility, legal or otherwise, for the views and opinions of these individual authors. For an official statement from CIS on a particular issue, please contact us directly.