Ethical Issues in Open Data
On August 1, 2013, I took part in a web meeting organized and hosted by Tim Davies of the World Wide Web Foundation. The meeting, titled “Ethical issues in Open Data,” had an agenda focused on privacy considerations in the context of the open data movement.
The main panelists, Carly Nyst and Sam Smith from Privacy International and Steve Song from the International Development Research Centre, were joined by roughly a dozen other privacy and development researchers from around the globe for the hour-long session.
The central concern of the meeting was that modern cross-analysis techniques can de-anonymize data sets and reveal personally identifiable information (PII) in open data. Open data can include publicly available information such as budgets, infrastructure, and population statistics, as long as it meets the three open data characteristics: accessibility, machine readability, and availability for re-use. “Historically,” said Tim Davies, “public registers have been protected through obscurity.” However, both the capabilities of data analysts and the definition of personal data have continued to expand in recent years. The result is a conflict between researchers who advocate that governments release open data and researchers who emphasize privacy in the developing world.
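To make the de-anonymization concern concrete, here is a minimal sketch of how two individually harmless data sets can be cross-referenced. It was not presented at the meeting; every record, field name, and the `link` helper are invented for illustration, and real linkage attacks work on far larger and messier data.

```python
# Toy illustration of a linkage ("cross-analysis") attack.
# All records and field names are hypothetical.

# An "anonymized" open data release: names removed, quasi-identifiers kept.
open_release = [
    {"postcode": "10115", "birth_year": 1980, "sex": "F", "receives_aid": "yes"},
    {"postcode": "10115", "birth_year": 1975, "sex": "M", "receives_aid": "no"},
    {"postcode": "20095", "birth_year": 1980, "sex": "F", "receives_aid": "no"},
]

# A second public source that lists names alongside the same fields.
public_register = [
    {"name": "Resident A", "postcode": "10115", "birth_year": 1980, "sex": "F"},
    {"name": "Resident B", "postcode": "20095", "birth_year": 1980, "sex": "F"},
    {"name": "Resident C", "postcode": "20095", "birth_year": 1980, "sex": "F"},
]

QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def key(record):
    """Combine the quasi-identifier fields into one comparable tuple."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(release, register):
    """Return (name, release_row) pairs where exactly one person matches."""
    names_by_key = {}
    for person in register:
        names_by_key.setdefault(key(person), []).append(person["name"])

    matches = []
    for row in release:
        names = names_by_key.get(key(row), [])
        if len(names) == 1:  # a unique match re-identifies the row
            matches.append((names[0], row))
    return matches


reidentified = link(open_release, public_register)
for name, row in reidentified:
    print(name, "-> receives_aid:", row["receives_aid"])
```

In this toy data only the first row is re-identified. The third row matches two people in the register, so it stays ambiguous, which is the intuition behind publishing only data where every combination of quasi-identifiers is shared by several individuals.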
Steve Song, advisor to IDRC’s Information & Networks program, spoke of the potential collateral damage that comes with publishing more and more types of information. Song summed up the imperative of the meeting in saying, “privacy needs to be a core part of open data conversation.” In his presentation, he gave a particularly telling example of the tension between public and private information. Following the 2012 school shooting in Newtown, Connecticut, information on Newtown’s gun permit holders (made publicly available through America’s Freedom of Information Act) was aggregated into an interactive map that revealed the citizens’ addresses. This became a problem for the Newtown community: the map not only singled out homes whose residents exercised their right to bear arms but also indirectly revealed which homes had no firearms and were therefore potentially more vulnerable to theft and crime. The Newtown example demonstrates the relationship (and conflict) between open data and privacy; it comes down to the conflict between the right to information and the right to privacy.
One apparent issue surrounding open data is its perceived binary nature. Many advocates view data as either open or not; any intermediate restrictions are seen only as governments limiting access to data. Against this backdrop, meeting attendee Raed Sharif raised a counterpoint from the open data side. Sharif noted that, conversely, conceptions of privacy may pose a threat to open data: governments could take advantage of privacy arguments to justify refusing to publish open reports.
Carly Nyst summarized the privacy argument in her remarks near the end of the meeting. She reasoned that the open data mission is viable if it is limited to generic data, i.e., data about infrastructure or other information that is in no way personal. Doing so would avoid infringing on individual privacy. Until anonymization techniques advance enough to withstand modern re-identification methods, publishing PII may prove too risky. It was generally agreed during the meeting that open data is not inherently bad, and that its availability and analysis can in fact be beneficial, but that the threat of misuse makes it dangerous. For the future of open data, researchers and advocates should perhaps consider more nuanced approaches to the concept so that it respects other ethical considerations, such as privacy.