Centre for Internet and Society

IRC22 - Proposed Session - #LockdownsAndShutdowns

Admin — 2022-05-19T15:05:42Z

Details of a session proposed for the Internet Researchers' Conference 2022 - #Home.

Internet Researchers' Conference 2022 - #Home - Call for Sessions

Session Type: Workshop or Collaborative Working Session

Session Plan

Internet shutdowns are a form of censorship that can have substantial economic and human rights implications. Despite these potential harms, shutdowns are still used across the globe, and many social perspectives on them remain under-researched and poorly understood. One example is the relationship between internet shutdowns and one’s sense of safety and freedom at home. This connection is pertinent given the COVID-19 pandemic and government recommendations to work from home, which emphasised the importance of the internet and of the ability to connect with others freely. By connecting with others online, we create a sense of digital community. Yet while many people were spending more time at home, shutdowns continued despite the increasing need for online communication. This session aims to understand community perspectives on shutdowns and other forms of censorship, focusing specifically on one’s “home”. Shutdowns are a common tool for curbing collective action such as protests, and some public spaces have become less available because of COVID-19. The importance of the internet in enabling social movements, such as protests, therefore cannot be overstated. This session will touch upon these topics and encourage participants to think about shutdowns and the increased importance of the internet in enabling social movements from within one’s home.

The session will last a total of 60 minutes. The first 5 minutes will provide an overview of the session’s structure and explain why this topic is important. We will then move into a semi-structured format of 3 x 15-minute mini-sessions, each addressing a different question. Example questions may cover topics such as the unique role of the internet in enabling online social movements during a lockdown, or whether shutdowns during lockdowns merit a different moral threshold. The prompt questions will encourage interdisciplinary discussion so that participants from diverse backgrounds can make meaningful contributions. We envisage that this session will be organic and open, in a large roundtable format. The last 10 minutes of the session will be an open discussion in which participants can share any remaining thoughts, opinions, and reflections.

Session Team

Michael Collyer is an OTF Senior Fellow in Information Controls and a Doctoral Candidate at the University of Oxford. His research interests are information controls, Bayesian statistics, machine learning, and natural language processing.

Joss Wright is the Co-Director of the Oxford EPSRC Cybersecurity Doctoral Training Centre; Co-Director of the Oxford Martin Programme on the Illegal Wildlife Trade; and Senior Research Fellow at the Oxford Internet Institute. His work focuses on computational approaches to social science questions, with a particular focus on technologies that exert, resist, or subvert control over information.

Andreas Tsamados is a doctoral researcher at the Oxford Internet Institute focusing on human control over AI/ML applications within national security and defence. He is also developing the Algorithmic Resistance Cookbook, a guide to using data-driven tools and techniques to practice resistance against intrusive and repressive aspects of present-day algorithmic culture.

Marianne Díaz Hernández is a #KeepItOn Fellow at Access Now. Marianne is a Venezuelan lawyer, digital rights activist, and fiction writer, currently based in Santiago, Chile. Her work focuses mainly on online freedom of speech, privacy, web filtering, internet infrastructure and digital security. She founded the digital rights NGO Acceso Libre, a volunteer-based organization that documents threats to human rights in the online environment in Venezuela. Before joining Access Now, Marianne worked as a public policy analyst for the Latin American NGO Derechos Digitales. She has volunteered for Global Voices, particularly for the Advox project, since 2010. She has also published several fiction books, and co-founded the small press Casajena Editoras. In 2019, she received the “Human Rights Hero” award from Access Now for her “research and leading advocacy efforts against invasive measures taken by the Maduro government in Venezuela”. She is currently working towards a Master’s Degree in Narrative Writing at Alberto Hurtado University.

Nathan Dobson is a Postdoc at the Centre for Socio-Legal Studies, Oxford. He has a PhD in Anthropology from the University of California, Irvine. His current research is on internet shutdowns in relation to elections and violence in Africa. He has a background in African Studies and has worked at the University of Florida, USA, and the University of Birmingham, UK.

For more details visit https://cis-india.org/raw/irc22-proposed-session-lockdownsandshutdowns

IRC22 - Proposed Session - #DigitisingCrisesRemakingHome

Admin — 2022-04-25T12:23:42Z

Details of a session proposed for the Internet Researchers' Conference 2022 - #Home.

Internet Researchers' Conference 2022 - #Home - Call for Sessions

Session Type: Panel Discussion

Session Plan

The session is planned as a panel discussion between three scholars on three distinct, interconnected notions of home in the context of India: the home as a dwelling unit, as an administrative unit (such as a municipality, a city, or a state), and as a country (or nation state). We intend to parse these ideas within the context of the ongoing Covid-19 pandemic to discuss notions of ‘safety’, ‘trust’, ‘support’, and ‘access’ by examining the digital turn in all three kinds of ‘home’. The session will open with the scholars speaking to each other and laying out the central ideas. Their conversation will serve as a provocation for a larger discussion with other attendees.

In 2020, when the first Covid-19 lockdowns began, the internet was discussed as a space of solidarity, of meeting, entertainment, work, and support. But it soon became evident that access to such spaces of solidarity or support was not equal. For some it was almost non-existent; for many others it was limited or regulated. In the Indian context, these differences stood out even further because of unequal access to infrastructure, healthcare, and even basic necessities such as food. This inequality was starkly apparent in the long march of several thousand migrant workers from cities back to their ‘homes’ in rural areas at the height of the Indian summer.

At the national level, the digital response to the pandemic was most palpable. Contact tracing through apps such as Aarogya Setu, the CoWin portal for vaccinations, and the often arbitrary use of drones, facial recognition, and artificial intelligence have raised questions about surveillance, inclusion, and how useful technology can be in addressing a public health crisis. Such responses often reflected a law-and-order approach to what was a public health crisis. On the other hand, the Vande Bharat Mission to bring stranded Indians from around the world ‘back home to India’ presented a very different idea of home.

Administrative units at the state and local levels had differing procedures and interventions. Many attempted to follow the guidelines and interventions laid out by the central government; others introduced their own digital solutions but soon found that these were not enough to actually deliver governance during the pandemic.

This session will explore the ‘how’ and ‘why’ of the digital becoming the default mode of managing the pandemic–or any sort of threat. We ask if the idea of ‘home’ as a ‘safe space’ had ever really been so and whether the pandemic exacerbated existing exploitative mechanisms within a ‘home’ – be it the dwelling, the city, or even one’s country. We also intend to discuss issues of access, surveillance, privacy, vulnerability, the burdens of care-work, the exploitative extraction of data, and divergent understandings of consent frameworks within these three axes of the idea of the ‘home’.

Session Team

Vidya Subramanian is Raghunathan Family Fellow, South Asia Institute, Harvard University. She is an interdisciplinary scholar whose research interests lie at the intersection of technologies and societies. Her current research investigates the changing nature of citizenship in the technological society we now inhabit. Focusing on India, her research is loosely framed by two large issues: the first is the colonisation of the everyday so-called real world by the digital; and the second is how power permeates and is implicated in such technologies.

Kalindi Kokal is Post Doctoral Fellow, Centre for Policy Studies, IIT Bombay. She has a doctorate in law from the Martin Luther University, Halle-Wittenberg, Germany. Her doctoral work centred on understanding how non-state actors in dispute processing engage with state law. Her dissertation is an ethnographic study of dispute-processing mechanisms in two rural communities in the states of Maharashtra and Uttarakhand in India. She studies how people's actual experience of state law, together with their perceptions of dispute resolution and state courts, underscores the need to explore broader understandings of law and dispute resolution.

Uttara Purandare is PhD Researcher, IITB-Monash Research Academy. She is pursuing her PhD in Public Policy under a joint programme offered by IIT Bombay and Monash University. Her area of research is smart cities. Looking specifically at the intersection of technology, gender, and governance, Uttara’s research focuses on how safety and surveillance are constructed by the smart city rhetoric and the role of private sector firms in governing the smart city. The COVID-19 pandemic and the technologies that have been introduced by national governments and smart cities purportedly to curb the spread of the virus have raised interesting questions about privacy and citizens’ rights during a crisis. Uttara is presently exploring some of these questions within the Indian context.

For more details visit https://cis-india.org/raw/irc22-proposed-session-digitisingcrisesremakinghome

Global Civil Society Coalition launches website to promote Access to Knowledge

sinha — 2022-10-12T12:05:03Z

CIS is a part of a global civil society coalition that is working to promote access to, and use of, knowledge - the Access to Knowledge or A2K coalition.

Earlier this week, the coalition launched a website articulating its mission and recommendations to reform copyright systems for the benefit of education, research, and cultural heritage.

Copyright systems pose serious obstacles to quality teaching and learning, to researchers’ ability to receive and impart information and to share in scientific advancement and its benefits, and to the preservation of and access to cultural and scientific heritage. The website presents evidence and legal solutions, with a focus on the digital and online dimensions of these issues. Three global maps also show the (limited) extent to which copyright limitations and exceptions across the world support online education, text and data mining, and preservation, highlighting the need for global legal reform.

The members of the A2K coalition represent a diverse set of voices such as educators, researchers, students, libraries, archives, museums, other knowledge users and creative communities around the globe. In the Asia-Pacific region, we and Open Access India are currently the members. We invite organizations that share a similar vision of a fair and balanced copyright system to join the coalition.

For more details visit https://cis-india.org/a2k/blogs/global-civil-society-coalition-launches-website-to-promote-access-to-knowledge

Data Lives of Humanities Text

sneha-pp — 2020-12-23T13:07:43Z

The ‘computational turn’ in the humanities has brought with it several questions and challenges for traditional ways of engaging with the ‘text’ as an object of enquiry. The prevalence of data-driven scholarship in the humanities poses several challenges to traditional forms of work and practice, with regard to theory, tools, and methods. In the context of the digital, ‘text’ acquires new forms and meanings, especially with practices such as distant reading. Drawing upon excerpts from an earlier study on digital humanities in India, this essay argues that data in the humanities is not a new phenomenon: concerns about the ‘datafication’ of the humanities, now seen prominently in digital humanities and related fields, actually reflect a longer conflict over the inherited separation between the humanities and technology. It looks at how ‘data’ in the humanities has become a new object of enquiry as a result of several changes in the media landscape over the past few decades. These include the large-scale digitalization and availability of corpora of materials (digitized and born-digital) in an array of formats and across varied platforms, which has in turn led to the growing use of computational methods in working with and studying cultural artifacts today. This essay also explores how reading ‘text as data’ helps us understand the role of data in the making of humanities texts and redefines traditional ideas of textuality, reading, and the reader.

This essay by Puthiya Purayil Sneha was published in Lives of Data: Essays on Computational Cultures from India (2020), edited by Sandeep Mertia, with a Foreword by Ravi Sundaram, as part of the Theory on Demand series by the Institute of Network Cultures, Amsterdam.

Read the open access book here.

For more details visit https://cis-india.org/raw/data-lives-of-humanities-text

Understanding the Data Gaps on Wikidata Concerning Heritage Structures of West Bengal

Bodhisattwa Mandal — 2021-05-15T12:31:40Z

This is a short study on identifying the data gaps related to heritage structures in West Bengal on Wikidata, and potential strategies to address the same. The report is authored by Bodhisattwa Mandal, with editorial oversight and support by Puthiya Purayil Sneha and external review by Sumandro Chattapadhyay. This is part of a series of short-term studies undertaken by the CIS-A2K team in 2019-2020.

Wikidata is a free and open repository of structured and linked data, hosted by the Wikimedia Foundation and built collaboratively[1] by human volunteers and bots from all over the world[2]. Initially intended for use within Wikimedia projects as a high-quality secondary database[3], the platform began by centrally linking Wikipedia articles on the same topic in different languages[4][5][6][7][8], and soon also started linking to external databases.

Introduction to Wikidata

Wikidata is structured according to the Resource Description Framework (RDF) model, which describes statements as subject–predicate–object triples. In Wikidata, subject–predicate–object is termed item–property–value. Items on Wikidata can represent any object, concept or topic in human knowledge that passes a defined threshold of notability, and each is identified by a unique Q number. The actual data attached to an item is called a value, whose form is fixed by the property's data type: strings, numbers, dates, URLs, coordinates, musical notation and so on, or even other items. Properties, identified by unique P numbers, describe what each value says about the item. Items, properties and values are language-independent and therefore machine-readable, although, for human convenience, one can describe items in one's own language by adding or translating labels, descriptions and aliases.
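To make the item–property–value model concrete, here is a minimal Python sketch of statements as triples. The property IDs P31 (instance of), P18 (image) and P625 (coordinate location) are real Wikidata properties; the item IDs, the coordinate pair and the helper functions are hypothetical placeholders for illustration, not Wikidata's own data model code.

```python
from dataclasses import dataclass

# Real Wikidata property IDs; every item ID below is a hypothetical placeholder.
P_INSTANCE_OF = "P31"
P_IMAGE = "P18"
P_COORDINATE = "P625"


@dataclass(frozen=True)
class Statement:
    """One item–property–value triple (RDF subject–predicate–object)."""
    item: str      # subject: an item ID (a Q number)
    prop: str      # predicate: a property ID (a P number)
    value: object  # object: a string, number, coordinate pair, or another item ID


# Language-independent statements about one hypothetical heritage building.
statements = [
    Statement("Q_EXAMPLE_BUILDING", P_INSTANCE_OF, "Q_EXAMPLE_CLASS"),
    Statement("Q_EXAMPLE_BUILDING", P_COORDINATE, (22.5726, 88.3639)),
]

# Labels are stored separately, per language, purely for human readers.
labels = {"Q_EXAMPLE_BUILDING": {"en": "Example heritage building"}}


def values_of(triples, item, prop):
    """Return every value recorded for the (item, prop) pair."""
    return [s.value for s in triples if s.item == item and s.prop == prop]


def has_property(triples, item, prop):
    """True if the item has at least one statement using prop."""
    return bool(values_of(triples, item, prop))


# A missing image is simply an absent triple, which is what a gap query looks for.
print(has_property(statements, "Q_EXAMPLE_BUILDING", P_IMAGE))  # prints False
```

Because a gap is the absence of a triple rather than an empty field, finding gaps means asking which items lack a statement for a given property.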

Because of Wikidata's machine-readable triple structure, the database can be queried for answers that would be hard to obtain from unstructured content such as Wikipedia articles. Retrieving and manipulating data stored as RDF triples requires a semantic query language for RDF databases, SPARQL. Through the Wikidata Query Service, one can use SPARQL to retrieve data, identify the prevailing gaps on Wikidata, and visualize the results in different ways.
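A gap query of the kind described above can be sketched as follows. This is an illustration under stated assumptions, not the study's own queries (those are linked from the map captions). The class and region Q IDs are caller-supplied placeholders. The `wdt:P31/wdt:P279*` path (instance of a class or any subclass) and `wdt:P131*` (located anywhere inside the region) are design choices made for this sketch. The parsing function reads the standard SPARQL 1.1 JSON results shape and is run here on a hand-made sample rather than a live response.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_coverage_query(class_qid: str, region_qid: str) -> str:
    """SPARQL for items of a class (or its subclasses) located anywhere in a
    region, with optional image (P18) and coordinate (P625) values.
    The wd:/wdt: prefixes are predefined on the Wikidata Query Service."""
    return f"""SELECT ?item ?image ?coord WHERE {{
  ?item wdt:P31/wdt:P279* wd:{class_qid} ;
        wdt:P131* wd:{region_qid} .
  OPTIONAL {{ ?item wdt:P18 ?image . }}
  OPTIONAL {{ ?item wdt:P625 ?coord . }}
}}"""


def query_url(query: str) -> str:
    """GET URL asking the endpoint for SPARQL JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def coverage(results: dict) -> dict:
    """Count distinct items, and those with an image or coordinates, in a
    SPARQL 1.1 JSON results document. An item with several images appears
    in several rows, so the counts use sets of item URIs."""
    items, with_image, with_coord = set(), set(), set()
    for row in results["results"]["bindings"]:
        uri = row["item"]["value"]
        items.add(uri)
        if "image" in row:
            with_image.add(uri)
        if "coord" in row:
            with_coord.add(uri)
    return {"items": len(items), "with_image": len(with_image),
            "with_coord": len(with_coord)}


# Hand-made sample in the SPARQL JSON results shape (hypothetical items).
sample = {
    "head": {"vars": ["item", "image", "coord"]},
    "results": {"bindings": [
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q_A"},
         "image": {"type": "uri", "value": "http://example.org/a1.jpg"}},
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q_A"},
         "image": {"type": "uri", "value": "http://example.org/a2.jpg"}},
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q_B"},
         "coord": {"type": "literal", "value": "Point(88.36 22.57)"}},
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q_C"}},
    ]},
}
print(coverage(sample))  # {'items': 3, 'with_image': 1, 'with_coord': 1}
```

Keeping the image and coordinate patterns `OPTIONAL` means items that lack them still appear in the results, so the gaps show up as missing bindings instead of silently dropping out.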

Wikidata in West Bengal, India

Large-scale imports of coordinates for places in West Bengal took place on Wikidata between October 2018 and May 2019, as reflected in the map generated using Resemble.js.

Wikidata activities in India have been organized for almost four years under the WikiProject India umbrella. Over these years, targeted efforts to fill data gaps on different topics have been pursued through data-thons and campaigns, and workshops and skill-sharing initiatives have aimed to strengthen the community.

As part of that initiative, the Indian state of West Bengal has seen a lot of activity around Wikidata in recent years. Under the WikiProject umbrella, Wikidata volunteers have been working together to build data on different topics related to the state: its demographics, culture, heritage, education, health, politics, language and so on. As heritage has been the prime focus of the Wikimedia community members of West Bengal, in this essay we will identify the data gaps on this topic through SPARQL queries and explore the reasons for them, if any, through interviews with active volunteers who have been working in this area for years.

Wikimedia community members have been documenting different forms of heritage since 2011, when they organized the first Wikipedia Takes Kolkata photo-walk. Since then, they have organized eight more Wikipedia Takes Kolkata photo-walks, 11 Wiki Exploration projects in 9 districts of the state, two editions of the prestigious Wiki Loves Monuments in India (2018 and 2019), and several other documentation projects, some organized organically and some single-handedly. In doing so, they have uploaded several thousand photographs of heritage structures and GLAM collections to Wikimedia Commons.

In this essay, we will focus on the photo-walks and explorations conducted to document heritage structures of West Bengal. We will look at two basic types of data that every dataset on heritage structures should contain, i.e. a) location and b) image, and use SPARQL queries to find out whether there is any significant gap in them.

Photo-walks and Wiki Explorations in West Bengal

Map of KMC heritage buildings generated from Wikidata query https://w.wiki/Tir

Let’s start with the nine editions of the Wikipedia Takes Kolkata photo-walk, which aim to photo-document the heritage buildings and structures of Kolkata. To understand the data gap related to these buildings, we will examine, through different SPARQL queries, how the graded heritage buildings and structures listed by the Kolkata Municipal Corporation (KMC) are represented on Wikidata. Wikidata now contains 923 heritage buildings and structures listed by KMC, but only 26.65% of them have images and only 18.53% have coordinates.
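The percentages for the 923 KMC-listed items can be turned back into approximate item counts. The sketch below assumes the reported figures were rounded to two decimal places; the counts it recovers are derived here, not reported in the study, and `implied_count` is a hypothetical helper.

```python
def implied_count(total: int, pct: float, ndigits: int = 2) -> int:
    """Return the smallest count n such that n / total, expressed as a
    percentage rounded to ndigits decimals, equals pct."""
    for n in range(total + 1):
        if round(100 * n / total, ndigits) == pct:
            return n
    raise ValueError(f"no count out of {total} rounds to {pct}%")


TOTAL = 923  # KMC-listed heritage buildings and structures on Wikidata
with_images = implied_count(TOTAL, 26.65)
with_coords = implied_count(TOTAL, 18.53)
missing_coords = TOTAL - with_coords

print(with_images, with_coords, missing_coords)   # 246 171 752
print(round(100 * missing_coords / TOTAL, 2))     # 81.47
```

Under that rounding assumption, about 246 items have images and 171 have coordinates, leaving 752 without coordinates, which matches the 81.47% missing-coordinate figure.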

Although 81.47% of the heritage-structure items were missing coordinates, they still gave a fairly good idea of location: every item was linked to a municipal ward and a street, which photographers and travellers could use to find the sites easily. However, when the ward items were tested, it emerged that although all 144 wards have coordinates, they all lack a crucial property for denoting the area they cover, i.e. geoshape data. A coordinate pinpoints a single location, which is misleading for a larger area; such an area needs a geoshape to describe its extent properly. When the street data was tested, both geoshape and coordinate data were found to be missing for the streets, which makes them extremely difficult to locate.
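Why a single coordinate is misleading for an area can be shown with a standard ray-casting point-in-polygon test. The sketch uses a hypothetical L-shaped ward on a flat grid with arbitrary units, not real KMC ward boundaries or Wikidata geoshape data. It shows that a site inside the ward can lie farther from the ward's single marker than a site outside it, so distance to a marker cannot settle membership, while a polygon can.

```python
import math

# Hypothetical L-shaped ward boundary as (x, y) vertices; units are arbitrary.
WARD = [(0, 0), (10, 0), (10, 2), (2, 2), (2, 10), (0, 10)]
# Hypothetical single coordinate recorded for the whole ward.
MARKER = (1, 1)


def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a horizontal ray from
    (x, y) towards +x crosses; an odd count means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height (so y1 != y2)
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


for name, site in [("site A", (9, 1)), ("site B", (5, 5))]:
    print(name, "inside ward:", point_in_polygon(*site, WARD),
          "distance to marker:", round(math.dist(MARKER, site), 2))
```

Site A lies inside the ward yet is 8 units from the marker, while site B is closer (about 5.66 units) but falls in the notch outside the ward; only the boundary distinguishes the two.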

Map of temples in West Bengal generated from Wikidata query https://w.wiki/Tj7

For the last 3 years, Wikimedia volunteers from West Bengal have also been involved in Wiki Exploration projects in remote parts of the state, documenting temples, mosques, sculptures and more, many of which had not been documented online before. A few hundred heritage structures in 9 districts of the state were documented, and thousands of photographs from this project have been uploaded to Wikimedia Commons. If we now test the Wikidata presence of temples in West Bengal, we find that 435 temples have items, of which only 196 have images and only 79 have coordinates. However, 302 of them have their location pinpointed to the village, ward, town or city level. As in the previous case, although there are 40,359 items for villages in West Bengal, only 0.017% have coordinates, and none have geoshape data.
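The temple and village figures can also be read as coverage rates. The sketch below only re-expresses the counts given above; the village count is back-calculated from the rounded 0.017% and is therefore approximate, and `share` is a hypothetical helper.

```python
def share(part: int, total: int, ndigits: int = 2) -> float:
    """Percentage of total represented by part, rounded to ndigits."""
    return round(100 * part / total, ndigits)


TEMPLES = 435  # temple items for West Bengal on Wikidata (early-2020 query)
temple_coverage = {
    "image": share(196, TEMPLES),
    "coordinates": share(79, TEMPLES),
    "locality (village/ward/town/city)": share(302, TEMPLES),
}

VILLAGES = 40_359
# 0.017% of village items have coordinates; recover the approximate count.
villages_with_coords = round(0.017 / 100 * VILLAGES)

print(temple_coverage)
# {'image': 45.06, 'coordinates': 18.16, 'locality (village/ward/town/city)': 69.43}
print(villages_with_coords, share(villages_with_coords, VILLAGES, 3))  # 7 0.017
```

Roughly 45% of temple items have images, 18% have coordinates and 69% have a locality, and the village figure corresponds to only about 7 of 40,359 items carrying coordinates at query time.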

The SPARQL queries behind the above two scenarios show a significant data gap. Both datasets lack a substantial share of location data and images. In the second scenario, items for many of the temples themselves are missing.

Challenges of Contributing to Wikidata in/from West Bengal

To understand why the data gaps are so large, we interviewed four volunteers from West Bengal who are involved in these two kinds of projects; three of them have been Wikimedia contributors for five to ten years, and one is relatively new to the movement. All of them upload heritage photographs to Wikimedia Commons, and two of them contribute to Wikidata. All of them agreed that, lacking suitable hardware, they could not record exact coordinates while photo-documenting heritage structures. GPS devices or full-frame cameras with built-in GPS are expensive and not affordable for many. Interviewees also pointed out that, without proper training in documenting heritage structures, photographers and amateur researchers miss vital points of documentation and thus widen data gaps. Further reasons for data gaps include restricted access to private heritage structures (such as temples maintained by families, or private heritage buildings and their documents), the lack of proper existing documentation and of analogue and digital metadata, and the rapid destruction of built heritage through lack of maintenance or improper restoration procedures. Asked why photographs are not fully converted into data, they pointed out that learning data entry on Wikidata might be a burden for photographers, as it lies outside their area of interest and workflow. As one interviewee noted, ‘the nature of work for Wikidata does not match with photographers' workflow.’ However, they also stressed the need for training programmes on Wikidata for photographers and others involved in documentation, to show them the importance of structured data in heritage documentation.

Recommendations

Based on the observations of this short study, we recommend that volunteers working on heritage documentation in West Bengal be supported with suitable hardware for recording coordinates. Frequent training programmes, preferably led by experts, should teach volunteers how to document heritage structures professionally, so that data gaps remain minimal. Training on Wikidata should be conducted for photographers to help them understand the importance of structured data in heritage documentation. We also recommend increasing interaction between Wikidata and Wikimedia Commons volunteers, so that each group understands the other's workflow and both can adapt their workflows strategically for optimal results.

References

[1] Vrandečić, Denny (2012). "Wikidata: a new platform for collaborative data collection". Proceedings of the 21st international conference companion on World Wide Web - WWW '12 Companion. Lyon, France: ACM Press: 1063. doi:10.1145/2187980.2188242. ISBN 978-1-4503-1230-1.

[2] Vrandečić, Denny; Krötzsch, Markus (2014-09-23). "Wikidata: a free collaborative knowledgebase". Communications of the ACM. 57 (10): 78–85. doi:10.1145/2629489.

[3] Vrandečić, Denny (2012).

[4] Roth, Mathew (30 March 2012). "The Wikipedia data revolution". Wikimedia Foundation Blog.

[5] Pintscher, Lydia (14 January 2013). "First steps of Wikidata in the Hungarian Wikipedia". Wikimedia Deutschland Blog.

[6] Pintscher, Lydia (30 January 2013). "Wikidata coming to the next two Wikipedias". Wikimedia Deutschland Blog.

[7] Pintscher, Lydia (15 February 2013). "Wikidata live on the English Wikipedia". Wikimedia Deutschland Blog.

[8] Pintscher, Lydia (6 March 2013). "Wikidata now live on all Wikipedias". Wikimedia Deutschland Blog.

Notes

[1] The query results were generated during early 2020. The results may vary at the time of publication of this article.

[2] See Annexure I for the interview questionnaire.

[3] Read this report on Wikimedia Meta-Wiki here.

For more details visit https://cis-india.org/a2k/blogs/understanding-the-data-gaps-on-wikidata-concerning-heritage-structures-of-west-bengal

CIS_ODR Report_11/11/20

aman — 2021-03-22T05:22:55Z

For more details visit https://cis-india.org/internet-governance/cis_odr-report_11-11-20

The Wolf in Sheep's Clothing: Demanding your Data

Rekha Jain — 2020-11-10T17:44:13Z

This piece was originally published in The Economic Times Telecom on 8 September 2020.

The increasing digitalization of the economy and the ubiquity of the Internet, coupled with developments in Artificial Intelligence (AI) and Machine Learning (ML), have given rise to transformational business models across several sectors. These developments have changed the very structure of existing sectors, with a few dominant firms straddling many sectors. The position of these firms is entrenched by the large amounts of data they hold, by their use of sophisticated algorithms that deliver highly targeted services and content, and by their global nature.

Such data-based network businesses are generally multi-sided platforms subject to network effects and winner-takes-all dynamics, which often makes traditional competition regulation inappropriate. In addition, there is concern that such companies hurt competition because they own large amounts of data collected globally, the very basis on which new services are predicated. And since users are reluctant to share their data across multiple platforms, new companies find it very challenging to emerge. Several of the large companies are of US origin. Several regions and countries, such as the EU, the UK and India, are concerned that while these companies benefit from the data of their citizens or their citizens' devices, SMEs and other domestic companies find it increasingly difficult to remain viable or achieve scale. With the objective of supporting enterprises, including SMEs, in their own countries, Europe, the UK and India are at different stages of data regulation initiatives.

In India, the Personal Data Protection (PDP) Bill, 2019 sets out the framework for collecting, managing and transferring the Personal Data of Indian citizens, including mandating the sharing of anonymized data of individuals and of non-personal data for better targeting of services or policy making. In addition, the Report by the Committee of Experts (CoE) on Non Personal Data (NPD) proposed a Framework for Regulating NPD. Since the NPD Report is more recent, this article analyzes some aspects of it.

According to the CoE, non-personal data could be of two types. First, data or information that was never about an individual (e.g. weather data). Second, data or information that once related to an individual (e.g. a mobile number) but has ceased to be identifiable because certain identifiers were removed through ‘anonymisation’. However, it may be possible to recover personal data from such anonymized data, so the distinction between personal and non-personal data is not clean. In any case, the PDP Bill 2019 deals with personal data. If the CoE felt that some aspect of personal data (including anonymized data) was not adequately dealt with, it should work to strengthen the Bill. The CoE's current approach is bound to create confusion and overlapping jurisdiction. Since anonymized data is required to be shared, there are disincentives to anonymization, causing greater risk to individual privacy.
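The caveat that personal data may be recoverable from anonymized data can be illustrated with a classic linkage attack: joining an 'anonymised' table to a separately available one on quasi-identifiers such as postal code, birth year and gender. Every record, field name and value below is made up; the sketch illustrates the general technique, not any dataset or method discussed by the CoE.

```python
from collections import defaultdict

# Entirely hypothetical records, for illustration only.
# "Anonymised" release: names removed, quasi-identifiers kept.
anonymised = [
    {"pin": "110001", "birth_year": 1985, "gender": "F", "diagnosis": "asthma"},
    {"pin": "110001", "birth_year": 1985, "gender": "M", "diagnosis": "diabetes"},
    {"pin": "110002", "birth_year": 1990, "gender": "F", "diagnosis": "none"},
]
# Separately available dataset that includes names.
public = [
    {"name": "Person A", "pin": "110001", "birth_year": 1985, "gender": "F"},
    {"name": "Person B", "pin": "110001", "birth_year": 1985, "gender": "M"},
    {"name": "Person C", "pin": "110002", "birth_year": 1990, "gender": "F"},
    {"name": "Person D", "pin": "110002", "birth_year": 1990, "gender": "F"},
]
QUASI_IDENTIFIERS = ("pin", "birth_year", "gender")


def reidentify(anon_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Link rows on the quasi-identifiers; a combination that matches
    exactly one named person re-identifies the 'anonymised' row."""
    index = defaultdict(list)
    for row in public_rows:
        index[tuple(row[k] for k in keys)].append(row["name"])
    linked = []
    for row in anon_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:
            linked.append((names[0], row["diagnosis"]))
    return linked


print(reidentify(anonymised, public))
# [('Person A', 'asthma'), ('Person B', 'diabetes')]
```

The third record stays ambiguous only because two people happen to share its quasi-identifiers; removing names alone did not protect the other two.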

A new class of business based on a “horizontal classification cutting across different industry sectors” is defined. This refers to any business that derives “new or additional economic value from data, by collecting, storing, processing, and managing data” based on a certain threshold of data collected/processed that will be defined by the regulatory authority that is outlined in the report. The CoE also recommends that “Data Businesses will provide, within India, open access to meta-data and regulated access to the underlying data” without any remuneration. Further, “By looking at the meta-data, potential users may identify opportunities for combining data from multiple Data Businesses and/or governments to develop innovative solutions, products and services. Subsequently, data requests may be made for the detailed underlying data”.

With increasing digitalization, almost every business today is a data business. The problem with such a categorization will lie in defining the thresholds. Even a small video-sharing app or an AR/VR app is likely to store, collect, process or transmit a greater volume of data than, say, a mid-sized bank. Further, with IoT increasingly embedded in various aspects of our lives and businesses (smart manufacturing, logistics, banking, etc.), the amount of data captured by even small entities can be huge.

The private sector, driven by profitability, identifies innovative business models, risks capital and finds unique ways of capturing and melding different data sets. Such innovation is necessary to sustain economic growth. The private sector would also like legal protection over these aspects of its businesses, including the unique IPR that may be embedded in its data processing or business processes. But the onerous sharing requirements the CoE proposes are going to kill private initiative. Any regulatory regime must balance the need to provide a secure environment for protecting the data of incumbents against the goal of making it available to SMEs and other businesses.

Metadata provides insights into a company’s databases and processes, which are a source of competitive advantage for any company. Metadata is not without context. Under the proposal, any demand for such disclosure must rest on a purpose, which the proposed NPD Regulator would evaluate. In practice, purposes are open to interpretation, and the structure of the appeal mechanism and similar procedures is going to stall any such sharing. Would such sharing mandates not interfere with existing Intellectual Property Rights? Or with the freedom to contract? Any innovation could easily be made available to a competitor that front-ends itself with a start-up. Mandating that such data be made available would not be fair. Further, how would the NPD Regulator even ensure that such data is used only for the purpose for which it is sought (which the proposed regulator is supposed to evaluate)? In Europe, where such data-sharing mandates are being considered, the focus is on public data. For private entities, sharing is largely based on voluntary contributions. Compulsory sharing is mandated only in restricted situations, where market failures are not addressed through competition law, and provided the legitimate interests of the data holder and existing legal provisions are taken into account.

Further, the compliance requirements for such Data Businesses are very onerous and make a mockery of the government’s “minimum government” framework. The CoE recommends that all Data Businesses, whether government, NGO or private, be required “to disclose data elements collected, stored and processed, and data-based services offered”. As if this was not enough, the CoE further recommends that “Every Data Business must declare what they do and what data they collect, process and use, in which manner, and for what purposes (like disclosure of data elements collected, where data is stored, standards adopted to store and secure data, nature of data processing and data services provided). This is similar to disclosures required by pharma industry and in food products”. Such disclosures are necessary in those industries because companies in these sectors deal with critical aspects of human life. But are such requirements necessary for all activities and businesses? As long as organizations collect and process data legally, within the sectoral regulation, why should such information have to be “reported”? Further, such bureaucratic processes and reporting requirements will only burden existing legitimate businesses and give rise to a thriving regulatory license raj.

Further questions arise. How is any compliance agency going to ensure that all the underlying metadata is made available in a timely manner? As companies respond to a dynamic environment, their analyses and analytical tools change, and so does their metadata. This inherent aspect of business raises the question: at what point in time should companies make their metadata available? And how will compliance be monitored?

Conclusion: The CoE needs to create an enabling and facilitative environment for data sharing. The incentives for different types of entities to participate and contribute must be recognized. Adequate provisions for the risks and liabilities arising out of data sharing need to be thought through. National initiatives on data sharing should not create an onerous reporting regime, as envisaged by the CoE, even if it is digital.

DISCLAIMER: The views expressed are solely of the author and ETTelecom.com does not necessarily subscribe to it. ETTelecom.com shall not be responsible for any damage caused to any person/organisation directly or indirectly.

For more details visit https://cis-india.org/internet-governance/blog/the-wolf-in-sheeps-clothing-demanding-your-data

Annual Programmatic Report 2018-2019

pranav — 2020-11-10T10:56:33Z

For more details visit https://cis-india.org/about/reports/annual-programmatic-report-2018-2019

Investigating Encrypted DNS Blocking in India

divyank — 2020-10-27T11:21:08Z

We find that encrypted DNS protocols are not blocked in India and share our test methodology.

This report was edited and reviewed by Gurshabad Grover and Simone Basso.

The Domain Name System (DNS) translates human-readable web addresses, like ‘cis-india.org’, into machine-readable IP addresses, such as ‘172.67.211.18’, that the routers that comprise the internet can understand and direct traffic to. This basic function of the internet has historically operated unencrypted, allowing intermediaries that facilitate access to the internet, like coffee shop Wi-Fi operators and internet service providers (ISPs), to view what websites we visit. This gap in privacy is being exploited by both public and private entities to censor access to the web and surveil our browsing habits.

New internet protocols are being deployed that attempt to encrypt connections to DNS providers. Through the use of these methods, the contents of DNS queries are hidden from network intermediaries and eavesdroppers and are only visible to the DNS provider chosen by an individual or a default one assigned to them by their ISP or web browser. While there are other ways of censoring web traffic, encrypted DNS protocols prevent censors from using their older DNS-based methods. In response to these new protocols, states like Iran are trying to block them entirely, to maintain the status quo.
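To make the privacy gap concrete, the Python sketch below builds a minimal DNS query in the RFC 1035 wire format and then encodes it the way RFC 8484 specifies for DoH GET requests. Sent as-is over conventional DNS (UDP port 53), the queried name travels in readable bytes; wrapped in a DoH request, the same bytes travel inside an encrypted HTTPS connection. The resolver URL `https://dns.example/dns-query` and the helper names are hypothetical, and this is an illustration rather than how OONI or any browser builds its queries.

```python
import base64
import struct


def build_dns_query(name: str, qtype: int = 1, query_id: int = 0) -> bytes:
    """Build a minimal DNS query message in RFC 1035 wire format.

    qtype=1 asks for an A (IPv4 address) record. RFC 8484 recommends a
    query ID of 0 for DoH so that responses stay cache-friendly.
    """
    # Header: ID, flags (0x0100 sets "recursion desired"), one question,
    # and zero answer, authority and additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE, then QCLASS 1 (IN, the internet class).
    return header + qname + struct.pack("!HH", qtype, 1)


def doh_get_url(resolver_url: str, query: bytes) -> str:
    """Encode a DNS message for a DoH GET request as described in RFC 8484.

    The message is base64url-encoded with trailing '=' padding removed.
    """
    encoded = base64.urlsafe_b64encode(query).rstrip(b"=").decode("ascii")
    return f"{resolver_url}?dns={encoded}"


query = build_dns_query("cis-india.org")
# Over plain DNS these bytes cross the network as-is: the name is readable.
print(b"cis-india" in query)  # True
# Hypothetical resolver endpoint; a real client would fetch this over HTTPS.
print(doh_get_url("https://dns.example/dns-query", query))
```

An on-path observer of the DoH request sees only a TLS connection to the resolver, not the `dns=` parameter or the name inside it.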

In this report, we investigate and find that encrypted DNS protocols, specifically the DNS over HTTPS (DoH) and DNS over TLS (DoT) standards, are accessible through major Indian ISPs, and describe the technical details of our testing methodology.

Test Setup

We compiled a list of publicly accessible DNS resolvers that support the encrypted DoH and DoT protocols and tested access to them from four popular Indian ISPs, namely Airtel, Atria Convergence Technologies (ACT), Reliance Jio, and Vodafone. Together, these cover a large majority (roughly 95%, as reported by TRAI) of the Indian internet subscriber base.

To test connectivity, we used the Open Observatory of Network Interference (OONI) probe engine (version 0.18.0), specifically the ‘miniooni’ command-line tool bundled with it. Instructions on how to install this can be found here.

Test methodology

To test whether DNS providers are reachable over encrypted communication protocols, the tool performs a DNS query using the specified protocol (either DoH or DoT). If the connection succeeds and we receive a response from the DNS server, we conclude that the protocol is not blocked. However, failing to query a specific DNS server over DoT or DoH does not necessarily mean that it has been censored. To determine whether a failure could be censorship rather than a transient error, we would need to correlate measurements from many users within the same ISP and country, and use an alternate network, such as a VPN, to access the possibly blocked service from another country.
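The decision procedure described above can be sketched as follows. This is an illustration under assumptions, not OONI’s analysis code: the `Measurement` record, the `"vpn"` label for the control network and the `min_samples` threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One hypothetical encrypted-DNS reachability check."""
    vantage: str   # an ISP name, or "vpn" for the control network abroad
    protocol: str  # "doh" or "dot"
    resolver: str  # resolver identifier, e.g. a hostname
    success: bool  # True if a DNS response was received


def classify(measurements: list[Measurement], resolver: str, protocol: str,
             isp: str, min_samples: int = 3) -> str:
    """Interpret measurements for one resolver, protocol and ISP."""
    relevant = [m for m in measurements
                if m.resolver == resolver and m.protocol == protocol]
    isp_results = [m.success for m in relevant if m.vantage == isp]
    control_results = [m.success for m in relevant if m.vantage == "vpn"]
    if any(isp_results):
        # A successful query with a response: the protocol is not blocked here.
        return "accessible"
    if len(isp_results) < min_samples or not control_results:
        # Too few failures, or nothing to compare against: a transient error
        # cannot be ruled out.
        return "inconclusive"
    if any(control_results):
        # Repeated local failures, but reachable from another country.
        return "possible censorship"
    # Fails from the control network too: probably not ISP interference.
    return "resolver unavailable"
```

Even "possible censorship" is only a lead for further investigation; the sketch deliberately never outputs a definitive "blocked" verdict.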

In Iran, where DNS over TLS is reported to be blocked, censorship was found to occur through interference with the TLS handshake. DNS over TLS traffic is easier to identify and block because it uses a dedicated port (853) and a distinctive ALPN (Application-Layer Protocol Negotiation) identifier. DNS over HTTPS traffic is harder to block effectively: it uses the same HTTPS standard as the rest of the web, so interfering with it would lead to collateral censorship.
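The difference in blockability comes down to what an on-path observer can read at the start of a connection: the destination port and the ALPN values offered in the unencrypted TLS ClientHello. The toy classifier below assumes a censor that looks only at those two fields, and it assumes `"dot"` is the ALPN value a DoT client offers; real censorship equipment may also use IP address lists, the server name (SNI) or traffic patterns.

```python
DOT_PORT = 853    # port assigned to DNS over TLS by RFC 7858
HTTPS_PORT = 443  # shared by DNS over HTTPS and ordinary web traffic


def naive_censor_verdict(dst_port: int, alpn_offered: list[str]) -> str:
    """Toy model of a censor that inspects only the port and ClientHello ALPN.

    Assumes "dot" is the ALPN value a DoT client offers; both fields are
    visible before the TLS handshake completes.
    """
    if dst_port == DOT_PORT or "dot" in alpn_offered:
        return "block: identifiable as DNS over TLS"
    if dst_port == HTTPS_PORT:
        # A DoH request looks like any other HTTPS connection at this level,
        # so blocking it here would also break ordinary websites.
        return "allow: indistinguishable from web traffic"
    return "allow"


print(naive_censor_verdict(853, ["dot"]))
print(naive_censor_verdict(443, ["h2", "http/1.1"]))
```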

Results

The tests were run on each ISP in early October 2020 using the following command:

$ ./miniooni --file=./resolvers.txt dnscheck

The raw results in the OONI data format can be found here. A summary of the observations is as follows:

All DNS resolvers tested were accessible over both DoH and DoT protocols from all ISPs tested.
IPv6 addresses were not reachable through ACT broadband. This limitation was independently confirmed using the Test-IPv6 tool and has also been discussed on Reddit.

Limitations

As previous research by the Centre for Internet and Society indicates, censorship practices vary across ISPs. While we find no evidence of encrypted DNS protocols being blocked on these four major ISPs, other ISPs may be implementing such blocking.

The second limitation is that these tests were run on a handful of connections from a couple of locations (Delhi and Bangalore). Web censorship mechanisms may vary by location within the country.

Finally, the results only indicate the accessibility of encrypted DNS resolvers at a particular point in time. We have not put in place any continuous monitoring of the censorship of encrypted DNS protocols.

Conclusion

Broadly, the legal framework of web censorship in India allows the Government and courts to ask ISPs to block access to online resources. The precise technical details of how to implement the censorship are left to the ISPs.

Because of net neutrality obligations, ISPs are not supposed to arbitrarily block resources. Coupled with the fact that the use of encrypted DNS protocols is not related to any particular content/website deemed unlawful, it might be expected that ISPs are not blocking encrypted DNS protocols. However, previous evidence of arbitrary blocking by ISPs motivated us to study whether any major ISP was blocking the use of these protocols or preventing access to any third-party DNS server.

As part of this exercise, we also contributed code to the OONI probe engine, making it easier for other researchers to test connectivity to multiple DNS providers.

For more details visit https://cis-india.org/internet-governance/blog/investigating-encrypted-dns-blocking-in-india

Mapping GLAM in Maharashtra

Subodh Kulkarni — 2021-05-15T12:30:59Z

This is a short study on mapping the digital transition in selected Galleries, Libraries, Archives and Museums (GLAM) institutions in Maharashtra, India, and exploring possibilities and challenges for collaborations with Wikimedia projects. Research was undertaken by Aaryaa Joshi, Dnyanada Gadre-Phadke, Kalyani Kotkar and Subodh Kulkarni; the report has been authored by Subodh Kulkarni with editorial oversight and support by Puthiya Purayil Sneha, and external review by Sumandro Chattapadhyay. This is part of a series of short-term studies undertaken by the CIS-A2K team in 2019–2020.

Introduction

The digital turn has been an important development for the cultural heritage sector in India, especially in the last decade, in which access to the internet and multimedia technologies has led to several advancements in the Galleries, Libraries, Archives and Museums (GLAM) space. This has also encouraged a multiplicity of uses of cultural content in diverse contexts. Several efforts have been undertaken in this space over the last decade, including state initiatives like the National Museum Collections digital repository, archival efforts at universities such as Jadavpur University, and private and individual initiatives such as the People’s Archive of Rural India (PARI) and Indiancine.ma. Despite developments in preservation, curation and content sharing, concerns related to access, infrastructure and linguistic barriers persist in this sector. Intellectual property rights, open access and privacy have also emerged as important issues for cultural institutions looking to open up their collections to a wider public.

Collaborations with open knowledge production spaces like Wikimedia and OpenGLAM offer important insights into the possibilities that the digital turn now makes available, both for better public access to cultural content and for the development of collaborative archival efforts. Efforts such as GLAM-Wiki have been crucial in bridging the gap between cultural institutions and initiatives in the free knowledge movement. However, there is still a lack of documentation and research on the various kinds of existing collections and archival efforts afoot in India, and on how they may benefit from better access through platforms like Wikimedia. This study maps a few such GLAM institutions in Maharashtra, India, and reviews their collections, challenges and limitations to explore possibilities for better collaboration between cultural and public memory institutions through GLAM-Wiki initiatives.

Research Questions and Method

The study was framed by the following questions:

How has the digital transition in the GLAM sector in Maharashtra, India, impacted the process of creation and access to cultural content?
What are possible collaborations with open knowledge efforts like GLAM-Wiki?

The mapping of GLAM institutions was undertaken through questionnaires/surveys conducted with six GLAM institutions working in Pune district and one in Kolhapur district of Maharashtra state. The institutions were identified through snowball sampling and through existing networks established by Subodh Kulkarni, the CIS-A2K Programme Officer working on Wikimedia projects in this area. The questionnaires focused on the nature, objective and scope of the collections, funding, provenance, offline and online workflows (including acquiring, preservation, accessioning, digitisation and metadata standards), human resources, infrastructure, IPR policies and public outreach efforts. The questionnaires were administered with the help of the Programme Officer and volunteers working in this language community. The questionnaire and its Marathi translation are given in Annexure I.

The mapping helped to produce a set of recommendations for possible GLAM-Wiki collaborations in the Indian context. This was done through field visits to these institutions, review of the material, and interviews with key resource persons (administrators, faculty and students, archivists, librarians, developers etc.) who manage the collections of cultural content.

The following seven GLAM institutions were visited between November 2019 and February 2020. Further visits were cancelled due to the COVID-19 pandemic. Three Wikimedians, User:ज्ञानदा गद्रे-फडके, User:आर्या जोशी and User:कल्याणी कोतकर, uploaded images of these collections to Wikimedia Commons and added or expanded five related articles on Marathi Wikipedia: राजा दिनकर केळकर संग्रहालय (Raja Dinkar Kelkar Museum), पुणे नगर वाचन मंदिर (Pune Nagar Vachan Mandir), सार्वजनिक वाचनालय, राजगुरुनगर (Public Library, Rajgurunagar), and आपटे वाचन मंदिर (Apte Vachan Mandir).

Observations about Research Method

The study was done with the help of three active Wikimedians, Aaryaa Joshi (Username:आर्या जोशी), Dnyanada Gadre-Phadke (Username:ज्ञानदा गद्रे-फडके) & Kalyani Kotkar (Username:कल्याणी कोतकर) interested in GLAM related activities. The questionnaire was developed with their participation. Orientation sessions were conducted to discuss the research design, process and outputs. The potential areas for bringing content into various Wikimedia projects were explained. While these Wikimedians conducted the visits for this mapping voluntarily, the actual expenses on travel, refreshments etc. were reimbursed. These volunteers had to carve out time slots from their regular jobs to complete the task. The timings at institutions and availability of key persons also needed to be considered while planning the visits. Sometimes the volunteers had to take leave from their regular work, which also led to some difficulties.

The first visit was to establish an association with the institution and the persons. The meeting with the authorities at the institution was essential to get the consent forms signed and complete other such formalities, including permissions to conduct interviews. This process delayed the work slightly, but is an important learning in terms of the need to establish a rapport with institutions for such research. The questionnaire was translated into Marathi (the local language) to facilitate the discussions. It was felt that covering the basic aspects of the collections at an institution requires at least 4–5 visits, with a short gap between visits. Visiting at this regular frequency helps build relationships as well as maintain the workflow. The sample size for the present study was small due to some unforeseen constraints, such as finding enough interested volunteer Wikimedians to undertake some of the research, the multiple visits required for each institution, which extended the duration of fieldwork, a lack of positive responses from the GLAM institutions, and eventual restrictions due to the COVID-19 pandemic.

Survey of GLAM in Maharashtra

To identify the major institutions in Maharashtra and prepare a list of major GLAM institutions in the state, various official government and private websites as well as publications were studied. It was realised that no website or publication has compiled a comprehensive district-wise or statewide list of institutions. Information about a few institutions is available online, but it is helpful largely from a tourism point of view. There is no proper selection or thematic categorisation that considers researchers, students or other communities of interest; the popular tourist routes are given importance instead. Therefore, there is a need to document all GLAM institutions category-wise on platforms freely accessible to the public. Some of the websites are listed in Annexure II.

Description of Surveyed Institutions

Apte Vachan Mandir, Ichalkaranji

Art Gallery at Apte Vachan Mandir, Ichalkaranji. By ज्ञानदा गद्रे-फडके, Art gallery at Apte vachan mandir, Ichalkaranji, CC BY-SA 4.0

Apte Vachan Mandir is a 150-year-old library in Ichalkaranji, a small city in Kolhapur district of Maharashtra. The authorities are very cooperative and eager to start digitising their old and rare books and art gallery. They also need help with the digitisation and preservation of century-old paintings. The institute is ready to scan the books if equipment and training are provided to its staff. The officials have shared a list of 400+ rare books that they plan to digitise. Official communication has begun with the secretary of the institution. Further progress stalled due to the COVID-19 pandemic.

Iravati Karve Anthropological Museum, Pune

Iravati Karve Anthropological Museum is located on the Savitribai Phule Pune University campus, Pune. The initial visit was conducted and permission was sought for further documentation. The curator and authorities have extended all possible cooperation regarding open knowledge access to the museum collections. Further visits could not be undertaken due to restrictions resulting from the COVID-19 pandemic.

Joshi’s Museum of Miniature Railways

Joshi’s Museum of Miniature Railways was founded in 1998 by B. S. Joshi in Pune city. It houses models of trains, railway stations, tracks with signals, bridges, city streets, a circus, etc. Light and sound shows are also arranged here. This is a unique collection in India, bringing together artifacts related to scientific concepts, handicraft, technology, history and amusement in one place. The authorities of this museum do not feel the need for digitisation, as the museum is a live show that is best experienced in person. However, documenting the development process of the railway models in the museum is important. They wish to increase outreach by publicising the museum on free knowledge platforms to attract more visitors and increase footfall. As it is a privately owned museum, it is becoming difficult to maintain it or add new exhibits. So, there is scope for some kind of engagement with this museum.

Museum in College of Military Engineering, Pune

College of Military Engineering, established in 1943, is a premier institute for army training in India. The museum houses vintage engineering equipment from the pre-World War I era, displayed over a large landscape. The archives of the corps are also maintained in the library section. Permission for an initial visit was received late due to administrative procedures. Further visits for interviews with key officials were planned but cancelled due to the lockdown following the COVID-19 pandemic. However, there is scope to document the rare machinery, engineering structures, military vehicles, etc., as they are openly accessible to the public. The institute is also keen to spread this knowledge to younger generations.

Pune Nagar Vachan Mandir

Pune Nagar Vachan Mandir Library. By दिपक कोतकर, पुणे नगर वाचन मंदिर ग्रंथालय 4, CC BY-SA 4.0

Pune Nagar Vachan Mandir is a historic library in Pune founded in 1848. The library houses a rich collection, including rare books in various languages dating from the 17th century, as well as historical manuscripts and valuable diaries. The library management is well informed about new developments in the field and has already adopted web technologies to cater to members. The catalogues are available online through Koha. They have started digitisation to some extent but need inputs and support. The authorities are eager to collaborate on larger projects to make their resources freely available, and are ready to share their database of books for integration with Wikimedia projects.

Raja Dinkar Kelkar Museum

Raja Dinkar Kelkar Museum was founded in 1920 by Dinkar Kelkar in Pune city. This museum houses 22,000 rare artifacts from different historical periods. The thematic galleries have been thoughtfully developed, and the museum has published 8 catalogues on these themes. More details about this museum can be found on its official website.

Mastani Mahal restored at Raja Dinkar Kelkar Museum, Pune. By आर्या जोशी, मस्तानी महाल, CC BY-SA 4.0

This museum is partially funded by the State Government for regular maintenance. Funds for development, upgradation, conservation and promotion must be raised by the institution. The museum authorities have planned a digitisation project, which is progressing as resources are arranged. The museum officials are open to sharing information digitally in the public domain, and believe that they can reach interested audiences through Wikimedia projects. They have given permission to photograph the objects and the various conservation practices in their laboratory. They have also expressed readiness to give Wikimedians visiting the institution for research purposes free access to its library and museum.

Rajgurunagar Public Library, Rajgurunagar

Rajgurunagar Public Library is a public access library, over 150 years old, with a competitive examination centre. Its special features are rare 19th-century books and manuscripts. The management was not aware of Wikimedia projects, open source cataloging, Unicode data entry systems, etc. After the visit, however, the officials responded very positively and agreed to digitise 25 rare books in collaboration with the Access to Knowledge programme of the Centre for Internet and Society and Vigyan Ashram, Pabal. The task was completed, and these books were digitised and uploaded to a dedicated category on Wikimedia Commons. As the manuscripts and other material are deteriorating, this collection needs to be digitised at the earliest.

Observations

Target audience

The GLAM institutions, especially museums and libraries, have faced a decrease in footfall in recent times. The officials feel that uploading material on the web under free licenses will further accelerate this decline. At the same time, they are interested in attracting a new generation to engage with these collections through promotional mobile apps, and have ideas for doing so. There are, however, persistent anxieties about public access to these materials on the web. Some institutions possess unique or rare material such as antiquities, manuscripts, live models or books, and officials fear that the institutions will lose their points of attraction if this material is projected on the web with descriptions. Meanwhile, researchers and interested communities remain unaware of the treasures held by these institutions.

Sustainability

Financial sustainability is another important concern and an obstacle to digitising collections. The museums’ publications are a source of revenue for them. As entry fees or subscription charges need to be kept minimal for visitors, the priced material sold at the counters is the only source of income for these institutions. Hence, they limit the online availability and promotion of this material. Finding a sustainable model that also allows open access to content is a difficult task for a large number of organisations. Financial support for these institutions is not a priority area for government agencies or philanthropic organisations. Some institutions have successfully obtained corporate social responsibility (CSR) funding. They need professional inputs for fundraising campaigns.

Technical challenges

There are also technical challenges with the digitisation process itself. Some of the libraries have not adopted a universal cataloging system. Therefore, it is difficult to analyse their book data by copyright status and physical condition. The authorities are eager to dispose of decaying material after digitisation. Some of them have approached State Government departments for funds but received no response. This may be because standard digitisation policies are not in place at a national level, and many institutions are unaware of existing benchmarks and policies. Another hindrance is that, because of their physical condition, the books cannot be taken outside the institution for scanning. Awareness and training in archival and records management are key requirements in these conditions.

Capacity building

Awareness of free knowledge and capacity building in digitisation skills need to be enhanced among personnel at the institutions before any project starts. Terminology and case studies of some projects in local languages are necessary for a better understanding of concepts as well as best practices. Some of the good archive projects in Marathi completed by various organisations include the digitisation of the complete works of Vinoba Bhave, Prabodhankar Thackeray and Vinayak Savarkar. The language department of the State Government of Maharashtra has also digitised and uploaded 129 old books and 555 old magazines on its website, and another website, of the literature and culture department, has made 434 books available in PDF, EPUB and MOBI formats.

Recommendations

These recommendations are based on the interactions with the Wikimedians involved in the process, the interviews with key persons from seven GLAM institutions and previous experiences of working with such institutions. The important learnings from this research study are captured in the observations stated above. As the focus of the discussions remained limited to access to cultural content and possible collaborations on Wikimedia projects, the content creation aspect was not touched upon in detail. The recommendations emerging from this study provide some guidelines for action points in the near future. However, for designing broader strategies for the GLAM sector, a sizable number of institutions in different regions of the state need to be mapped to provide a more comprehensive picture of the sector and its possibilities.

The recommendations for the various stakeholders in the mapping process are set out below:

For Wikimedians

Orientation sessions should be conducted for Wikimedians visiting the institutions, covering GLAM-related Wikimedia projects, copyright issues, Creative Commons licenses and the basics of library science. Resource material on these topics in local languages will be useful in the interview process.
For replicating this mapping activity across one or several states, the selection of Wikimedia volunteers is crucial. A reasonable honorarium per visit should be provided to ensure timely as well as high-quality execution of tasks.

For GLAM institutions

It was observed that the GLAM institutions are not well aware of free knowledge platforms like Wikimedia projects or the Internet Archive. They are aware of copyright and intellectual property rights, but not of Creative Commons or other available licenses. They wish to make their resources available across the world but are not clear about the methods. Collaboration on these aspects would be highly appreciated.
Old libraries have good collections of rare old books. They face difficulties in preserving these books as well as space constraints, and readership for these books is negligible. Hence, there is a need to digitise this valuable reference material before it deteriorates.

For CIS (or other implementing agency)

A comprehensive list of GLAM institutions in the state, categorised geographically and thematically, should be developed and made freely accessible to the public at large.
Staff and management members at these institutions should be trained in universal metadata structures and Unicode-compliant systems like Koha. At a minimum, cataloging in a universal format should be prioritised so that the metadata can be analysed for copyright-free status. A central repository is needed to avoid duplication in scanning. CIS-A2K needs to design a strategic plan for this activity.
For in-depth case studies of potential GLAM-Wiki institutions, a Wikimedian in Residence (WiR) programme should be adopted.
Interactions with concerned State and Central Government departments would facilitate the research activity and further collaborations. The findings of the research could be shared with such agencies along with concrete project proposals designed in collaboration with concerned institutions.

As the observations of this study illustrate, the digital turn has brought about significant changes in the cultural heritage sector, but many of the outstanding concerns still pertain to access to cultural content. The role of digital technologies and free knowledge platforms like Wikipedia in addressing these issues of access and outreach, and importantly in content creation, therefore remains to be explored through a more comprehensive study of the sector. Further, the study indicates the potential of collaborative work, and the efforts needed towards it, which may also contribute to a broader strategy for GLAM work with Wikimedia projects in Indian languages.

Read this report on Wikimedia Meta-Wiki here.

For more details visit https://cis-india.org/a2k/blogs/mapping-glam-in-maharashtra

Artificial Intelligence: A Full-Spectrum Regulatory Challenge (Working Draft) PDF

pranav — 2020-08-04T06:07:47Z

For more details visit https://cis-india.org/internet-governance/artificial-intelligence-a-full-spectrum-regulatory-challenge-working-draft-pdf

The State of Secure Messaging

divyank — 2020-07-17T08:12:15Z

A look at the protections provided by and threats posed to secure communication online.

This blogpost was edited by Gurshabad Grover and Amber Sinha.

The current benchmark for secure communication online is end-to-end encrypted messaging. It refers to a method of encryption wherein the contents of a message are only readable by the devices of the individuals, or endpoints, participating in the communication. All other Internet intermediaries such as internet service providers, internet exchange points, undersea cable operators, data centre operators, and even the messaging service providers themselves cannot read them. This is achieved through cryptographic mechanisms that allow independent devices to establish a shared secret key over an insecure communication channel, which they then use to encrypt and decrypt messages. Common examples of end-to-end encrypted messaging are applications like Signal and WhatsApp.
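The key agreement step can be illustrated with a toy Diffie-Hellman exchange in Python. Each device publishes a value derived from a private secret, and both arrive at the same key without that key ever crossing the network. The prime `2**127 - 1` and base `3` are illustrative assumptions chosen for readability and are far too weak for real use; this is a sketch of the idea, not how Signal or WhatsApp implement it (Signal, for instance, uses elliptic-curve Diffie-Hellman over Curve25519 through vetted cryptographic libraries).

```python
import hashlib
import secrets

# Toy group parameters (assumption): the Mersenne prime 2**127 - 1 and base 3.
# Far too weak for real use; deployed apps use vetted elliptic-curve libraries.
P = 2**127 - 1
G = 3


def keypair() -> tuple[int, int]:
    """Return (private, public). Only the public value is sent to the peer."""
    private = secrets.randbelow(P - 2) + 1  # random secret in [1, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_key(own_private: int, peer_public: int) -> bytes:
    """Both sides compute G**(a*b) mod P, then hash it into a 256-bit key."""
    secret = pow(peer_public, own_private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()


alice_private, alice_public = keypair()
bob_private, bob_public = keypair()
# An eavesdropper sees only alice_public and bob_public.
assert shared_key(alice_private, bob_public) == shared_key(bob_private, alice_public)
```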

This post attempts to give at-risk individuals, concerned citizens, and civil society at large a more nuanced understanding of the protections provided and threats posed to the security and privacy of their communications online.

Threat Model

The first step to assessing security and privacy is to identify and understand actors and risks. End-to-end encrypted messaging applications consider the following threat model:

Device compromise: Can happen physically through loss or theft, or remotely. Access to an individual’s device could be gained through technical flaws or coercion (legal, or otherwise). It can be temporary or be made persistent by installing malware on the device.
Network monitoring and interference: Implies access to data in transit over a network. All Internet intermediaries have such access. They may either actively interfere with the communication or passively observe traffic.
Server compromise: Implies access to the web server hosting the application. This could be achieved through technical flaws, insider access such as an employee, or through coercion (legal, or otherwise).

End-to-end encrypted messaging aims to offer complete message confidentiality and integrity in the face of server and network compromise, and some protections against device compromise. These are detailed below.

Protections Provided

Secure messaging services guarantee certain properties. For mature services that have received adequate study from researchers, we can assume them to be sound, barring implementation flaws which are described later.

Confidentiality: The contents of a message are kept private and the ciphers used are practically unbreakable by adversaries.

Integrity: The contents of a message cannot be modified in transit.
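Integrity is usually enforced with a message authentication code (MAC) computed under the shared secret key. The receiver recomputes the tag and rejects the message if it does not match. Below is a minimal sketch that assumes HMAC-SHA256 from Python's standard library and a placeholder all-zero key. Deployed protocols typically combine encryption and authentication in one construction rather than using a bare MAC like this.

```python
import hashlib
import hmac


def tag(key: bytes, message: bytes) -> bytes:
    """Compute an authentication tag that travels alongside the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(tag(key, message), received_tag)


key = bytes(32)  # placeholder; in practice derived from the key agreement
message = b"meet at 6pm"
t = tag(key, message)
print(verify(key, message, t))         # True
print(verify(key, b"meet at 9pm", t))  # False: the message was modified in transit
```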

Deniability: Aims to mimic unrecorded real-world conversations where an individual can deny having said something. Someone in possession of the chat transcript cannot cryptographically prove that an individual authored a particular message. While some applications feature such off-the-record messaging capabilities, the legal applicability of such mechanisms is debatable.
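One ingredient of deniability can be shown with the same kind of symmetric MAC. Because both participants hold the key, a valid tag proves only that one of the two produced the message, not which one. A transcript therefore cannot cryptographically pin authorship on either participant. This is a simplified sketch with a placeholder key. In real protocols, deniability also depends on how the initial key exchange is authenticated, which is not modelled here.

```python
import hashlib
import hmac

# Both participants hold the same symmetric key after key agreement.
shared_key = b"\x01" * 32  # placeholder value for illustration


def mac(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha256).digest()


def valid(pair: tuple[bytes, bytes]) -> bool:
    """Check a (message, tag) pair against the shared key."""
    msg, t = pair
    return hmac.compare_digest(mac(shared_key, msg), t)


# Alice authenticates a message she really sent.
genuine = (b"I said this", mac(shared_key, b"I said this"))

# Bob, holding the same key, can fabricate an equally valid pair.
forged = (b"I never said this", mac(shared_key, b"I never said this"))

# Someone handed the transcript and the key cannot tell which pair Alice produced.
print(valid(genuine), valid(forged))  # True True
```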

Forward and Future Secrecy: These properties aim to limit the effects of a temporary compromise of credentials on a device. Forward secrecy ensures messages collected over the network, which were sent before the compromise, cannot be decrypted. Future secrecy ensures messages sent post-compromise are protected. These mechanisms are easily circumvented in practice as past messages are usually stored on the device being compromised, and future messages can be obtained by gaining persistent access during compromise. These properties are meant to protect individuals aware of these limitations in exceptional situations such as a journalist crossing a border.
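Forward secrecy is commonly achieved with a one-way key derivation chain. Each message key is derived from a chain key, which is then replaced and discarded. The sketch below is loosely modelled on the symmetric-key ratchet in Signal's Double Ratchet specification, which suggests HMAC-SHA256 keyed by the chain key with single-byte constant inputs 0x01 and 0x02. The class and function names are hypothetical. The sketch also shows why this chain alone does not give future secrecy: a stolen chain key yields every later message key. That is why the Double Ratchet additionally mixes in fresh Diffie-Hellman outputs.

```python
import hashlib
import hmac


def kdf_step(chain_key: bytes) -> tuple[bytes, bytes]:
    """Derive (message_key, next_chain_key) from the current chain key."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key


class SendingChain:
    def __init__(self, initial_chain_key: bytes) -> None:
        self.chain_key = initial_chain_key

    def next_message_key(self) -> bytes:
        # The old chain key is overwritten here, so a later compromise cannot reveal it.
        message_key, self.chain_key = kdf_step(self.chain_key)
        return message_key


chain = SendingChain(bytes(32))  # placeholder starting key
past_keys = [chain.next_message_key() for _ in range(3)]

# An attacker who steals the *current* chain key can derive upcoming message keys,
# but cannot run HMAC backwards to recover the keys in past_keys.
stolen = chain.chain_key
future_key, _ = kdf_step(stolen)
print(future_key == chain.next_message_key())  # True: the thief can read ahead
print(future_key in past_keys)                 # False: earlier keys stay out of reach
```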

Shortcomings

While secure messaging services offer useful protections, they also have some shortcomings. Understanding these shortcomings and their mitigations helps minimise risk.

Metadata: Information about a communication, such as who the participants are, when messages are sent, where the participants are located, and how large a message is, can reveal important context about a conversation. While some popular messaging services attempt to minimise metadata generation, metadata leakage in general is still considered an open problem, because such information can be gleaned through network monitoring as well as server compromise. Application policies on whether such data is stored, and for how long it is retained, can improve privacy. There are also experimental approaches that use techniques like onion routing to hide metadata.
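The layering idea behind onion routing can be sketched as follows. The sender encrypts the message once per relay, so each relay can remove only its own layer and learns only the next hop. No single relay sees both who is sending and who is receiving. This toy assumes pre-shared per-relay keys and uses a hash-based XOR keystream chosen for brevity. It is not how Tor or any deployed system is implemented, and the relay names and keys are hypothetical.

```python
import hashlib
import json


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || counter) blocks.
    Illustration only; it provides no integrity and is not a vetted cipher."""
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)


def wrap(message: bytes, route: list[tuple[str, bytes]], destination: str) -> bytes:
    """Encrypt in layers, innermost (last relay) first; the result goes to route[0]."""
    payload, next_hop = message, destination
    for relay_name, relay_key in reversed(route):
        layer = json.dumps({"next": next_hop, "data": payload.hex()}).encode()
        payload, next_hop = keystream_xor(relay_key, layer), relay_name
    return payload


def peel(relay_key: bytes, onion: bytes) -> tuple[str, bytes]:
    """A relay removes its own layer and learns only the next hop."""
    layer = json.loads(keystream_xor(relay_key, onion))
    return layer["next"], bytes.fromhex(layer["data"])


route = [("relay-a", b"key-a"), ("relay-b", b"key-b"), ("relay-c", b"key-c")]
onion = wrap(b"hello", route, "bob")
hop, data = peel(b"key-a", onion)  # relay-a learns only "relay-b"
hop, data = peel(b"key-b", data)   # relay-b learns only "relay-c"
hop, data = peel(b"key-c", data)   # relay-c learns "bob" and the payload
print(hop, data)                   # bob b'hello'
```

In a real deployment, the innermost payload would itself be end-to-end encrypted, so the last relay would see only ciphertext.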

Authentication: This is the process of verifying that an individual sending or receiving a message is who they are thought to be. Current messaging services rely on application servers and cell service providers for authentication, which means that these servers and providers could substitute or impersonate individuals in conversations. Messaging services offer advanced features to mitigate this risk, such as notifications when a participant's identity changes, and manual verification of participants' security keys through other communication channels (in person, by mail, etc.).
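Manual verification works by having both participants derive a short code from the public keys their apps are actually using, then compare the codes over another channel. A substituted key changes the code. The scheme below is hypothetical and deliberately simple: it hashes length-prefixed keys in sorted order with SHA-256 and shows the result as 30 digits. It is not the exact computation any particular app uses, and the keys are placeholders.

```python
import hashlib


def safety_code(key_a: bytes, key_b: bytes) -> str:
    """Hypothetical scheme: hash both public keys in a canonical (sorted) order,
    length-prefixed so different key splits cannot collide, then show 30 digits."""
    digest = hashlib.sha256()
    for key in sorted([key_a, key_b]):
        digest.update(len(key).to_bytes(2, "big") + key)
    number = int.from_bytes(digest.digest(), "big") % 10**30
    digits = f"{number:030d}"
    return " ".join(digits[i:i + 5] for i in range(0, 30, 5))


alice_key, bob_key, attacker_key = b"\xaa" * 32, b"\xbb" * 32, b"\xcc" * 32

# Honest case: both apps see the real keys, so the codes they display match.
print(safety_code(alice_key, bob_key) == safety_code(bob_key, alice_key))  # True

# Impersonation: a server substitutes attacker_key on both sides of the conversation.
alice_sees = safety_code(alice_key, attacker_key)
bob_sees = safety_code(attacker_key, bob_key)
print(alice_sees == bob_sees)  # False: comparing codes in person exposes the swap
```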

Availability: An individual’s access to a messaging service can be impeded. Intermediaries may delay or drop messages resulting in what is called a denial of service attack. While messaging services are quite resilient to such attacks, governments may censor or completely shut down Internet access.

Application-level gaps: Capabilities offered by services in addition to messaging, such as contact discovery, online status, and location sharing are often not covered by end-to-end encryption and may be stored by the application server. Application policies around how such information is gathered and retained affect privacy.

Implementation flaws and backdoors: Software or hardware flaws (accidental or intentional) on an individual's device could be exploited to circumvent the protections provided by end-to-end encryption. For mature applications and platforms, accidental flaws are difficult and expensive to exploit. As such, they are accessible only to governments or other powerful actors, who typically use them to surveil individuals of interest rather than for mass surveillance. Intentional flaws, or backdoors introduced by manufacturers, may also be present. The only defence against these is the work of security researchers, who manually inspect software and network interactions to detect them.

Messaging Protocols and Standards

In the face of government demands for exceptional access to encrypted communication, and risks of mass surveillance from both governments and corporations, end-to-end encryption is important to enable secure and private communication online. The Signal Protocol, which is open and has been adopted by popular applications like WhatsApp and Signal, is considered a success story: it brought end-to-end encryption to over a billion users and has become a de facto standard.

However, it is developed and controlled by a single organisation. The Messaging Layer Security (MLS) working group within the Internet Engineering Task Force (IETF) is attempting to standardise end-to-end encryption through the participation of individuals from corporations, academia, and civil society. The draft protocol offers the standard security properties mentioned above, except for deniability, which is still being considered. It incorporates novel research that allows it to scale efficiently to large groups of up to thousands of participants, an improvement over the Signal Protocol. MLS aims to increase adoption further by creating open standards and implementations, similar to the Transport Layer Security (TLS) protocol used to encrypt much of the web today. There is also a need to look beyond end-to-end encryption to address its shortcomings, particularly around authentication and metadata leakage.
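A rough way to see why tree-based group key management scales better is to count the work a key update requires. Suppose a member who changes their key must send it to every other member over pairwise channels: the work then grows linearly with group size. MLS's ratchet tree (TreeKEM) instead arranges members as the leaves of a binary tree, so an update touches roughly one node per level. The back-of-the-envelope sketch below assumes the best case of a full, balanced tree. Its cost functions are hypothetical simplifications, not the exact accounting of either protocol.

```python
import math


def pairwise_cost(group_size: int) -> int:
    """Encryptions if a new key is sent over a separate channel to each other member."""
    return group_size - 1


def tree_cost(group_size: int) -> int:
    """Approximate encryptions when members are leaves of a balanced binary tree
    and an update re-encrypts one value per level of the path to the root.
    Best case only: assumes a full tree with no blank nodes."""
    return math.ceil(math.log2(group_size)) if group_size > 1 else 0


for n in (10, 1_000, 10_000):
    print(n, pairwise_cost(n), tree_cost(n))
# 10 9 4
# 1000 999 10
# 10000 9999 14
```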

For more details visit https://cis-india.org/internet-governance/blog/the-state-of-secure-messaging

Response to the ‘Call for Comments’ on The Santa Clara Principles on Transparency and Accountability

Torsha Sarkar and Suhan S — 2020-07-01T05:56:03Z

The Santa Clara Principles on Transparency and Accountability, proposed in 2018, provided a robust framework of transparency reporting for online companies dealing with user-generated content. In 2020, the framework underwent a period of consultation "to determine whether the Santa Clara Principles should be updated for the ever-changing content moderation landscape." In response, we presented the following answers, which are in line with our previous research and findings on the transparency reporting of online companies, especially in the context of the Indian digital space.

The authors would like to thank Gurshabad Grover for his editorial suggestions. A PDF version of the responses is also available here.

-------

1. Currently the Santa Clara Principles focus on the need for numbers, notice, and appeals around content moderation. This set of questions will address whether these categories should be expanded, fleshed out further, or revisited.

a. The first category sets the standard that companies should publish the numbers of posts removed and accounts permanently or temporarily suspended due to violations of their content guidelines. Please indicate any specific recommendations or components of this category that should be revisited or expanded.

While the Principles provide a robust framework for content moderation carried out by the companies themselves, we believe that the framework could be expanded significantly to include more detailed metrics on government requests for content takedown, as well as on third-party requests. For government requests, this information should include the number of takedown requests received, the number of requests granted (and the nature of compliance: full, partial or none), the number of items identified in these requests for takedown, and the branch of government from which the request originated (whether from an executive agency or sanctioned by a court).

Information regarding account restrictions, with a similar level of granularity, must also form part of this category. These numbers must be backed by further details on the reasons the government gave for demanding takedowns, i.e. the broad category under which content was flagged. For third-party requests, similar metrics should be applied wherever appropriate.

Additionally, for companies owning multiple platforms, information regarding both internal content moderation and moderation at the behest of external requests (by the state or third parties) must be broken down platform-wise. Alternatively, such companies should publish separate transparency reports for each platform they own.
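The metrics recommended in this answer amount to a fairly simple record structure. In the sketch below, a company logs each government request and aggregates the records per platform. The field names, source labels and category labels are hypothetical, since the Principles do not define them, and the sample records are made up.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Compliance(Enum):
    FULL = "full"
    PARTIAL = "partial"
    NONE = "none"


@dataclass
class TakedownRequest:
    platform: str          # reported separately for each platform a company owns
    source: str            # hypothetical labels, e.g. "executive" or "court"
    category: str          # broad reason the government gave for the request
    items_identified: int  # number of posts or URLs named in the request
    compliance: Compliance


def summarise(requests: list[TakedownRequest]) -> dict[str, dict]:
    """Aggregate the recommended metrics, broken down platform-wise."""
    report: dict[str, dict] = {}
    for r in requests:
        entry = report.setdefault(r.platform, {
            "received": 0,
            "granted": 0,  # full or partial compliance
            "items_identified": 0,
            "by_source": Counter(),
            "by_category": Counter(),
            "by_compliance": Counter(),
        })
        entry["received"] += 1
        if r.compliance is not Compliance.NONE:
            entry["granted"] += 1
        entry["items_identified"] += r.items_identified
        entry["by_source"][r.source] += 1
        entry["by_category"][r.category] += 1
        entry["by_compliance"][r.compliance.value] += 1
    return report


# Made-up records for illustration only.
sample = [
    TakedownRequest("platform-a", "executive", "public order", 12, Compliance.PARTIAL),
    TakedownRequest("platform-a", "court", "defamation", 1, Compliance.FULL),
    TakedownRequest("platform-b", "executive", "public order", 4, Compliance.NONE),
]
report = summarise(sample)
print(report["platform-a"]["received"], report["platform-a"]["granted"])  # 2 2
print(report["platform-b"]["by_compliance"])  # Counter({'none': 1})
```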

b. The second category sets the standard that companies should provide notice to each user whose content is taken down or account is suspended about the reason for the removal or suspension. Please indicate any specific recommendations or components of this category that should be revisited or expanded.

While this category envisages that companies notify their users of removals across all categories of content, additional research reveals that companies often carve out 'exceptional circumstances', such as child sexual abuse material (CSAM) or threats to life, in which they reserve the discretion not to send a notice. While the intent behind such categorization might be understandable, we believe that any list of exceptional circumstances should ideally not be left to company discretion, and should instead be prepared collaboratively. Accordingly, we recommend that the Principles be expanded to identify a limited set of exceptional circumstances in which not sending a notice to a user would be permissible and would not count as a violation of the Principles.

Additionally, the current framework requires granular details in the notice when content is flagged under the company's internal moderation standards. We believe a similar model should be followed for content removed at the behest of the state. When a piece of content has been identified as illegal in a government takedown request, the notice issued by the company to the user should be as granular as possible, within the limits permitted by the law under which the takedown request was issued. Such granularity must include, among other things, the exact legal provision under which the content has been flagged, and the reasons the government has given for flagging it.

c. The third category sets the standard that companies should provide a meaningful opportunity for timely appeal of any content removal or account suspension. Please indicate any specific recommendations or components of this category that should be revisited or expanded.

Currently, the 'appeals' category in the Santa Clara Principles focuses on having accountability processes in place, and emphasises the need for meaningful review. The framework of the Principles also currently envisages only internal review processes carried out by the company. However, Facebook has unveiled plans for an Oversight Board, a structurally independent body that would arbitrate select appeal cases of content moderation. In light of this, these pre-existing principles might need revisiting.

The Oversight Board is a relatively novel concept, but it sets an important precedent. Establishing certain fundamental principles of transparent disclosure and accountable conduct around it might therefore allow researchers and regulators alike to gauge the efficacy of this initiative. Accordingly, the Principles should consider some base-level disclosures that the company must make when it refers a select category of cases for independent external review. This might include a statement of reasons explaining why certain cases were prioritised for independent review. Where the decision hinges on a public interest question, the proceedings of the independent review might also be required to be made public (with due regard to security issues and the confidentiality of the parties involved).

2. Do you think the Santa Clara Principles should be expanded or amended to include specific recommendations for transparency around the use of automated tools and decision-making (including, for example, the context in which such tools are used, and the extent to which decisions are made with or without a human in the loop), in any of the following areas:

Content moderation (the use of artificial intelligence to review content and accounts and determine whether to remove the content or accounts; processes used to conduct reviews when content is flagged by users or others)

Companies have begun to rely on a variety of automated tools to aid their content removal processes across many types of content, including revenge porn, terrorist content and CSAM. Research, however, has shown that these tools have limitations, including over-removal and the censorship of perfectly legitimate speech.

We accordingly recommend that the Principles be expanded to require disclosure of the amount of content removed through automated flagging, the error rates of the tools, and the rate at which wrongly removed content is reinstated. The information presented by companies should also have a qualitative aspect, including clearer disclosure of the kinds of automated tools they use. Such disclosure must, of course, be balanced against the security of the platform and the need to ensure that the information disclosed is not used by malicious third-party actors to circumvent legitimate moderation.
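The recommended figures reduce to simple ratios. In this minimal sketch, the error rate is estimated from a human audit of automated removals, and the reinstatement rate is computed from removals that were later reversed. The field names are hypothetical and the counts are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AutomatedRemovalStats:
    removed_automatically: int  # items taken down on an automated flag
    audited: int                # automated removals later checked by a human reviewer
    audited_wrongful: int       # audited removals judged to be mistakes
    reinstated: int             # automated removals reversed, e.g. after an appeal

    def estimated_error_rate(self) -> Optional[float]:
        """Share of audited automated removals that were mistakes (None if no audit)."""
        return self.audited_wrongful / self.audited if self.audited else None

    def reinstatement_rate(self) -> Optional[float]:
        """Share of automated removals that were later reversed (None if no removals)."""
        if not self.removed_automatically:
            return None
        return self.reinstated / self.removed_automatically


# Made-up figures for one hypothetical reporting period.
stats = AutomatedRemovalStats(removed_automatically=10_000, audited=500,
                              audited_wrongful=35, reinstated=420)
print(f"estimated error rate: {stats.estimated_error_rate():.1%}")  # estimated error rate: 7.0%
print(f"reinstatement rate: {stats.reinstatement_rate():.1%}")      # reinstatement rate: 4.2%
```

The two figures answer different questions. The reinstatement rate captures only mistakes that someone successfully challenged, so it tends to understate errors. An audited error rate estimates mistakes regardless of whether anyone appealed.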

Additionally, with specific reference to 'extremist content', several online companies have collaborated to form the Global Internet Forum to Counter Terrorism (GIFCT), with the intent of facilitating better moderation. The GIFCT maintains a shared database of hashes of 'terrorist' content, which member companies use to filter content on their platforms. However, as has already been noted, this initiative provides very little information about how it functions. It also operates without any collaboration with civil society or human rights groups, and without any law enforcement oversight.
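The general mechanism of hash sharing can be sketched as follows. Members contribute fingerprints of content they have classified, and other members check uploads against the shared set without exchanging the content itself. This sketch rests on stated assumptions and is not GIFCT's actual system. The class name is hypothetical, and it uses exact SHA-256 matching for simplicity, whereas real hash-sharing systems typically also rely on perceptual hashes designed to survive re-encoding and small edits. The sketch also shows the transparency concern: the set holds only opaque hashes, so outsiders cannot tell what has been labelled 'terrorist' content without further disclosure.

```python
import hashlib


class SharedHashDatabase:
    """Hypothetical shared database: members contribute hashes, never the content."""

    def __init__(self) -> None:
        self._hashes: set[str] = set()

    @staticmethod
    def fingerprint(content: bytes) -> str:
        return hashlib.sha256(content).hexdigest()

    def contribute(self, content: bytes) -> None:
        self._hashes.add(self.fingerprint(content))

    def matches(self, content: bytes) -> bool:
        return self.fingerprint(content) in self._hashes


db = SharedHashDatabase()
db.contribute(b"<bytes of a flagged video>")       # platform A adds a hash
print(db.matches(b"<bytes of a flagged video>"))   # True: platform B blocks the copy
print(db.matches(b"<bytes of a flagged video>."))  # False: one extra byte evades an exact hash
```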

Similar collaborative arrangements in the future, deploying automated tools to filter out various forms of content without any transparency or accountability, can be problematic. They make information about the efficacy of these tools scarce, research into the processes difficult, and, ultimately, any suggestions for reform impossible.

Accordingly, the Principles must emphasize that collaborative efforts to use automated tools in content moderation must give sufficient consideration to the basic principles of transparency and accountability. This might include sharing information about processes with a select list of civil society and human rights groups, and presenting information about the accuracy rates of the tools separately in transparency reports.

Content ranking and downranking (the use of artificial intelligence to promote certain content over others such as in search result rankings, and to downrank certain content such as misinformation or clickbait)

Ranking and downranking algorithms have been deployed by companies for various purposes and across the different services they offer. For the purposes of our discussion, we restrict ourselves to two chief use cases of these processes: search engines and internet platforms.

Search engines

The algorithms developed to find accurate results for a query are often imperfect, and they have been accused of bias, including political partisanship and burying certain ideologies. Similarly, automated systems that downrank misinformation cannot guarantee accuracy, as they can misidentify accurate information as misinformation. Since such algorithms are constantly learning and updating, it becomes difficult to know exactly why certain content has been made less visible.

As case studies of several search engines indicate, a company's ranking processes often use a combination of algorithms and human moderators. Transparency requirements can therefore mandate disclosure of the training materials for these human moderators. For instance, Google has a scheme of 'Search Quality Raters', a group of third-party individuals responsible for giving feedback on search results. The guidelines on which their feedback is based are publicly available. The Principles can therefore call for similar disclosure from other companies that deploy human help in their ranking processes.

Internet platforms

For social media platforms, ranking algorithms are used to curate news feeds: dashboards that show the user content the algorithm predicts will be relevant to them. The algorithm makes these decisions based on the different signals it is trained on. Information about these algorithms is hard to come by. Even when it is available, the algorithms are often black boxes whose decisions cannot be readily explained.
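To make the transparency questions concrete, a feed-ranking step can be reduced to a weighted score over predicted engagement signals, with a multiplier that downranks flagged content. Everything here is hypothetical: the signal names, the weights and the penalty factor. Deployed systems use learned models that are far more complex, which is precisely why disclosure is hard. Even in this toy, though, a user cannot tell why a post was demoted unless the weights and the flagging decision are disclosed.

```python
from dataclasses import dataclass

# Hypothetical signal weights and downranking penalty; real platforms use learned
# models with many more signals and do not publish such values.
WEIGHTS = {"predicted_like": 1.0, "predicted_comment": 4.0, "predicted_share": 6.0}
DOWNRANK_FACTOR = 0.2


@dataclass
class Post:
    post_id: str
    signals: dict[str, float]  # model-predicted probabilities of each interaction
    flagged_misinformation: bool = False


def score(post: Post) -> float:
    base = sum(weight * post.signals.get(name, 0.0) for name, weight in WEIGHTS.items())
    return base * DOWNRANK_FACTOR if post.flagged_misinformation else base


def rank_feed(posts: list[Post]) -> list[str]:
    return [p.post_id for p in sorted(posts, key=score, reverse=True)]


feed = [
    Post("holiday-photo", {"predicted_like": 0.6, "predicted_comment": 0.05}),
    Post("viral-claim", {"predicted_like": 0.4, "predicted_share": 0.3},
         flagged_misinformation=True),
    Post("news-article", {"predicted_like": 0.3, "predicted_share": 0.05}),
]
# Without the flag, "viral-claim" (score 2.2) would top the feed; with it, it drops to 0.44.
print(rank_feed(feed))  # ['holiday-photo', 'news-article', 'viral-claim']
```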

There are, however, ways to improve transparency around these algorithms without compromising the security and integrity of the platform. This might include companies informing users, in an accessible manner, "(i) how they rank, organize and present user generated content". Companies should also update this information in a timely manner, so that researchers and regulators can use it while it is still relevant.

Companies should also have an easy-to-access policy that outlines how they plan to manage the human rights risks arising from the system(s) they deploy. Any human rights impact assessment must additionally consider the broad social contexts in which the algorithmic system is used.

Ad targeting and delivery (the use of artificial intelligence to segment and target specific groups of users and deliver ads to them)

Companies such as Facebook and Google collect a wide variety of data about their users, using a variety of data points (including age, location and race), to deliver personalised advertisements on behalf of affiliated advertisers. Methods like activity tracking and browser fingerprinting are employed to track users, with or without explicit notice. Since such tracking greatly affects users' privacy, more transparency is needed wherever companies collect user data and process it with their algorithms to target and deliver ads. Additionally, targeted advertising, especially political advertising, segments groups of people and subjects them to tailored advertising campaigns. This, in turn, may have drastic consequences, since such campaigns appear to deepen divisiveness over critical issues.

Notice

The Principles should identify the elements of a meaningful notice that companies must give users when their data is collected for delivering advertisements. Among other things, such notice should specify every kind of data the company collects about the user, and the categories into which the user has been segmented for advertising.

Disclosure

Companies should also strive to disclose, in detail, how data is collected and processed to segment users and deliver advertisements. This might include disclosing all the categories the company makes available to advertisers, and the names and identities of third parties (both advertisers and data brokers) with whom such data is shared. CNBC, for instance, reported in 2019 that Facebook selectively shared user data with select partners while denying rival companies access to the data.

Additionally, companies that allow users to opt out of their data being used for advertising, wholly or partly, should disclose this option and make it easy to access. For example, Facebook lets users turn off the use of their data for advertising in three different categories, and its Ad Preferences menu, hidden in a user's settings, is detailed. However, the company has no public document explaining these choices to users. The closest it comes is a public post that attempts to explain how and why users see certain ads on Facebook, with one line at the end directing users to their Ad Preferences settings to "View and use" their controls. Amazon, on the other hand, allows users to turn off personalized ads completely and has a dedicated page that explains how a user's data is used to personalize advertisements and how to disable this.

Content recommendations and auto-complete (the use of artificial intelligence to recommend content such as videos, posts, and keywords to users based on their user profiles and past behavior)

Recommendation algorithms are designed to suggest content that a user is likely to interact with, on the basis of their browsing behaviour and interaction on the platform, and are constantly updated to be more accurate. Popular examples include the systems used by Instagram and YouTube. These systems have been documented to often suggest radical content to users and, once users interact with such content, to continuously amplify it. YouTube's algorithm, for instance, has been accused of pushing users towards extremist or inflammatory ideologies.

Studying how recommendation algorithms function, and why certain extremist content is recommended to users, has been difficult for two reasons: first, the complexity of the current information ecosystem, and second, the lack of information about these algorithms. By expanding their scope, the Santa Clara Principles can seek to address the second difficulty by urging companies to be more transparent about their internal processes.

Sharing of data or open-sourcing algorithms

With due regard to the security and integrity of the platform, we recommend that the code of the algorithms used for recommendations be open source and publicly available online. Reddit, for instance, has published its code for curating news feeds in an open-source format.

Another approach, which has been studied, is a two-pronged method of sharing data. First, datasets identified as 'sensitive' are shared with certain partner institutions under non-disclosure agreements. Second, non-sensitive data is shared publicly in an anonymized format, available for any researcher to access.

This idea, however, comes with a few caveats. First, sharing datasets may not always fulfil the public-facing model of transparency and accountability that the Santa Clara Principles envisage. Second, it might be a particularly onerous obligation for small and medium enterprises, and without sufficient economic data it might be difficult to implement. Third, any framework adopting this approach must consider the privacy implications of such sharing. At this juncture, therefore, we do not recommend this as a compulsory, binding obligation for companies adopting the Principles. Rather, we encourage further conversations around this concept, so that these competing interests can be accommodated optimally.

Qualitative transparency

Another way of bringing more clarity to recommendation systems is to ask companies to publish user-facing, clearly accessible policies and explainers that outline how the company uses algorithms to recommend content to users. This could also include a visible, regularly updated list of topics that the company has chosen 'not to amplify' (for instance, self-harm or eating disorders).

3. Do you feel that the current Santa Clara Principles provide the correct framework for or could be applied to intermediate restrictions (such as age-gating, adding warnings to content, and adding qualifying information to content). If not, should we seek to include these categories in a revision of the principles or would a separate set of principles to cover these issues be better?

The Santa Clara Principles, as originally envisaged, adhered to the commonly adopted take down/leave up binary in content moderation, in which a piece of unlawful or problematic content (or an account) was either removed from public view or allowed to remain. Since then, however, platforms dealing with user-generated content have resorted to a variety of novel, intermediate techniques to moderate and regulate speech that fall outside this binary. With the adoption of such measures, it is important for the Principles to evolve and take into account the expanded scope of content moderation. In light of that, we recommend the following steps in these intermediate areas of regulation:

Adding warnings, qualifying information to content

As mentioned above, online intermediaries have in the recent past resorted to intermediate restrictions to deal with 'harmful' content online. These measures received an added boost during the Covid-19 outbreak, which saw a massive increase in misleading information and conspiracy theories online. They have included, among others, connecting users who have interacted with misinformation to verified, debunked information. They have also included a spectrum of actions based on the degree of harm posed by the content, ranging from adding labels and warnings to, finally, removal. For the reasons enumerated above, such intermediate measures are currently not accommodated within the framework of the Santa Clara Principles. Going forward, it may become important for the Principles to draw on the lessons from these measures and incorporate them into the framework wherever appropriate.

Additionally, as conversations around Twitter adding a fact-check to a tweet by Donald Trump show, these intermediate measures are often applied ad hoc. There is often no explanation of why certain items receive the moderation treatment while other, similarly misleading content from the same sources stays online. Accordingly, it is difficult to ascertain the exact reasoning behind these steps. Therefore, any principles on adding labels or warnings to information online must also require companies to be transparent about their decision-making processes.

Fact-checking

In recent years, with the proliferation of misinformation on online platforms, several companies have either begun to collaborate with fact-checkers, or deploy their own in-house teams. While these initiatives should be appreciated, it should also be noted that the term ‘fact checking’ assumes a partisan meaning in certain circumstances, including when sources of misinformation themselves offer this service. Accordingly, it becomes important that the fact-checking initiatives adopted by companies adhere to some standards of international best practices, and the decisions made are not riddled with biases, either political or ideological.

The Santa Clara Principles are useful to ascertain the transparency of any fact-checking initiatives, and can be applied across both collaborations between companies and fact-checkers, as well as for in-house fact checking initiatives.

For any collaboration, companies must disclose, in clear terms, the names and identities of the fact-checking organizations they are teaming up with (this example from Facebook divides this list of names country-wise), and the nature of the collaboration. This must include details of whether the organization stands to make any monetary gain, and the level of access to the platform and its dashboards that the company gives the fact-checking organization.

For in-house initiatives, the Santa Clara Principles must require companies to disclose information regarding any training programs carried out and the background of the fact-checkers, and this might also include a statement regarding the objectivity and non-partisanship of the initiative.

Lastly, comprehensive information about fact-checking must be presented in a clearly accessible format in the company’s regular transparency reports, which should include data on how many pieces of content got fact-checked in the reporting period, the nature of the content (text, photos, videos, multimedia), the nature of misinformation that was being perpetuated (health, communal etc.), and the number of times the said piece of content was shared before it could be fact-checked.

Age-gating

The UK's Digital Economy Act 2017, whose age-verification provisions the government abandoned in 2019, serves as an early model for legislatures around the world seeking to put age restrictions in place. Under that law, any website offering pornography would have had to show a landing page to any user with a UK IP address, which would not go away until the user showed that they were over the age of eighteen. However, the government left the exact technical method of implementing the age-gate up to the website. Websites were therefore free to adopt any method they deemed fit for verifying age, which might also include facial recognition.

However, lessons from the UK model and several other attempts at age-gating have shown two problems. There are often easy methods of circumvention, and the information collected to implement these methods raises privacy concerns. It is our understanding that the regulation of age restrictions is currently in flux, and setting principled guidelines at this stage may not be fully evidence-based. In this light, we recommend that the Santa Clara Principles not be expanded to include age-gates. Separate consultations and discussions on the merits of the various forms of age-gating should precede any principles on this subject.

4. How have you used the Santa Clara Principles as an advocacy tool or resource in the past? In what ways? If you are comfortable with sharing, please include links to any resources or examples you may have.

In 2019, we developed specific methodologies to analyse information relating to government requests for content takedown and user information, drawing on transparency reports that online companies made available for India. In creating our methodology for government requests for content takedown, we relied significantly on some of the metrics of the Santa Clara Principles and used them to expand our scope of analysis. Our methodology comprised the following metrics adopted from the Principles:

Numbers: We utilized this metric, and further clarified that the numbers should include a numerical breakdown of the requests received under different laws on content takedown.
Sources: The Santa Clara Principles recommend that the intermediary identify the source of the flagging. Under the intermediary liability regime in India, content takedown requests can be sent by the executive, the courts, or third parties. We accordingly argued that transparency reports must classify the received requests into these three categories.
Notice: We also utilized this metric for our methodology.

The full version of our methodology and the results from our analysis can be found here.

5. How can the Santa Clara Principles be more useful in your advocacy around these issues going forward?

We intend to apply this methodology for future editions of the report as well, and build up a considerable body of work on transparency reporting practices in the Indian context.

6. Do you think that the Santa Clara Principles should apply to the moderation of advertisements, in addition to the moderation of unpaid user-generated content? If so, do you think that all or only some of them should apply?

In recent years, the moderation of advertisements has become an interesting point of contention. This covers both advertisements that violate companies' policies on disruptive ads and advertisements with more nefarious undertones, including racist language and associations with Nazi symbols.

Several companies already have various moderation policies for these kinds of harmful advertisements and other content that advertisers can promote, and these are often public. Based on this, we think that the Santa Clara Principles can be expanded to include the moderation of advertisements, and the metrics contained within would be applicable across this vertical, wherever appropriate.

7. Is there any part of the Santa Clara Principles which you find unclear or hard to understand?

N/A.

8. Are there any specific risks to human rights which the Santa Clara Principles could better help mitigate by encouraging companies to provide specific additional types of data? (For example, is there a particular type of malicious flagging campaign which would not be visible in the data currently called for by the SCPs, but would be visible were the data to include an additional column.)

N/A.

9. Are there any regional, national, or cultural considerations that are not currently reflected in the Santa Clara Principles, but should be?

While using the Principles for the purposes of our research, we found that the information some of these online companies make available for users residing in the USA is very different from the information they make available for users residing in other countries, including India. For instance, until the first half of 2018, Amazon's transparency reports on government requests for content removal were restricted to the US alone, despite the company's considerable presence in India (during our research, Alexa Rank showed Amazon.com to be the 14th most visited website in India).

A public commitment to uphold the Santa Clara Principles (as several companies have undertaken; see EFF's recent Who Has Your Back? report) would mean nothing if these commitments do not extend to all the markets in which the company operates. Accordingly, we believe it must be emphasized that the adoption of these Principles into a company's transparency reporting practices must be consistent across markets, and the information made available should be as uniform as is legally permissible.

10. Are there considerations for small and medium enterprises that are not currently reflected in the Santa Clara Principles, but should be?

Our understanding at this juncture is that not enough data exists on the economic costs of setting up transparency and accountability structures. Accordingly, at the end of this Consultation period, should the Principles be expanded to include more intermediate restrictions and develop accountability structures around algorithmic use, we recommend that a separate consultation be held with small and medium enterprises to identify (a) whether adoption would impose any economic costs, and how the Principles can best accommodate them, and (b) what basic minimum guidelines these enterprises would be able to adopt as a starting point.

11. What recommendations do you have to ensure that the Santa Clara Principles remain viable, feasible, and relevant in the long term?

Given the dynamic nature of developments in the realm of content moderation, periodic consultations in the vein of the current one would ensure that stakeholders are able to raise novel issues at the end of each period, allow the Principles to take stock of them, and incorporate changes to that effect. We believe this would keep the Principles attuned to the realities of content moderation and allow for evidence-based policy-making.

12. Who would you recommend to take part in further consultation about the Santa Clara Principles? If possible, please share their names and email addresses.

N/A.

13. If the Santa Clara Principles were to call for a disclosure about the training or cultural background of the content moderators employed by a platform, what would you want the platforms to say in that disclosure? (For example: Disclosing what percentage of the moderators had passed a language test for the language(s) they were moderating or disclosing that all moderators had gone through a specific type of training.)

By now, there are well-documented accounts of human moderation, drawn from independent investigations and from admissions by companies themselves. For instance, a blog post authored in 2018 by Mark Zuckerberg documented the percentage of human moderators who were trained in the Burmese language, in reference to moderating content on the platform in Myanmar. Comprehensive information about the linguistic and cultural backgrounds of human moderators is a useful tool for contextualizing the decisions made by a platform, and also for pushing for more effective reforms.

Additionally, a company's public-facing moderation norms often differ from the internal guidelines it shares with its team of human moderators. For instance, TikTok's internal norms had asked its moderators to 'suppress' content from users perceived to be 'poor' and 'ugly'. The gaps between these norms mean that there are surreptitious forms of censorship behind the scenes, and it is difficult to ascertain the reasonableness and appropriateness of these decisions.

We would also like to emphasize the need for more stringent disclosure requirements regarding the terms on which companies engage their human moderators. As investigations have revealed, human moderation is often outsourced to third-party firms, and the working conditions in which moderators make their decisions are inhospitable. Moreover, more often than not, there is no publicly available way to ascertain whether the company in question is doing enough to ensure the well-being and safety of these moderators.

Therefore, alongside disclosure regarding the training given to human moderators and the internal moderation norms they follow, we also recommend that the Principles set out certain fundamental ethical guidelines that companies must adopt with respect to their human moderators. These might include disclosing identifying information about the third-party firms to which the company outsources its moderation, and assurances that a sufficient number of counsellors is available to moderators.

14. Do you have any additional suggestions?

While the Santa Clara Principles provide a granular and robust framework for reporting, they currently cover only aspects of quantitative transparency, concerning numbers and items. As we have indicated throughout this submission, and in our previous research, there is also a need for companies to adhere to norms focusing on qualitative transparency, in the form of material disclosure of the policies, processes and structures they are associated with or make use of. Aside from the suggestions in the previous sections, in this section we highlight two additional recommendations that we think can help achieve this.

Material regarding local laws

One of our preliminary findings regarding the way these intermediaries report data for other regions (including India) is that the information is most often incomplete, especially with regard to material on local laws. Whereas most of these companies dedicate separate sections to the US, other regions feature far less prominently in their reports. In each country in which a company operates, there are various laws governing content removal, different authorities empowered to issue orders, and varied procedural and substantive requirements for a valid request. To empower users, we believe that intermediaries must present the exact metrics and requirements of these laws in a clear and readable format.

Accessibility of policies

On the topic of empowering users, we also believe that the basic information and policies regarding these requests should be placed in one location, for maximum accessibility. During our research, we discovered that the disclosures made in line with the Principles were spread across different policies, some of which were not easily accessible. While it is not possible at this juncture to prescribe a comprehensive and objective way of making all this information accessible, we believe it would be a useful step if the basic information regarding an intermediary's transparency reporting policies were presented in the same manner as the company's Terms of Service and Privacy Policy. Additionally, we believe that these disclosures should be translated into the major languages of the markets in which the company operates, for further accessibility.

15. Have current events like COVID-19 increased your awareness of specific transparency and accountability needs, or of shortcomings of the Santa Clara Principles?

The Covid-19 pandemic has proved to be a watershed moment in the history of the internet, both in the proliferation of various forms of misinformation and conspiracy theories and in the way companies have stepped up to remove such content from their platforms. These companies include Google, Twitter and Facebook, which have sought to rely increasingly on automated tools for the rapid moderation of harmful content related to the pandemic.

These practices reaffirm the need for strong requirements for transparency disclosures, both qualitative and quantitative, especially around the use of automated tools for content takedown. There are two main reasons for this.

First, the speed of removal tells us nothing about the accuracy of the measure. A platform can state that it took down 1000 pieces of content in one reporting period; this does not mean that its actions were accurate, fair or reasonable, since there is no publicly available information with which to ascertain this. This phenomenon, combined with the heightened pressure to remove misinformation related to the pandemic, may contribute, firstly, to erroneous removals (as YouTube has warned in its blog posts) and, secondly, to a deepening information asymmetry regarding accurate data on removals.

Second, given the novel and diverse forms of misleading information related to the pandemic, this is a critical time to study the relationship between online information and the outcomes of a public health crisis. However, these efforts will be thwarted if reliable information about removals relating to the pandemic continues to be unavailable.

For more details visit https://cis-india.org/internet-governance/blog/response-to-the-2018call-for-submissions2019-on-the-santa-clara-principles-on-transparency-and-accountability

Brindaalakshmi.K - Gendering of Development Data in India - Beyond the Binary #4

sumandro — 2020-06-30T10:34:03Z

For more details visit https://cis-india.org/raw/brindaalakshmi-k-gendering-of-development-data-in-india-beyond-the-binary-4

Brindaalakshmi.K - Gendering of Development Data in India - Beyond the Binary #3

sumandro — 2020-06-30T09:48:48Z

For more details visit https://cis-india.org/raw/brindaalakshmi-k-gendering-of-development-data-in-india-beyond-the-binary-3