Deep Packet Inspection: How it Works and its Impact on Privacy
In the last few years, there has been extensive debate and discussion around network neutrality in India. The online campaign in favor of network neutrality was led by Savetheinternet.in. The campaign, captured in detail by an article in Mint, was a spectacular success: it facilitated the sending of over a million emails supporting the cause of network neutrality, eventually leading to a ban on differential pricing. Following in the footsteps of the Shreya Singhal judgement, the fact that net neutrality has managed to attract wide public attention is an encouraging sign for a free and open Internet in India. Since the debate has focused largely on zero rating, other network practices that affect network neutrality, and their impact on other values, have yet to be comprehensively explored in the Indian context. In this article, I focus on network management in general, and deep packet inspection in particular, and how it impacts the privacy of users.
The Internet exists as a network acting as an intermediary between providers of content and its users. Traditionally, the network did not distinguish between those who provided content and those who received it; in fact, users often functioned as content providers as well. The architectural design of the Internet mandated that all content be broken down into data packets, which were transmitted transparently through nodes in the network from the source machine to the destination machine. As per the OSI model, the network consists of seven layers. We will go into each of these layers in detail below; for now, it is important to understand that at the base is the physical layer of cables and wires, while at the top is the application layer, which contains all the functions that people want to perform on the Internet and the content associated with them. The layers in the middle can be characterised as the protocol layers for the purpose of this discussion. What makes the architecture of the Internet remarkable is that these layers are completely independent of each other and, in most cases, indifferent to the other layers. The protocol layer is what impacts net neutrality. It is this layer which provides the standards for the manner in which data must flow through the network. The idea was for it to be as simple and feature-free as possible, such that it is concerned only with transmitting data as fast as possible (the 'best efforts principle'), while innovations are pushed to the layers above or below it.
This aspect of the Internet's architectural design, which mandates that network features are implemented at the end points only (the destination and source machines), i.e. at the application level, is called the 'end to end principle'. This means that the intermediate nodes do not differentiate between data packets in any way based on source, application or any other feature, and are only concerned with transmitting data as fast as possible, thus creating what has been described as a 'dumb' or neutral network. This feature of the Internet architecture was also considered essential to what Jonathan Zittrain has termed the 'generative' model of the Internet. Since the Internet Protocol remains a simple layer incapable of discrimination of any form, no additional criteria could be established for what kind of application would access the Internet. Thus, the network remained truly open and ensured that the Internet does not privilege, or become the preserve of, a class of applications, nor does it differentiate between the different kinds of technologies that comprise the physical layer below.
While the above model speaks of a dumb network that does not differentiate between the data packets that travel through it, in truth, network operators engage in various practices that prioritise, throttle or discount certain kinds of data packets. In her thesis at the Oxford Internet Institute, Alissa Cooper states that traffic management involves three sets of criteria: a) the criteria used to identify the subset of traffic that needs to be managed, which can be based on source, destination, application or user; b) the trigger for the traffic management measure, which could be based on the time of day, a usage threshold or a specific network condition; and c) the traffic treatment put into practice when the trigger is met. The traffic treatment can be of three kinds. The first is Blocking, in which traffic is prevented from being delivered. The second is Prioritization, under which identified traffic is sent sooner or later. This is usually done when there is congestion and one kind of traffic needs to be prioritized. The third kind of treatment is Rate limiting, where identified traffic is limited to a defined sending rate. The dumb network does not interfere with an application's operation, nor is it sensitive to the needs of an application, and in this way it treats all information sent over it as equal. In such a network, the content of the packets is not examined, and Internet providers act according to the destination of the data as opposed to any other factor. However, in order to perform traffic management in various circumstances, service providers commonly use Deep Packet Inspection technology, which does look at the content of data packets.
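The three-part structure Cooper describes (which traffic, which trigger, which treatment) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the fields and the example rule are hypothetical and are not taken from Cooper's thesis or from any operator's actual policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Treatment(Enum):
    BLOCK = "block"            # traffic is prevented from being delivered
    PRIORITIZE = "prioritize"  # traffic is sent sooner (or later) than the rest
    RATE_LIMIT = "rate_limit"  # traffic is held to a defined sending rate


@dataclass
class Packet:
    source: str
    destination: str
    application: str


@dataclass
class NetworkState:
    hour: int         # hour of the day, 0-23
    congested: bool   # a specific network condition


@dataclass
class Rule:
    selects: Callable[[Packet], bool]          # (a) which subset of traffic
    triggered: Callable[[NetworkState], bool]  # (b) when the measure applies
    treatment: Treatment                       # (c) what is done to the traffic


def decide(rule, packet, state):
    """Return the treatment for this packet, or None to forward it untouched."""
    if rule.selects(packet) and rule.triggered(state):
        return rule.treatment
    return None


# Hypothetical rule: rate-limit peer-to-peer traffic during evening congestion.
evening_p2p = Rule(
    selects=lambda p: p.application == "p2p",
    triggered=lambda s: s.congested and 18 <= s.hour <= 23,
    treatment=Treatment.RATE_LIMIT,
)
```

Note that the selection step assumes the application is already known for each packet. In practice, finding that out is exactly where packet inspection comes in.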
Deep packet inspection (DPI) enables the examination of the content of data packets being sent over the Internet. Christopher Parsons explains the header and the payload of a data packet with respect to the OSI model. To understand this better, it is more useful to speak of the network in terms of the seven layers of the OSI model rather than the three layers discussed above.
Under the OSI model, the top layer, the Application Layer, is in contact with the software making a data request. For instance, if the activity in question is accessing a webpage, the web browser makes a request to access a page, which is then passed on to the lower layers. The next layer is the Presentation Layer, which deals with the format in which the data is presented. This layer performs encryption and compression of the data. In the above example, this would involve asking for the HTML file. Next comes the Session Layer, which initiates, manages and ends communication between the sender and receiver. In the above example, this would involve transmitting and regulating the data of the webpage, including its text, images or any other media. These three layers are part of the 'payload' of the data packet.
The next four layers are part of the 'header' of the data packet. It begins with the Transport Layer, which collects data from the payload, creates a connection between the point of origin and the point of receipt, and assembles the packets in the correct order. In terms of accessing a webpage, this involves connecting the requesting computer system with the server hosting the data, and ensuring that the data packets are put together in a coherent arrangement when they are received. The next layer is the Network Layer, which handles the addressing and routing of the packets between the two systems. It is followed by the Data Link Layer, which formats the data packets so that they are compatible with the medium being used for their transmission. The final layer is the Physical Layer, which determines the actual media used for transmitting the packets.
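To make the split between header and payload concrete, here is a minimal Python sketch. It is an illustration only, under these assumptions: an IPv4 packet carrying TCP with no options, checksums left at zero, and link-layer framing omitted. The function names `build_packet` and `split_packet` and the addresses are hypothetical.

```python
import socket
import struct

IP_FMT = "!BBHHHBBH4s4s"   # 20-byte IPv4 header without options
TCP_FMT = "!HHLLBBHHH"     # 20-byte TCP header without options


def build_packet(src, dst, sport, dport, payload):
    """Assemble a toy IPv4/TCP packet (checksums are left as zero)."""
    total = 20 + 20 + len(payload)
    ip = struct.pack(IP_FMT, (4 << 4) | 5, 0, total, 0, 0, 64, 6, 0,
                     socket.inet_aton(src), socket.inet_aton(dst))
    tcp = struct.pack(TCP_FMT, sport, dport, 0, 0, 5 << 4, 0x18, 65535, 0, 0)
    return ip + tcp + payload


def split_packet(raw):
    """Separate the header fields from the payload bytes."""
    ver_ihl, _, _, _, _, _, proto, _, src, dst = struct.unpack(IP_FMT, raw[:20])
    ip_len = (ver_ihl & 0x0F) * 4                 # IPv4 header length in bytes
    tcp = struct.unpack(TCP_FMT, raw[ip_len:ip_len + 20])
    tcp_len = (tcp[4] >> 4) * 4                   # TCP header length in bytes
    header = {
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        "protocol": proto,
        "src_port": tcp[0],
        "dst_port": tcp[1],
    }
    return header, raw[ip_len + tcp_len:]


raw = build_packet("192.0.2.1", "198.51.100.7", 49152, 80,
                   b"GET /index.html HTTP/1.1\r\n\r\n")
header, payload = split_packet(raw)
```

In this sketch, `header` holds only addressing information (who is talking to whom, and on which ports), while `payload` holds the web request itself. That distinction is what the classification of inspection depth turns on.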
The transmission of the data packet occurs between the client and the server, and packet inspection occurs through equipment placed between the two. Packet inspection has been classified in various ways, as has the depth of inspection required to qualify as Deep Packet Inspection. We rely on Parsons' classification system in this article. According to him, there are three broad categories of packet inspection: shallow, medium and deep.
Shallow Packet Inspection involves the inspection of only the header, usually by checking it against a blacklist. The focus in this form of inspection is on the source and destination (the IP address and the packet's port number). This form of inspection primarily deals with the Data Link Layer and Network Layer information of the packet. Shallow Packet Inspection is used by firewalls.
Medium Packet Inspection involves equipment placed between the computers running the applications and the ISP or Internet gateways. This equipment uses application proxies, where the header information is inspected against a loaded parse-list and used to look at specific flows. These kinds of inspection technologies are used to look for specific kinds of traffic flows and take pre-defined actions upon identifying them. In this case, the header and a small part of the payload are examined.
Finally, Deep Packet Inspection (DPI) enables networks to examine the origin and destination as well as the content of data packets (header and payload). These technologies look for protocol non-compliance, spam, harmful code or any specific kinds of data that the network wants to monitor. What makes DPI technology an important subject of study is the range of uses it can be put to. The use cases vary from real-time analysis of packets to interception, storage and analysis of the contents of packets.
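The difference between the shallow and deep categories can be shown with a small Python sketch. It is a simplification for illustration, not a description of any real DPI product: the blacklist, the port set and the payload signatures are hypothetical, the header is assumed to be already parsed into a dictionary, and medium inspection is left out.

```python
import re

BLOCKED_ADDRESSES = {"203.0.113.9"}   # hypothetical blacklist
BLOCKED_PORTS = {23}                  # hypothetical blocked destination ports
PAYLOAD_SIGNATURES = {                # hypothetical content patterns
    "bittorrent": re.compile(rb"^\x13BitTorrent protocol"),
    "http": re.compile(rb"^(GET|POST|HEAD) \S+ HTTP/1\.[01]"),
}


def shallow_inspect(header):
    """Look only at header fields and compare them with a blacklist."""
    return (header["src"] in BLOCKED_ADDRESSES
            or header["dst"] in BLOCKED_ADDRESSES
            or header["dst_port"] in BLOCKED_PORTS)


def deep_inspect(header, payload):
    """Look at the header and also match patterns in the payload content."""
    findings = {"blacklisted": shallow_inspect(header), "matches": []}
    for name, pattern in PAYLOAD_SIGNATURES.items():
        if pattern.search(payload):
            findings["matches"].append(name)
    return findings


header = {"src": "192.0.2.1", "dst": "198.51.100.7", "dst_port": 80}
result = deep_inspect(header, b"GET /index.html HTTP/1.1\r\n\r\n")
```

The shallow check never receives the payload at all. The deep check reads it, and it is this access to content that gives rise to the privacy concerns discussed in the rest of this article.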
Network Management and QoS
The primary justification presented for DPI is network management, as a means to guarantee a certain minimum level of Quality of Service (QoS). QoS, as a value conflicting with the objectives of network neutrality, has emerged as a significant discussion point in this topic. Much like network neutrality, QoS is a term often used in vague, general and non-definitive ways. The factors that come into play in QoS are network-imposed delay, jitter, bandwidth and reliability. Delay, as the name suggests, is the time taken for a packet to pass from the sender to the receiver. Higher levels of delay are characterized by more data packets being held 'in transit' in the network. A paper by Paul Ferguson and Geoff Huston describes TCP as a 'self clocking' protocol. This enables the transmission rate of the sender to be adjusted to the rate of reception by the receiver. As the delay and the consequent stress on the protocol increase, this feedback ability begins to lose its sensitivity. This becomes most problematic in the case of VoIP and video applications. The idea of QoS generally entails consistent service quality with low delay, low jitter and high reliability, achieved through preferential treatment of some traffic based on criteria formulated around how sensitive such traffic is to latency and its need for low delay and jitter. This is where Deep Packet Inspection comes into play. In 1991, Cisco pioneered the use of a new kind of router that could inspect data packets flowing through the network. DPI is able to look inside packets at their content, enabling it to classify packets according to a formulated policy. DPI, which was used as a security tool to begin with, is a powerful tool as it allows ISPs to limit or block specific applications, or to improve the performance of applications in telephony, streaming and real-time gaming.
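Two of the QoS factors mentioned above, delay and jitter, can be illustrated with a small calculation in Python. The timestamps are hypothetical. Treating jitter as the mean absolute change in delay between consecutive packets is an assumption made for this sketch; the article does not define jitter, and other definitions exist.

```python
from statistics import mean


def delays(sent, received):
    """One-way delay of each packet, in the same unit as the timestamps."""
    return [r - s for s, r in zip(sent, received)]


def jitter(delay_values):
    """Mean absolute change in delay between consecutive packets (assumed definition)."""
    changes = [abs(b - a) for a, b in zip(delay_values, delay_values[1:])]
    return mean(changes) if changes else 0.0


# Hypothetical send and receive times, in milliseconds, for five packets.
sent = [0, 20, 40, 60, 80]
received = [50, 72, 95, 108, 135]

d = delays(sent, received)     # [50, 52, 55, 48, 55]
average_delay = mean(d)        # 52
delay_variation = jitter(d)    # 4.75
```

A voice call can tolerate a steady 52 ms delay far better than delays that swing from packet to packet, which is why latency-sensitive traffic is the usual candidate for preferential treatment.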
Very few scholars believe in an all-or-nothing approach to network neutrality and QoS, and the debate often comes down to which forms of differentiation are reasonable for service providers to practice.
Deep Packet Inspection was initially intended as a measure to manage the network and protect it from transmitting malicious programs. As mentioned above, Shallow Packet Inspection was used to secure LANs and keep out certain kinds of unwanted traffic. DPI is used for the same purposes where it is felt useful to enhance security through a 'deeper' inspection that examines the payload along with the header information.
The third purpose of DPI is what concerns privacy theorists the most. The fact that DPI technologies give network operators access to the actual content of data packets puts them in a position of great power, and also makes them susceptible to significant pressure from the state. For instance, in the US, ISPs are required to conform to the provisions of the Communications Assistance for Law Enforcement Act (CALEA), which means they need to have some surveillance capacities designed into their systems. What is more disturbing for privacy theorists than the use of DPI for surveillance under legislation like CALEA are the other alleged uses by organisations like the National Security Agency through back-end access to the information via the ISPs. Aside from the US government, there have been various reports of the use of DPI by governments in countries like China, Malaysia and Singapore.
DPI also enables very granular tracking of the online activities of Internet users. This information is invaluable for the purposes of behavioral targeting of content and advertising. Traditionally, this has been done through cookies and other tracking software. DPI offers ISPs and their advertising partners a new way to do this, which has so far been possible only through web-based tools. DPI will enable ISPs to monitor the contents of data packets and use them to create profiles of users, which can later be employed for purposes such as targeted advertising.
Each of the above use-cases has significant implications for the privacy of Internet users as the technology in question involves access, tracking or retention of their online communication and usage activity.
Alissa Cooper compares DPI with other technologies carrying out content inspection, such as caching services and individual users employing firewalls or packet sniffers. She argues that one of the most distinguishing features of DPI is the potential for "mission creep". Kevin Werbach writes that while networks may deploy DPI for implementation under CALEA or for peer-to-peer traffic shaping, once deployed, DPI techniques can be used for completely different purposes, such as pattern matching of intercepted content and storage of raw data or of conclusions drawn from the data. This scope for mission creep is even more problematic because it is completely invisible. As opposed to other technologies which rely on cookies or other web-based services, the inspection occurs not at the end points but somewhere in the middle of the network, often without leaving any traces on the user's system, thus rendering it virtually undiscoverable.
Much like other forms of surveillance, DPI threatens the sense that the web is a space where people can engage freely with a wide range of people and services. For such a space to continue to exist, it is important for people to feel secure about their communication and transactions on the medium. This notion of trust is severely harmed by a sense that users are being surveilled and their communication intercepted. This has an obvious chilling effect on free speech and could also impact electronic commerce.
Alissa Cooper also points out another way in which DPI differs from other content tracking technologies. As DPI is deployed by ISPs, it creates a greater barrier to opting out and choosing another service. There are only limited options available to individuals as far as ISPs are concerned. Christopher Parsons reviews ISPs using DPI technology in the UK, the US and Canada and finds that various ISPs do state in their terms of service that they use DPI for network management purposes. However, this information is often not as easily accessible as the terms and conditions of online services. Also, as opposed to online services, where it is relatively easy to migrate to another service due to both the presence of more options and the ease of migration, changing one's ISP is a much longer and more difficult process.
Currently, there are no regulatory frameworks in India which govern DPI technology in any way. The International Telecommunication Union (ITU) prescribes a standard for DPI; however, the standard does not engage with any questions of privacy. It requires all DPI technologies to be capable of identifying payload data and prescribes classification rules for specific applications, thus conflicting with notions of application agnosticism in network management. More importantly, the requirements to identify, decrypt and analyse tunneled and encrypted data threaten the reasonable expectation of privacy when sending and receiving encrypted communication. In this final section, I look at some possible principles and practices that may be evolved in order to mitigate the privacy risks caused by DPI technology.
Limiting 'depth' and breadth
It has been argued that what DPI technology inherently does is match patterns in the inspected content against a pre-defined list relevant to the purpose for which DPI is employed. Much like the data minimization principles applicable to data controllers and data processors, it is possible for network operators to minimize the depth of the inspection (restricting it to header information only, or to limited payload information) so as to serve only the purpose at hand. For instance, in cases where the ISP is looking to identify peer-to-peer traffic, there are protocols which declare their names in the application header itself. Similarly, a network operator looking to generate usage data about email traffic can do so simply by looking at port numbers and checking them against common email ports. However, this mitigation strategy may not work well for other use cases, such as blocking malicious software or prohibited content, or monitoring for the sake of behavioral advertising.
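The two examples above can be sketched in Python to show how little of a packet needs to be read. This is an illustration under stated assumptions. The set of ports covers the standard SMTP, POP3 and IMAP ports and their TLS variants. The peer-to-peer check uses BitTorrent, whose handshake opens with a length byte followed by the string "BitTorrent protocol", as one example of a protocol that names itself. The function names and the dictionary layout of the header are hypothetical.

```python
# Standard ports for SMTP, POP3 and IMAP, including their TLS variants.
EMAIL_PORTS = {25, 110, 143, 465, 587, 993, 995}

# The first 20 bytes of a BitTorrent handshake: a length byte, then the protocol name.
BITTORRENT_PREFIX = b"\x13BitTorrent protocol"


def is_email(header):
    """Header-only check: no payload bytes are read at all."""
    return header["src_port"] in EMAIL_PORTS or header["dst_port"] in EMAIL_PORTS


def names_bittorrent(payload, limit=20):
    """Limited-payload check: only the first `limit` bytes are examined."""
    return payload[:limit].startswith(BITTORRENT_PREFIX)


mail = {"src_port": 50000, "dst_port": 587}
web = {"src_port": 50000, "dst_port": 443}
handshake = BITTORRENT_PREFIX + b"\x00" * 48
```

Port-based classification is only a heuristic, since applications can run on non-standard ports, but it lets the operator count email traffic without ever touching message content.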
While depth refers to the degree of inspection within data packets, breadth refers to the volume of packets being inspected. Alissa Cooper argues that for many DPI use cases, it may be possible to rely on pattern matching on only the first few data packets in a flow in order to arrive at sufficient data to take an appropriate response. Cooper uses the same example of peer-to-peer traffic. In some cases, the protocol name may appear in the header of only the first packet of a flow between two peers. In such circumstances, the network operator need not look beyond the header of the first packet in a flow, and can apply the network management rule to the entire flow.
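Limiting breadth in this way can be sketched as a small flow table in Python. This is a hypothetical illustration rather than Cooper's own design. The `FlowTable` class, the `classify` function and the flow key (source address, source port, destination address, destination port, protocol) are assumptions made for the example.

```python
class FlowTable:
    """Inspect only the first few packets of each flow, then reuse the result."""

    def __init__(self, classify, max_inspected=1):
        self.classify = classify          # hypothetical function: packet -> label or None
        self.max_inspected = max_inspected
        self.decided = {}                 # flow key -> final label
        self.inspected = {}               # flow key -> packets inspected so far
        self.total_inspections = 0

    def handle(self, flow_key, packet):
        if flow_key in self.decided:
            return self.decided[flow_key]     # later packets are not inspected
        self.inspected[flow_key] = self.inspected.get(flow_key, 0) + 1
        self.total_inspections += 1
        label = self.classify(packet)
        if label is None and self.inspected[flow_key] < self.max_inspected:
            return "unknown"                  # inspection budget not yet used up
        self.decided[flow_key] = label or "unknown"
        return self.decided[flow_key]


def classify(packet):
    """Label a packet as peer-to-peer if it opens with the BitTorrent handshake."""
    return "p2p" if packet.startswith(b"\x13BitTorrent protocol") else None


table = FlowTable(classify, max_inspected=1)
flow = ("192.0.2.1", 51000, "198.51.100.7", 6881, "tcp")
packets = [b"\x13BitTorrent protocol" + b"\x00" * 48, b"piece-1", b"piece-2"]
labels = [table.handle(flow, p) for p in packets]
```

All three packets end up labelled as peer-to-peer, yet only the first one is ever examined. The rest of the flow passes through on the strength of that single decision.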
Aside from the depth and breadth of inspection, another important question is whether, and for how long, there is a need for data retention. Not all use cases require data retention, and even in cases where DPI is used for behavioral advertising, only the conclusions drawn may be retained instead of the payload data.
One of the issues is that DPI technology is developed and deployed outside the purview of standards organizations like ISO. Hence, there has been a lack of an open, transparent standards development process in which participants deliberate on the impact of the technology. It is important for DPI to undergo processes which are inclusive, in that non-engineering stakeholders participate to highlight public policy issues such as privacy. Further, aside from the technology, the practices of networks need to be more transparent. Networks can disclose the presence of DPI, the level of detail being inspected or retained, and the purpose for which DPI is deployed. Some ISPs provide some of these details in their terms of service and website notices. However, as opposed to web-based services, users have limited interaction with their ISP. It would be useful for ISPs to enable greater engagement with their users and make their practices more transparent.
The very nature of DPI technology renders some aspects of recognized privacy principles, like notice and consent, obsolete. The current privacy frameworks under FIPP and OECD rely on the idea of empowering individuals by providing them with knowledge, which enables them to make informed choices. However, for this liberal conception of privacy to function meaningfully, it is necessary that individuals are presented with real and genuine alternatives to choose from. While principles like data minimisation, necessity and proportionality, and purpose limitation can be instrumental in ensuring that DPI technology is used only for legitimate purposes, without effective opt-out mechanisms, and given the limited capacity of individuals to assess the risks, the efficacy of privacy principles may be far from satisfactory.
The ongoing Aadhaar case and a host of surveillance projects like CMS, NATGRID, NETRA and NMAC have raised concerns about the state conducting mass surveillance, particularly of online content. In this regard, it is all the more important to recognise the potential of Deep Packet Inspection technologies to impact the privacy rights of individuals. Earlier, the Centre for Internet and Society had filed Right to Information applications with the Department of Telecommunications, Government of India, regarding the use of DPI, and the government had responded that there was no direction or reference to the ISPs to employ DPI technology. Similarly, MTNL also responded to the RTI applications and denied using the technology. It is notable, though, that they did not respond to the questions about the traffic management policies they follow. Thus, so far there has been little clarity on the actual usage of DPI technology by ISPs.
 Ashish Mishra, "India's Net Neutrality Crusaders", available at http://mintonsunday.livemint.com/news/indias-net-neutrality-crusaders/2.3.2289565628.html
 Vinton Cerf and Robert Kahn, "A protocol for packet network intercommunication", available at https://www.semanticscholar.org/paper/A-protocol-for-packet-network-intercommunication-Cerf-Kahn/7b2fdcdfeb5ad8a4adf688eb02ce18b2c38fed7a
 Paul Ganley and Ben Algove, "Network Neutrality-A User's Guide", available at http://wiki.commres.org/pds/NetworkNeutrality/NetNeutrality.pdf
 J H Saltzer, D D Clark and D P Reed, "End-to-End arguments in System Design", available at http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf
 Supra Note 4.
 Jonathan Zittrain, The Future of the Internet and How to Stop It, (Yale University Press and Penguin UK, 2008) available at https://dash.harvard.edu/bitstream/handle/1/4455262/Zittrain_Future%20of%20the%20Internet.pdf?sequence=1
 Alissa Cooper, How Regulation and Competition Influence Discrimination in Broadband Traffic Management: A Comparative Study of Net Neutrality in the United States and the United Kingdom available at http://ora.ox.ac.uk/objects/uuid:757d85af-ec4d-4d8a-86ab-4dec86dab568
 Id.
 Christopher Parsons, "The Politics of Deep Packet Inspection: What Drives Surveillance by Internet Service Providers?", available at https://www.christopher-parsons.com/the-politics-of-deep-packet-inspection-what-drives-surveillance-by-internet-service-providers/ at 15.
 Ibid at 16.
 Id.
 Ibid at 19.
 Id.
 Id.
 Tim Wu, "Network Neutrality: Broadband Discrimination", available at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=388863
 Paul Ferguson and Geoff Huston, "Quality of Service on the Internet: Fact, Fiction, or Compromise?", available at http://www.potaroo.net/papers/1998-6-qos/qos.pdf
 Barbara van Schewick, "Network Neutrality and Quality of Service: What a non-discrimination Rule should look like", available at http://cyberlaw.stanford.edu/downloads/20120611-NetworkNeutrality.pdf
 Supra Note 14.
 Paul Ohm, "The Rise and Fall of Invasive ISP Surveillance," available at http://paulohm.com/classes/infopriv10/files/ExcerptOhmISPSurveillance.pdf
 Ben Elgin and Bruce Einhorn, "The great firewall of China", available at http://www.bloomberg.com/news/articles/2006-01-22/the-great-firewall-of-china .
 Mike Wheatley, "Malaysia's Web Heavily Censored Before Controversial Elections", available at http://siliconangle.com/blog/2013/05/06/malaysias-web-heavily-censored-before-controversial-elections/
 Alissa Cooper, "Doing the DPI Dance: Assessing the Privacy Impact of Deep Packet Inspection," in W. Aspray and P. Doty (Eds.), Privacy in America: Interdisciplinary Perspectives, Plymouth, UK: Scarecrow Press, 2011 at 151.
 Ibid at 148.
 Kevin Werbach, "Breaking the Ice: Rethinking Telecommunications Law for the Digital Age", Journal of Telecommunications and High Technology, available at http://www.jthtl.org/articles.php?volume=4
 Supra Note 25 at 149.
 Supra Note 25 at 147.
 International Telecommunication Union, Recommendation ITU-T Y.2770, Requirements for Deep Packet Inspection in next generation networks, available at https://www.itu.int/rec/T-REC-Y.2770-201211-I/en.
 Supra Note 25 at 154.
 Ibid at 156.
 Supra Note 10.
 Paul Ohm, "The Rise and Fall of Invasive ISP Surveillance", available at http://paulohm.com/classes/infopriv10/files/ExcerptOhmISPSurveillance.pdf .
 "India's Surveillance State" Software Freedom Law Centre, available at http://sflc.in/indias-surveillance-state-our-report-on-communications-surveillance-in-india/
 Amber Sinha, "Are we losing the right to privacy and freedom of speech on Indian Internet", DNA, available at http://www.dnaindia.com/scitech/column-are-we-losing-the-right-to-privacy-and-freedom-of-speech-on-indian-internet-2187527
 Smita Mujumdar, "Use of DPI Technology by ISPs - Response by the Department of Telecommunications" available at http://cis-india.org/telecom/dot-response-to-rti-on-use-of-dpi-technology-by-isps