Centre for Internet & Society
Reading from a Distance — Data as Text

Library Services Tag Wordle (by Carol Hartmann, CC-BY-SA 3.0 License)

The advent of new digital technologies and the internet has redefined practices of reading and writing, and the notion of textuality, which is a fundamental aspect of humanities research and scholarship. This blog post looks at some of the debates around the notion of text as object, method and practice, to understand how it has changed in the digital context.


The concepts of text and textuality have been central to the discourse on language and culture, and therefore by extension to most of the humanities disciplines, which are often referred to as text-based disciplines. The advent of new digital and multimedia technologies and the internet has brought about definitive changes in the ways in which we see and interpret texts today, particularly as manifested in the new practices of reading and writing facilitated by the tools and dynamic interfaces now available in the digital age. The ‘text’ as an object of enquiry is also central to much of the discussion and literature on Digital Humanities, given that many scholars, particularly in the West, trace its antecedents to practices of textual criticism and scholarship that stem from efforts in humanities computing. Everything from the early attempts at character and text encoding (see TEI) to new forms and methods of digital literary curation, whether on large online archives or in the form of apps such as Storify or Scoop.it, has been part of the development of this discourse on the text. Significant among these is the emergence of processes such as text analysis, data mining, distant reading, and not-reading, all of which essentially refer to a process of reading by recognising patterns over a large corpus of texts, often with the help of a clustering algorithm[1]. The implications of this for literary scholarship are manifold. Many scholars see this as a point of ‘crisis’ for traditional practices of reading and meaning-making such as close reading, or as an attempt to introduce objectivity and a certain quantitative aspect, often construed as a form of scientism, into what is essentially a domain of interpretation. But an equal number of advocates of the process also see the use of these tools as enabling newer forms of literary scholarship by enhancing the ability to work with and across a wide range and number of texts.
The simultaneous emergence of a plethora of new kinds of digital objects, and the supposed obscuring of traditional methods in the process, is perhaps the immediate source of this perceived discomfort. There are different perspectives on the nature of the changes this has led to in understanding a concept that is elementary to the humanities. Apart from the fact that digitisation makes a large corpus of texts accessible, subject of course to certain conditions of access, it also makes texts ‘massively addressable at different levels of scale’, as suggested by Michael Witmore. According to him, “Addressable here means that one can query a position within the text at a certain level of abstraction”. This could be at the level of characters, words, lines and so on, which may then be related to other texts at the same level of abstraction. The idea that the text itself is an aggregation of such ‘computational objects’ is new, but as Witmore points out in his essay, it is the nature of this computational object that requires further explanation. In fact, as he concludes in the essay, “textuality is addressability” and, further, “...this is a condition, rather than a technology, action or event”. What this points towards is the rather flexible and somewhat ephemeral nature of the text itself, particularly the digital text, and the need to move away from a notion of textuality that has so far been shaped by the conventions of book culture, which look to ideal manifestations in provisional unities such as the book.[2]
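To make the idea of addressability a little more concrete, here is a minimal sketch in Python. It assumes only the three levels of abstraction named above (characters, words, lines); the function names `units`, `address` and `shared_units` and the sample sentences are hypothetical, invented for illustration, and are not drawn from Witmore's essay or from any existing tool.

```python
from typing import List, Set


def units(text: str, level: str) -> List[str]:
    """Split a text into its addressable units at one level of abstraction."""
    if level == "character":
        return list(text)
    if level == "word":
        return text.split()
    if level == "line":
        return text.splitlines()
    raise ValueError(f"unknown level: {level}")


def address(text: str, level: str, position: int) -> str:
    """Query a position within the text at the chosen level."""
    return units(text, level)[position]


def shared_units(text_a: str, text_b: str, level: str) -> Set[str]:
    """Relate two texts at the same level by the units they have in common."""
    return set(units(text_a, level)) & set(units(text_b, level))


if __name__ == "__main__":
    # Invented sample texts, used only to show the three levels of address.
    first = "the text is an object\nthe object is a text"
    second = "the reader is a subject"

    print(address(first, "character", 4))
    print(address(first, "word", 1))
    print(address(first, "line", 1))
    print(sorted(shared_units(first, second, "word")))
```

The same string yields a different object depending on the level at which it is queried, which is perhaps one way of seeing why the text, in this account, is a condition rather than a fixed thing.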

The notion of the text itself as an object of enquiry has undergone significant change. Various disciplines have long engaged with the text - as a concept, method or discursive space - and its definitions have changed over time, adding new dimensions to ways of doing the humanities. With every turn in literary and cultural criticism in particular, the primacy of the written word as text has been challenged, and what is understood as ‘textual’ in a very narrow sense has extended to visual and other kinds of objects. The digital object presents a new kind of text that is difficult to grasp - the neat segregations of form, content, process and so on seem to blur here, and there is a need to unravel these layers to understand its textuality. As Dr. Madhuja Mukherjee of the Department of Film Studies at Jadavpur University points out, with the opening up of the digital field there are more possibilities to record, upload and circulate, as a result of which the very object of study has changed; the text as an object has therefore become very unstable, more so than it already is. Film is an example: DVDs of old films often no longer exist, so one approaches the ‘text’ through other objects such as posters or found footage. Such texts, also available through several online archives, now offer possibilities of building layers of meaning through annotations and referencing. Another example she cites is that of the Indian Memory Project, where objects such as family photographs become available for study as texts for historiographic or ethnographic work. She points out that this is not a new phenomenon, as the disciplines of literary and cultural studies, critical theory and history have explored and provided a base for these questions, but there is definitely a newfound interest now due to the increasing prevalence of digital methods and spaces. One example of such a digital text perhaps is the hypertext[3].
George Landow, in his book on hypertext, draws upon both Barthes’ and Foucault’s conceptualisations of textuality in terms of nodes, links, networks, web and path, which has been posited in some sense as the ideal text. Landow’s analysis emphasises the multilinearity of the text, in terms of its lack of a centre, and therefore the reader being able to organise the text according to his own organising principle - possibilities that hypertext now offers and the printed book could not. While hypertext illustrates the post-structural notion of what comprises an open text, as it were, it may still be linear in terms of embodying certain ideological notions which shape its ultimate form. Hypertext, while in a pragmatic sense being the text of the digital, is still at the end of a process of signification or meaning-making, often defined within the parameters set by print culture.

But to return to what has been one of the fundamental notions of textual criticism, the ‘text’ is manifested through practices of reading and writing[4]. So what have been the implications of digital technologies for these processes, which have now become technologised, and by extension for our understanding of the text? While processes such as distant reading and not-reading demonstrate precisely the variability of meaning-making processes and the fluid nature of textuality, they also seem to question the premise of the method and form of criticism itself. Franco Moretti, in his book Graphs, Maps and Trees, talks about the possibilities accorded by clustering algorithms and pattern recognition as a means to wade through corpora, thus attempting to create what he calls an ‘abstract model of literary history’. He describes this approach as ‘within the old territory of literary history, a new object of study’. He further says, “Distant reading, I have once called this type of approach, where distance is however not an obstacle, but a specific kind of knowledge: fewer elements, hence a sharper sense of their overall interconnection. Shapes, relations, structures. Forms. Models.” The emphasis for Moretti therefore is on the method of reading or meaning-making. Two questions seem to emerge from this perceived shift: one concerns the availability of the data and tools that can ‘facilitate’ this kind of reading, and the other concerns a change in the nature of the object of enquiry itself, so much so that close reading or textual analysis is no longer engaging or adequate and other methods are called for. An example of such new forms of textual criticism much closer to home is ‘Bichitra’, an online variorum of Rabindranath Tagore’s works developed by the School of Cultural Texts and Records at Jadavpur University.
The traditional variorum is in itself a work of textual criticism, where all the editions of an author’s work are collated as a corpus to trace the changes and revisions made over a period of time. The Tagore variorum, while making available an exhaustive resource on the author’s work, also offers a collation tool that helps trace such variations across different editions of works, with much less effort than manually reading through these texts would otherwise require. Like paper variorum editions, this online archive too allows for the study of a larger number and wider diversity of texts on a single author through cross-referencing and collation.
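As a rough illustration of what collation involves, the sketch below compares two versions of a line word by word and reports only the places where they differ. It uses `difflib.SequenceMatcher` from the Python standard library; the function name `collate` and the two sample lines are hypothetical and invented for this example. This is not how the Bichitra collation tool works, which the post does not describe; it is only one simple way such a comparison might be approached.

```python
import difflib
from typing import List, Tuple


def collate(edition_a: str, edition_b: str) -> List[Tuple[str, str, str]]:
    """List the word-level variants between two versions of a passage.

    Each variant is (kind, reading_in_a, reading_in_b), where kind is one
    of difflib's opcode tags: 'replace', 'delete' or 'insert'.
    """
    words_a = edition_a.split()
    words_b = edition_b.split()
    matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)

    variants = []
    for kind, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if kind == "equal":
            continue  # identical stretches are not variants
        reading_a = " ".join(words_a[a_start:a_end])
        reading_b = " ".join(words_b[b_start:b_end])
        variants.append((kind, reading_a, reading_b))
    return variants


if __name__ == "__main__":
    # Invented lines standing in for two editions of the same passage.
    earlier = "the river runs to the sea"
    later = "the river flows down to the sea"
    for kind, reading_a, reading_b in collate(earlier, later):
        print(f"{kind}: '{reading_a}' -> '{reading_b}'")
```

A real variorum would of course need to handle many editions at once, along with punctuation, spelling and script, but the principle of aligning versions and surfacing only the differences is the same.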

As is apparent in the development of new kinds of tools and resources to facilitate reading, there is a problem of abundance that follows once the problem of access has been addressed to some extent. Clustering algorithms have been used to generate and process data in different contexts, apart from their usage in statistical data analysis. The role of data is pertinent here, particularly that of big data. But the understanding of big data is still shrouded within the conventions of computational practice, so much so that its social aspects are only slowly being explored now, particularly in the context of reading practices. Big data as understood in the field of computing is data that is so vast or complex that it cannot be processed by existing database management tools or processing applications[5]. But if one were to treat data as text, as is an eventual possibility with literary criticism that uses computational methods, what becomes of the critical ability to decode the text? And does this further change the nature of the text itself as a discursive object, and the practice of reading and textual criticism as a result? Reading data as text then also presupposes a different kind of reader, one that is no longer the human subject. This would be a significant move in understanding how the processes of textuality also change to address new modes of content generation, and how much the contours of such textuality reflect the changes in the discursive practices that construct it. Most of the debate however has been framed within a narrative of loss - of criticality and of a particular method of making meaning of the world. Close reading as a method too came with its own set of problems, which can be seen as part of a larger critique of the Formalists and later American New Criticism, specifically in terms of its focus on the text. As such, this focus further contributes to canonising a certain kind of text and thereby a form of cultural and literary production.[6]
Distant reading as a method, though also seen as an attempt to address this problem by including corpora, still poses the same issues in terms of its approach, particularly as the text still serves as the primary and authoritative object of study. The emphasis therefore comes back to reading as a critical and discursive practice. The objects and tools are new; the skills to use them need to be developed. However, as much of the literature and processes demonstrate, the critical skills essentially remain the same, but now function at a meta-level of abstraction. Kathleen Fitzpatrick, in her book on the rise of electronic publishing and planned technological obsolescence, dwells on the manner in which much of our reading practice is still located in print or specifically book culture; the conflict arises with the shift to a digital process and interface, in terms of trying to replicate the experience of reading on paper. Add to this the problem of abundance of data, and processes like curation, annotation, referencing, visualisation and abstraction acquire increased valence as methods of creatively reading or making meaning of content.[7]

Whether as object, method or practice, the notion of textuality and the practice of reading have undergone significant changes in the digital context, but whether this is a new domain of enquiry is a question one may ask. Matthew G. Kirschenbaum, in his essay on re-making reading, suggests that perhaps the function of these clustering algorithms, apart from serving to supplant or reiterate what we already know, is also to ‘provoke’ new ideas or questions. This is an interesting use of the term, given that the suggestion to use quantitative methods such as clustering and pattern recognition in fields that are premised on close reading and interpretation is itself a provocative one and has implications for content. The conflict produced between close and distant reading, and the shift from print to digital interfaces, would therefore emerge as a space for new questions around the given notion of text and textuality. But if one were to extend that thought, it may be pertinent to ask if the Digital Humanities can now provide us with a vibrant field that will help produce a better and more nuanced understanding of the notion of the text itself as an object of enquiry. This would require one to work with and in some sense against the body of meaning already generated around the text, but in essence the very conflict may be where the epistemological questions about the field are located.


  1. Fitzpatrick, Kathleen, “Texts”, Planned Obsolescence – Publishing, Technology and the Future of the Academy, New York and London: New York University Press, 2011. pp 89 – 119.
  2. Kirschenbaum, M.G., “The Remaking of Reading: Data Mining and the Digital Humanities”, Conference proceedings: National Science Foundation Symposium on Next Generation of Data Mining and Cyber-Enabled Discovery for Innovation, Baltimore, October 10-12, 2007, http://www.cs.umbc.edu/hillol/NGDM07/abstracts/talks/MKirschenbaum.pdf.
  3. Landow, George P., Hypertext: The Convergence of Critical Theory and Technology, Baltimore: Johns Hopkins University Press, 1992. pp 2-12.
  4. Moretti, Franco, Graphs, Maps and Trees: Abstract Models for a Literary History, London and New York: Verso, 2005. p 1.
  5. Witmore, Michael, “Text: A Massively Addressable Object”, Debates in the Digital Humanities, ed. Matthew K. Gold, University of Minnesota Press, 2012. pp 324 – 327. http://dhdebates.gc.cuny.edu/debates/text/24
  6. Wilkens, Matthew, “Canons, Close Reading, and the Evolution of Method”, Debates in the Digital Humanities, ed. Matthew K. Gold, University of Minnesota Press, 2012. pp 249 – 252. http://dhdebates.gc.cuny.edu/debates/text/24

[1] For more on cluster analysis and algorithms see http://en.wikipedia.org/wiki/Cluster_analysis

[2] See Witmore, 2012. pp 324 - 327

[3] A term coined by Theodor H. Nelson, which he describes as “a series of text chunks connected by links which offer the reader different pathways” (as quoted in Landow, 1992. pp 2-12)

[4] Barthes, 1977. pp 155 - 164

[5] See http://en.wikipedia.org/wiki/Big_data

[6] See Wilkens (2012). pp 249-252

[7] See Fitzpatrick (2011), pp 89 -119
