Centre for Internet & Society

On 19th and 20th January, HasGeek organized a hacknight to commemorate the life and works of Aaron Swartz. Zainab Bawa from HasGeek shares with us the developments.

Why host an Aaron Swartz memorial hacknight? In the aftermath of Aaron’s death, some people began expressing doubts, uncertainties and misinformed opinions about his activist causes. They questioned whether Aaron committed a ‘crime’ by downloading articles from JSTOR and whether the means he used for liberating data were wrong in the first place. It was important to dispel these doubts and give people a better understanding of issues such as IT laws, copyright rules and access to information, and of how these are implemented in different parts of the world.

Aaron had initiated several coding projects during his lifetime. Anand Chitipothu, who collaborated with Aaron at the Internet Archive and maintains his web.py framework, suggested that the hacknight could also be an opportunity where people get familiar with Aaron’s coding projects and work on some of them.

The hacknight: 87 people registered for the hacknight. Approximately 40 people turned up. Some participants proposed projects to liberate different kinds of public data, such as electoral data, weather data and train timetables, and to crawl data from government and NIC websites. Developers worked on these projects to make the data searchable and usable.

Discussions during the hacknight: The hacknight started at 3 PM with a discussion about the life of Aaron Swartz and the political and legal implications of his coding projects and activism.

This discussion was led by Anand and Kiran Jonnalagadda of HasGeek.

Kiran gave a detailed account of Aaron’s life, starting with how he established RSS 1.0 as a standard and the collaboration between Aaron and Lawrence Lessig on using the RDF format for Creative Commons licensing, and leading up to Aaron’s work with Reddit and its acquisition by Condé Nast. Shortly after Reddit’s acquisition, Aaron left Reddit and began a career in activism. In this period, he started freeing data funded by public money which constitutionally belonged in the public domain. He published data from the catalogue of the Library of Congress and the US case law archives on the Internet Archive. Later, Aaron downloaded articles from JSTOR to release academic papers whose research was funded with public money. Before he could sift through the downloads, Aaron was caught by the police. He returned the hard disk containing the downloads. JSTOR and MIT did not pursue cases against him, but the United States government charged Aaron with breaking into the MIT campus and faking his identity by changing the MAC address of his computer.

At the end of Kiran’s presentation, participants asked several questions about activism, what constitutes offensive speech, framework of IT laws in India, and the process of law-making.

At 5 PM, Sunil Abraham of the Centre for Internet and Society (CIS) joined the hacknight. He made a presentation about copyright laws, the Indian IT Act and Aaron’s work.

Sunil explained how Aaron believed in the importance of access to information by releasing data from copyright and thereby enabling freedom of expression. According to Sunil, Aaron Swartz is a very troublesome hero because his data liberation projects do not fall into one neat category. Moreover, the means he used for his activism are questioned by different activist groups. This makes it difficult to pinpoint exactly what one must credit Aaron for and what category of activism his work falls under.

After Sunil’s presentation, there was a half-hour discussion about the scope of copyright laws in India, copyright exemptions and what constitutes copyright infringement. Participants agreed that the trouble lies with the broad interpretations of copyright and IT laws, which enable the state and private parties to target and harass a person, often on frivolous grounds.

Discussion about hacknight projects: At 6 PM, participants with project ideas and those who wanted to join projects gathered in the garden. Over tea and snacks, groups and pairs were formed. Participants reported two difficulties here:

  1. There weren’t enough projects to choose from, i.e., there were too few problems to solve.
  2. Not everyone who proposed projects could break the problem down into tasks for individual team members to work on.

This affected participants’ motivation to stay through the night.

Web.py workshop: After the tea break, Anand conducted a workshop on web.py.

Some participants came to the hacknight mainly to attend this workshop. The code used in this workshop is available at github.com/anandology/webpy-workshop.

Anand also worked on decoupling the database module of web.py to turn it into a separate Python module. This project requires more work before it is complete. The code is available at: http://github.com/anandology/sqlpy

Projects at the hacknight: A complete list of the projects that participants worked on during the hacknight is available on the hacknight website. We talked with some of the teams and individual participants to understand their projects, the process they followed for solving the problems, and the outcomes at the end of the hacknight.

Liberating electoral data: Arun Raghavan, an open source enthusiast, and four other participants (Arun K, Praveen, Mikul and Sumant) worked on scraping electoral data from http://ceokarnataka.kar.nic.in/. They planned to build a frontend that would make it easy for users to search for their names and polling booth information. Currently, the electoral roll is published as a PDF document for each polling station, along with a search form (which is unreliable and fails often) for individuals to find their names on the roll and the location of their polling station.

It was difficult to parse the data because the PDFs were not designed for machine readability. Hence, the team had to spend time understanding how to extract the text. The other problem was that the person’s name was written above the father’s name, but if the person’s name was very long, it overlapped the father’s name. This made it difficult to determine where the person’s name ended and where the father’s name began. The team managed to come up with a heuristic to distinguish between the person’s name and father’s name based on slight differences in the way the text was printed on each sheet.

Arun Raghavan and other team members used Python to parse data from the PDFs. They also tried extracting data by using the search form and saving results whenever it returned them (since it failed often). The search form required a JavaScript submit, so Praveen Kumar and Arun K learned to use casper.js to emulate a browser and extract data. Praveen also used casper.js to liberate his friend Aram Bhusal’s blog from Sulekha.com. Aram made a presentation about this at the January edition of the Bangalore JS meet.
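The approach of querying an unreliable form and saving results whenever it returns them can be sketched roughly as follows. This is a minimal illustration and not the team's actual code: the names `fetch_with_retries` and `harvest` are hypothetical, and `fetch` stands in for whatever function submits the search (for example, a wrapper around a headless browser script).

```python
import json
import time


def fetch_with_retries(fetch, query, attempts=3, delay=0.0):
    """Call fetch(query) until it succeeds or the attempts run out.

    Returns the result, or None if every attempt raised an exception.
    """
    for attempt in range(attempts):
        try:
            return fetch(query)
        except Exception:
            # The form failed; pause before the next attempt, if any remain.
            if attempt < attempts - 1:
                time.sleep(delay)
    return None


def harvest(fetch, queries, out_path, attempts=3):
    """Append each successful result to out_path as one JSON line.

    Returns the number of results saved. Queries that keep failing
    are skipped so that one bad query does not stop the whole run.
    """
    saved = 0
    with open(out_path, "a", encoding="utf-8") as out:
        for query in queries:
            result = fetch_with_retries(fetch, query, attempts=attempts)
            if result is not None:
                record = {"query": query, "result": result}
                out.write(json.dumps(record) + "\n")
                saved += 1
    return saved
```

Writing each result to disk as soon as it arrives means that a crash or a long run of failures does not lose the data already collected.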

At the end of the hacknight, the group almost managed to get a dump of an entire electoral roll. The project repositories:

  1. https://github.com/arunk/ceoscraper
  2. https://github.com/ford-prefect/ceo-kar-roll-scraper

Other data liberation projects:

  1. Indexing government websites by category of information: Elvis D’souza worked on crawling government websites and indexing them by category, e.g., education, import-export trade, science and technology, etc. According to him, government websites contain lots of information, including documents and spreadsheets. At the hacknight, Elvis completed the indexing process and ran some statistics on the information contained in these websites. He eventually wants to build a portal where people can access this index and the documents.
  2. Railway timetable data: Anand scraped data from the IRCTC website. Supreeth Srinivasmurthy worked with this data to plot a map. Bibhas Debnath also worked on the timetable data to build an API. A demo of this API is yet to be released.

  3. Parsing weather data: Asok Padda converted weather data from HTML format to Excel sheets. Hourly weather data for all weather stations in India during 2012 was parsed and uploaded to the Internet Archive: http://archive.org/details/www.imdaws.com-2012

  4. Other projects: Kashyap Kondamundi started building an app which will help people to calculate the current values of their mutual funds. He built 70% of this app at the hacknight.
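The HTML-to-spreadsheet conversion mentioned for the weather data can be illustrated with a small sketch. It is not the code used at the hacknight: it assumes the data sits in a plain HTML table, it writes CSV (which Excel can open) rather than a native Excel file, and the class `TableRows`, the function `html_table_to_csv` and the sample values in the usage note are made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr":
            if self._row is not None:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html):
    """Return the table rows found in html as CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()
```

For example, passing `"<table><tr><th>Station</th><th>Temp</th></tr><tr><td>Bangalore</td><td>24.5</td></tr></table>"` to `html_table_to_csv` yields one CSV line per table row. Real pages with nested tables or merged cells would need more careful handling.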

HasGeek has requested participants to post updates about their projects and share links to their code.

Overall achievements from the hacknight: Participants reported the following outcomes from the hacknight:

  1. Learning about new libraries and their applications
  2. Awareness about IT laws and copyright frameworks in India
  3. Opportunity to meet and network with other coders who have an interest in data-related projects or working on new project ideas.

Participants appreciated Anand’s presence as a mentor during the hacknight. He interacted with the teams and helped them when they were stuck with their projects, either with his expertise in Python or by suggesting alternative ways of approaching the problem.

HasGeek thanks CIS for sponsoring the venue and providing logistical support during the hacknight.
