Category: Digital History

Ten Most Frequent Letter Writers in the Kew Garden Directors’ Correspondence

I’ve been informed that my original download missed a lot of the files. I’m going to recreate the two graphs below over the next few days with the missing data and rework this post.

[Graph: the ten most frequent letter writers]

I’m working with Bea Alex on a blog post for the Kew Garden Directors’ Correspondence project. They shared their metadata collection with Trading Consequences, and Bea reformatted it into a directory of 7,438 XML files (one for every letter digitized to date by the project). The metadata includes all the information found on the individual letter webpages (sample). Bea and the rest of the team in Edinburgh focused on extracting commodity–place relationships from the description field. We’re currently working with the data for coffee, cinchona, rubber, and palm to create an animated GIS time-map for the blog post we are writing. However, because this is one of the smallest collections we are processing in the Trading Consequences project, I decided to play around with the data a little more.

XML files are pretty ubiquitous once you start working with large data sets. They are generally easier to read and more portable than standard relational databases, and presumably have numerous other advantages. The syntax is familiar if you know HTML, but I’ve still found it challenging to learn how to pull information out of these files. As with most things, coding in Mathematica, instead of Python, makes it easier. It turned out to be relatively straightforward to import all 7,438 XML files, have Mathematica recognize the pattern of the “Creator” field, and pull out a list of all of the letter authors. From there, it was easy to tally up the duplicates, sort them in order of frequency (borrowing a bit of code from Bill Turkel), and graph the top ten (of the 1,689 total authors).
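I did this in Mathematica, but the same extract-and-tally workflow can be sketched in Python with the standard library. This is only an illustration: the element names and the sample letters below are invented stand-ins, since the real collection is a directory of 7,438 files whose exact schema isn’t reproduced here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# In-memory stand-ins for the per-letter XML files. The <Creator>
# element mirrors the "Creator" metadata field; the surrounding
# structure is a guess for illustration only.
letters = [
    "<letter><Creator>J. D. Hooker</Creator></letter>",
    "<letter><Creator>W. J. Hooker</Creator></letter>",
    "<letter><Creator>J. D. Hooker</Creator></letter>",
]

# Pull the Creator field out of each document, then tally duplicates.
authors = [ET.fromstring(doc).findtext("Creator") for doc in letters]
tally = Counter(authors)

# Sort authors by frequency and keep the top ten.
top_ten = tally.most_common(10)
print(top_ten)  # [('J. D. Hooker', 2), ('W. J. Hooker', 1)]
```

With the real files, the `letters` list would come from reading each file in the directory; everything after that line stays the same.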

How to Build a Macroscope

Timothy Bristow, a digital humanities librarian and Trading Consequences team member, and I are hosting a one-day workshop on text mining in the humanities in the library at York University:

A macroscope is designed to capture the bigger picture, to render visible vastly complex systems. Large-scale text mining offers researchers the promise of such perspective, while posing distinct challenges around data access, licensing, dissemination, preservation, digital infrastructure, project management, and project costs. Join our panel of researchers, librarians, and technologists as they discuss not only the operational demands of text mining in the humanities, but also how Ontario institutions can better support this work.

Trading Consequences’ First Year

Co-Authored with Beatrice Alex

Trading Consequences is a Digging Into Data funded collaboration between commodity historians, computational linguists, computer scientists and librarians. We have been working for a year to develop a system that will text mine more than two million pages of digitized historical documents to extract relevant information about the nineteenth-century commodity trade. We are particularly interested in identifying some new environmental consequences of the growing quantity of natural resources imported into Britain during the century.

During our first year we’ve gathered the digitized text data from a number of vendors, honed our key historical questions, created a list of more than four hundred commodities imported into Britain, and developed an early working prototype. In the process we’ve learned a lot about each others’ disciplines, making it increasingly possible for historians, computational linguists, and visualization experts to discuss and solve research challenges.

We completed our initial prototype late last year. It has limited functionality and focuses on a smaller sample of our corpus of documents, but in the months ahead it will become increasingly powerful and populated with more and more data. Here’s a picture of the overall architecture:

GIS and Time

[This is my first post for The Otter since I passed on the editorial duties to Josh MacFadyen in the summer]

One of the major weaknesses of using GIS for historical research is its limited ability to show change over time. GIS was designed with geography in mind, and until recently historians needed to adapt the technology to meet our needs. Generally this meant creating a series of maps to show change over time or, as Dan MacFarlane did last week, including labels identifying how different layers represent different time periods. More recently, ArcGIS and Quantum GIS introduced features that recognize a time field in the data and make it possible to include a timeline slider bar or animate the time-series data in a video.


[Video: UK Tallow Imports, 1865–1904, from Jim Clifford on Vimeo]

Python and the Natural Language Toolkit

A graph showing the use of different words in a corpus of the U.S. Presidential Inaugural Addresses.

Since the end of the academic year, I’ve been able to focus a lot more attention on my post-doc research. This included a research trip to London archives and a week-long course on databases at the Digital Humanities Institute in Victoria. Now I’ve started learning a programming language called Python. In the short term, I don’t need to learn advanced computer skills for Trading Consequences, as we have a team of highly skilled computing and linguistics experts. However, I do need a basic understanding of what we are actually doing when we text mine historical documents, and, looking forward to the end of the grant, I would like to be able to continue working with the database. I would also like to develop the skills to continue this kind of research on my own in the future.

I always intended to start with the Programming Historian, but the new version will not come out for another few weeks, so instead I began working through Learn Python the Hard Way over a few days in May. This was interesting, but it focused solely on teaching programming and was not particularly connected to the kind of research I would like to do. A few days ago I took a closer look at Natural Language Processing with Python, written by Steven Bird, Edward Loper, and Ewan Klein, one of the Trading Consequences team members. Reading the preface, it became clear the book is accessible to people with no background in programming. The early chapters include an introduction to both computational linguistics and Python.
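The inaugural-addresses graph above is the kind of thing the NLTK book demonstrates with its conditional frequency distributions. The underlying idea, tallying how often chosen words appear in each address, can be sketched in plain Python. The snippets below are invented placeholders, not the real inaugural texts, and the word choices are just examples.

```python
from collections import Counter

# Tiny stand-in corpus: year -> address text (invented snippets,
# not the actual inaugural addresses).
corpus = {
    1789: "fellow citizens of the senate and of the house",
    1861: "the union of these states is perpetual",
    1961: "ask not what your country can do for you",
}

targets = ("citizens", "country")

# For each address, count occurrences of the target words. This is
# the per-year tally that NLTK plots across the whole inaugural
# corpus to show word usage changing over time.
usage = {
    year: Counter(word for word in text.split() if word in targets)
    for year, text in corpus.items()
}

for year, counts in sorted(usage.items()):
    print(year, dict(counts))
```

On the real corpus, NLTK supplies both the texts (via its `inaugural` corpus reader) and a plotting method, so the loop above collapses to a few lines, which is exactly the appeal of the toolkit for this kind of exploratory work.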

Environmental History Mobile App

By Sean Kheraj and Jim Clifford

The Environmental History Mobile application has now landed in the Apple App Store! iPhone, iPod Touch, and iPad users can now download and install this new mobile application on their devices. EH Mobile provides users with a single portal to connect with a wide range of global environmental history content on the internet. The app aggregates news, announcements, H-Environment messages, blogs, podcasts, the #envhist Twitter tag, and even Environmental History, the journal. This is a new way to connect with the environmental history community.

To download this app, simply search “Environmental History Mobile” in the Apple App Store or follow this link.