Reflections on History 5702

data science
open source

Martin Monkman



Some random ideas about data analysis methods.

Three Carleton University students taking their canoe out of the Rideau Canal.

Carleton University, Archives & Special Collections

I had the pleasure of speaking to the History 5702W graduate student seminar at Carleton University in Ottawa this morning (2016-01-25). Led by Dr. Shawn Graham, the seminar is on “Digital History (or, an introduction to hacking as a way of knowing)”.

My presentation covered three research projects I’ve been involved with in my professional life at BC Stats, British Columbia’s provincial statistics bureau. My intention was to show the students that the methods they are encountering in their digital history class are being applied in a variety of contexts outside the academy.

The three topics I spoke about:

Things that came up


This is one of the most useful keyboard shortcuts for those of us who have a slap-dash multi-tasking approach to web browsing.

Data are messy. Data is messy.

In all three of my examples I was able to demonstrate that no matter what you’re working on, the data are going to be messy. Nice, tidy data (including digital text records) exist only in textbook examples, and even then, I’m sure we could find plenty of exceptions.

The first example was the set of responses to the open-ended question in the Work Environment Survey; in one instance, a respondent had pasted in the ASCII-art version of the Captain Picard facepalm.
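One way to catch responses like that before running any text analysis is to flag entries that contain few letters. Below is a minimal sketch, not the method used on the survey. The `alpha_ratio` and `flag_non_prose` names, the 0.5 threshold, and the sample responses are all hypothetical.

```python
def alpha_ratio(text: str) -> float:
    """Share of non-whitespace characters that are letters (0.0 for blank text)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c.isalpha() for c in chars) / len(chars)


def flag_non_prose(responses, threshold=0.5):
    """Return indices of responses that look like ASCII art, symbols, or blanks.

    The threshold is a hypothetical starting point; tune it against real data.
    """
    return [i for i, r in enumerate(responses) if alpha_ratio(r) < threshold]


# Hypothetical open-ended responses: one sentence, one symbol-only entry, one blank.
responses = [
    "My manager supports flexible hours.",
    "(-_-)  /|\\  ...",
    "",
]

print(flag_non_prose(responses))  # prints [1, 2]
```

Flagged responses still need a human look. A simple ratio like this can't tell a facepalm from a legitimate list of numbers.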

In the closing Q&A I also remembered another example of messy data: the under-representation of the eleventh day of the month in the Google Ngram Viewer.
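A commonly offered explanation for that dip is optical character recognition (OCR) error. In many older typefaces the numeral 1 resembles a lowercase l or a capital I, so "11th" gets read as "llth" or "IIth". The sketch below is illustrative and does not reproduce how the Ngram corpus was built. It counts ordinal day mentions in text, with and without mapping those misreadings back to "11th". The `OCR_FIXES` table, the regular expression, and the sample sentence are hypothetical.

```python
import re
from collections import Counter

# Hypothetical OCR misreadings of "11th": the digit 1 read as a lowercase l or capital I.
OCR_FIXES = {"llth": "11th", "IIth": "11th", "l1th": "11th", "1lth": "11th"}

# Ordinals such as "3rd" or "21st", also allowing l and I where a 1 might be.
ORDINAL = re.compile(r"\b([0-9lI]{1,2})(st|nd|rd|th)\b")


def day_counts(text, normalize=False):
    """Count day-of-month ordinals; optionally repair known OCR misreadings first."""
    counts = Counter()
    for match in ORDINAL.finditer(text):
        token = match.group(0)
        if normalize:
            token = OCR_FIXES.get(token, token)
        number = token[:-2]  # strip the "st"/"nd"/"rd"/"th" suffix
        if number.isdigit():  # unrepaired misreadings like "llth" are skipped
            counts[int(number)] += 1
    return counts


sample = "Meeting on the 10th, the llth and the 12th; deadline the 11th."
print(day_counts(sample)[11])                  # prints 1
print(day_counts(sample, normalize=True)[11])  # prints 2
```

Without the repair step, the misread "llth" simply disappears from the tally. That is the kind of silent loss that would show up as a dip for the eleventh.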

“Digital methods change what is even feasible to ask”

The quote above is from the syllabus for History 5702W, and it applies to all three of my examples. I cited it in the context of the email log request: for correspondence from before about 1995, the only records kept were occasional paper copies or manual logs. Now millions of transactions are being recorded, and the available methods permit questions that simply could not have been asked before. This extends to new kinds of freedom of information requests, which in turn pose new risks to personal information.
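To make that shift concrete, here is the kind of question a machine-readable email log makes trivial: how many messages did each sender send per month? This is a minimal sketch assuming a hypothetical CSV export with `timestamp`, `sender`, and `recipient` columns. It does not reflect the format of any actual government mail system, and the addresses are placeholders.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical log export: one row per message, ISO 8601 timestamps.
# Real mail-server logs differ and would need their own parser.
SAMPLE_LOG = """timestamp,sender,recipient
2015-03-02T09:15:00,alice@example.gov,bob@example.gov
2015-03-02T09:20:00,alice@example.gov,carol@example.gov
2015-04-10T14:05:00,bob@example.gov,alice@example.gov
"""


def messages_per_sender_month(log_text):
    """Tally messages by (sender, "YYYY-MM")."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(log_text)):
        month = datetime.fromisoformat(row["timestamp"]).strftime("%Y-%m")
        counts[(row["sender"], month)] += 1
    return counts


for (sender, month), n in sorted(messages_per_sender_month(SAMPLE_LOG).items()):
    print(sender, month, n)
```

The same few lines scale from three rows to millions. That is exactly why such logs raise new privacy questions: aggregate patterns about individuals fall out of them with almost no effort.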

Digital archives

One of my images was drawn from the astounding collection of public domain images at The New York Public Library.

Digitization is applied in some surprising areas

We briefly talked about the digital reconstruction of medieval manuscripts that have had pages removed (books that are “broken”). This allows scholars to view the manuscript as a unified whole, rather than having to traipse through the special collections departments of libraries and archives around the world.

For more about recreating broken books, see:

Things I learned &/or realized

Before my presentation, I sat in on the seminar session, led by one of the students, on Markdown and Pandoc. Until then I’d used R Markdown only when programming in R within the RStudio IDE, but I learned a great deal in that session.

Old dog, new tricks

I picked up some new tools, too: this blog post is being written in Dillinger, an online Markdown editor, and I will push it to my new GitHub blog. If you’re reading this, it worked.

Resistance is to be expected

The shift to open publication, reproducible research, and open data holds promise, but because it disrupts the status quo, there is significant resistance. The humanities might be the final frontier for open research, but it’s the set-in-their-ways Borg who need to learn that resistance is futile.

While this is a common discussion topic in the circles where I normally lurk, it was refreshing to see it raised in a humanities seminar. And I got to hastily jot down the names of a few key pieces on this topic in the humanities:

And in conclusion

All in all, it was a very thought-provoking couple of hours. Thanks to Shawn Graham for giving me the opportunity, and to the students in History 5702W for a lively discussion.

Originally posted to, 2016-01-25