Matthew J. Lavin

Recent Blog Posts

Handling hyphens

Posted on Tuesday November 17, 2015

A quick blog post on hyphens. For computational linguists and other scholars working with text computationally, tokenization is a vital step. We want the computer program to recognize each token as a separate entity. For some linguists, “Apple” and “apple” are two different terms. Likewise, each punctuation mark is a token unto itself. This is […]
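As a rough illustration of the tokenization choices described above (not the method from the full post, which is cut off here), the Python sketch below keeps case distinctions, treats every punctuation mark as its own token, and lets you decide whether a hyphenated word stays whole or is split into parts. The function name `tokenize` and the `split_hyphens` flag are hypothetical, and the regular expressions are one simple assumption about what counts as a word.

```python
import re

# A "word" is a run of word characters, optionally joined to more runs by
# internal hyphens; anything else that is not whitespace is a single token.
KEEP_HYPHENS = re.compile(r"\w+(?:-\w+)*|[^\w\s]")
# Same idea, but every hyphen becomes a token of its own.
SPLIT_HYPHENS = re.compile(r"\w+|[^\w\s]")


def tokenize(text, split_hyphens=False):
    """Return a list of tokens. Case is preserved, so 'Apple' != 'apple'."""
    pattern = SPLIT_HYPHENS if split_hyphens else KEEP_HYPHENS
    return pattern.findall(text)


sentence = "Apple sells apple-flavored snacks, well-known ones."
print(tokenize(sentence))
# ['Apple', 'sells', 'apple-flavored', 'snacks', ',', 'well-known', 'ones', '.']
print(tokenize(sentence, split_hyphens=True))
# ['Apple', 'sells', 'apple', '-', 'flavored', 'snacks', ',', 'well', '-', 'known', 'ones', '.']
```

Which option is "right" depends on the research question: keeping "well-known" whole preserves it as one lexical item, while splitting it lets "well" and "known" be counted alongside their standalone uses.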

Extract ‘mailto:’ hyperlinks from a .docx file: two methods

Posted on Tuesday September 01, 2015

Have you ever had a Word document full of hyperlinked text, say, names with links to e-mail addresses embedded in them? If so, you may have wondered how to extract those e-mail addresses without going through one by one, right-clicking, and copy/pasting the addresses to a new document. Turns out, there are several ways to do […]
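For a sense of how this can work, here is one possible approach in Python. It is a sketch and not necessarily either of the two methods in the full post. It relies on the fact that a .docx file is a zip archive: hyperlink targets are stored as relationships in `word/_rels/document.xml.rels`, and each `<w:hyperlink>` element in `word/document.xml` points to one of them through its `r:id` attribute. The function name `extract_mailto_links` is hypothetical, and the sketch assumes the links live in the main document body (not in headers, footers, or footnotes).

```python
import zipfile
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# XML namespaces used inside a .docx package.
PKG_REL = "{http://schemas.openxmlformats.org/package/2006/relationships}"
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
R = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}"


def extract_mailto_links(docx):
    """Return (link text, e-mail address) pairs for every mailto: hyperlink.

    `docx` may be a filename or a binary file-like object. Only the main
    document body is searched (an assumption of this sketch).
    """
    with zipfile.ZipFile(docx) as z:
        rels = ET.fromstring(z.read("word/_rels/document.xml.rels"))
        body = ET.fromstring(z.read("word/document.xml"))

    # Map relationship IDs (e.g. "rId5") to the addresses in mailto: targets.
    targets = {}
    for rel in rels.iter(PKG_REL + "Relationship"):
        target = rel.get("Target", "")
        if target.lower().startswith("mailto:"):
            # Drop any "?subject=..." part, then decode %-escapes.
            address = target[len("mailto:"):].split("?", 1)[0]
            targets[rel.get("Id")] = unquote(address)

    # Each <w:hyperlink r:id="..."> wraps the runs of visible link text.
    pairs = []
    for link in body.iter(W + "hyperlink"):
        rid = link.get(R + "id")
        if rid in targets:
            text = "".join(t.text or "" for t in link.iter(W + "t"))
            pairs.append((text, targets[rid]))
    return pairs
```

Usage would look like `for name, email in extract_mailto_links("contacts.docx"): print(name, email)`, where `contacts.docx` stands in for your own file. Pairing each address with its link text keeps the names and e-mail addresses together, which is usually what you want when building a contact list.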

Ranking Authorship Attribution with the XBox Rating Script

Posted on Friday February 13, 2015

Head-to-head ratings and match-ups are an interesting topic in computer science, especially when it comes to designing gameplay interaction. I know this because there’s a scene in “The Social Network” where Mark Zuckerberg uses a head-to-head ranking algorithm to make a computer program that rates Harvard women’s physical appearance. The math is a […]
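The formula written on the dorm window in that film scene is the Elo rating system, originally designed for chess. (Xbox Live matchmaking uses Microsoft's TrueSkill, a different, Bayesian model.) The Python sketch below shows plain Elo only. It is not the script the post describes, and the names `expected_score`, `update`, and `rank`, along with the conventional starting rating of 1500 and update factor `k=32`, are assumptions for illustration. Each match is recorded as a (winner, loser) pair, so the items being ranked could be, for example, candidate attributions compared two at a time.

```python
def expected_score(r_a, r_b):
    """Expected score of A against B under Elo (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def update(r_a, r_b, score_a, k=32):
    """Return new (r_a, r_b) after one match.

    score_a is 1 for a win by A, 0.5 for a draw, 0 for a loss.
    Whatever A gains, B loses, so total rating is conserved.
    """
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


def rank(items, matches, start=1500, k=32):
    """Replay (winner, loser) matches in order; return items sorted by rating."""
    ratings = {item: float(start) for item in items}
    for winner, loser in matches:
        ratings[winner], ratings[loser] = update(
            ratings[winner], ratings[loser], 1.0, k
        )
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


print(update(1500, 1500, 1.0))  # (1516.0, 1484.0)
print([name for name, _ in rank(["A", "B", "C"],
                                [("A", "B"), ("A", "C"), ("B", "C")])])
# ['A', 'B', 'C']
```

Two evenly rated items trade 16 points when one wins; an upset against a higher-rated opponent moves more points, which is what lets a ranking emerge from many pairwise judgments. Because updates are applied in sequence, the order of the matches affects the final numbers.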

About Me

As you may have gathered from the rest of this site, my name is Matthew Lavin. I'm the Associate Program Coordinator of an initiative at St. Lawrence University in Canton, NY titled “Crossing Boundaries: Re-envisioning the Humanities for the 21st Century.”

My scholarly interests include American authorship, book technologies, book history and digital humanities broadly, computational methods and humanities data, open access and copyright, and digital pedagogy.

My teaching interests include American literature courses with strong book historical and digital humanities themes. In the semesters to come, I'd like to teach general courses on technologies of the book and/or the digital age. I've also taught antebellum and postbellum American literature surveys, Native American literature, introduction to rhetoric, and literature for non-majors.

A collection of links related to coding, digital humanities practice, the humanities in the 21st century, and more.

In this section of my site, I plan to post open source datasets associated with my scholarly work. In particular, I'm eager to share information about my ongoing efforts to develop a scalable way to convert and cobble together useful book historical data in digital form.