Record Linkage Workshop Report, Part 2

The second half of the workshop was devoted to work in progress from the Digital Panopticon – summaries of which have already appeared (or will soon be appearing) on this blog, so watch this space! As such, I’ll say less about these papers than those from Session 1.

Jamie McLaughlin — ‘How to Disappear Completely: Linking Transportation Records in the Digital Panopticon’

Jamie McLaughlin presented some of the insights gained from our recent (and still very early) explorations in linking records of the trial and transportation of convicts in eighteenth- and nineteenth-century London. Uncertainty ‘plagues the records’, and Jamie discussed some of the ways in which we have tried to maximize the quality of the name matches made across them: allowing for spelling and date variances, creating control scenarios, and using variant lists rather than general-purpose algorithms, all with an eye on computational performance, an issue we cannot simply disregard however much we might wish for ‘perfect’ matching techniques. In short, we need to find an optimal, complementary balance of automated and manual work, allowing computers and humans each to do what they are good at — an ideal strategy reflected in the case of the ‘robot butler’.
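To make the variant-list and date-tolerance idea concrete, here is a minimal sketch, not the project’s actual pipeline: the variant sets, field names and tolerance window are all invented for the example.

```python
from datetime import date

# Hypothetical variant list: each set groups surname spellings treated as equivalent.
# Real lists would be far larger and derived from the records themselves.
SURNAME_VARIANTS = [
    {"smith", "smyth", "smythe"},
    {"clark", "clarke", "clerke"},
]

def surnames_match(a: str, b: str) -> bool:
    """True if the two surnames are identical or share a variant set."""
    a, b = a.lower(), b.lower()
    return a == b or any(a in group and b in group for group in SURNAME_VARIANTS)

def dates_match(d1: date, d2: date, tolerance_days: int = 90) -> bool:
    """True if two recorded dates fall within an allowed variance window."""
    return abs((d1 - d2).days) <= tolerance_days

def records_match(trial: dict, transport: dict) -> bool:
    """Candidate link between a trial record and a transportation record."""
    return (
        surnames_match(trial["surname"], transport["surname"])
        and trial["forename"].lower() == transport["forename"].lower()
        and dates_match(trial["trial_date"], transport["sentence_date"])
    )

# Example: a spelling variant and a three-week gap still yield a candidate link.
trial = {"surname": "Smyth", "forename": "John", "trial_date": date(1787, 4, 10)}
transport = {"surname": "Smith", "forename": "John", "sentence_date": date(1787, 5, 1)}
print(records_match(trial, transport))  # True
```

A lookup against a fixed variant list like this is also cheap to run across millions of record pairs, which is part of the appeal over more general (and more expensive) fuzzy-matching algorithms.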

Lucy Williams — ‘What’s in a name? Convicts, Context and Multiple Record Linkage’

Lucy Williams talked about her recent work in manually checking the automated linkage process undertaken by Jamie, particularly in identifying why good matches have failed to be made. One reason is simple name variance: variable spellings of the same surname are notoriously prevalent in eighteenth- and nineteenth-century records. Nor is the data from one record set (such as the Old Bailey Proceedings) carried over consistently into other records. But there is also the problem of “John Smith”: how do we prise apart and correctly link individuals tried at the same session of the Old Bailey who have the same name, spelled in exactly the same way? We can keep adding information from other sources to try to verify these kinds of multiple name matches, but that isn’t necessarily always the answer, particularly for automated processes. Adding in all the John Smiths from the census, for instance, can simply produce even more candidate links. The crucial question for us, then, is: at what point do we draw a line under things and stop adding contextual data?
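As a rough illustration of the “John Smith” problem, and not a description of the project’s actual method, the sketch below shows how an extra contextual field (here an invented age check) can narrow a set of same-name candidates without resolving it, which is exactly the point at which manual judgement comes in.

```python
# Hypothetical example: several census entries share the convict's name, and an
# age tolerance trims the candidate set but still leaves more than one match.
# All field names and data are invented for the illustration.

def candidate_links(convict, census_rows, age_tolerance=2):
    """Return census rows that could plausibly refer to the same person."""
    matches = []
    for row in census_rows:
        if row["name"].lower() != convict["name"].lower():
            continue
        if abs(row["age"] - convict["age"]) <= age_tolerance:
            matches.append(row)
    return matches

convict = {"name": "John Smith", "age": 24}
census = [
    {"name": "John Smith", "age": 23, "parish": "Shoreditch"},
    {"name": "John Smith", "age": 25, "parish": "Whitechapel"},
    {"name": "John Smith", "age": 40, "parish": "Clerkenwell"},
]

links = candidate_links(convict, census)
print(len(links))  # 2 -- the extra context has reduced, but not resolved, the ambiguity
```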
