Record Linkage Workshop Report, Part 1

In the first half of this workshop on record linkage we had three fantastic papers from guest speakers who were invited to talk about their own experiences of conducting record linkage in historical research. Each speaker offered a different perspective on the subject, allowing us to think about a wide range of issues relating to record linkage and generating ideas which will be extremely useful to us on the Digital Panopticon.

Jeremy Boulton — ‘Place, Mobility and Class Barriers: The Perils and Possibilities of Nominal Linkage in the Metropolis’

Jeremy Boulton of the University of Newcastle got the event off to a fantastic start with a fascinating and thought-provoking window into his self-confessed ‘gruesome fascination’ with nominal record linkage. Reflecting on his experiences as part of the Pauper Lives in Georgian London and Manchester project, Jeremy spoke about the broader methodological (rather than strictly technical) issues associated with record linkage, highlighting both the benefits and the inherent dangers of linking individuals across multiple historical records.

On the one hand, when carried out successfully, nominal record linkage can be an effective means of checking the accuracy of our historical records. Perfect accuracy is beyond attainment in historical record linkage (as E. A. Wrigley said many years ago, and as still holds true today); nevertheless, the creation and collation of successful links allows us to identify the (otherwise imperceptible) lies and concealments of the people being recorded.

On the other hand, of course, the difficulties associated with nominal record linkage make the successful creation of links (and thus the exposure of the ‘fiction in the archives’) a problematic task. Transcription errors (by both the original scribes and present-day transcribers) will defeat even the most sophisticated linkage methodologies, and confirming information cannot always be obtained.

In the latter part of his paper, Jeremy presented an absorbing case-study of the nominal record linkage of Godfrey Sykes, widely documented in sources such as pollbooks, newspapers, the London electoral database and charity subscriber registers — an apparently respectable Georgian businessman who, it turns out from further digging into the historical sources, fathered four bastards with a woman named Ann Farmer.

Gill Newton — ‘Urban Record Linkage before 1754’

Next, Gill Newton of the University of Cambridge shifted the focus onto the nuts and bolts of record linkage — a paper rich in technical detail which provided the audience with a valuable toolkit for undertaking record linkage, even for the particularly challenging context of creating re-constituted families from eighteenth-century London.

Starting with an informative background on the contents of an eighteenth-century parish register and what is meant by a re-constituted family, Gill then noted some of the key challenges which face any researcher looking to undertake urban record linkage. These include a high level of population turnover; rapid growth from migration; blurred parish and administrative boundaries; and a high risk of mistaken identities. There are, however, advantages to linking urban records, such as more detailed registers; a more diverse name base; the ability to sample viably; and the further information generated by civic administration.

Gill then treated us to a fascinating discussion of name distribution in eighteenth-century parish registers. Forenames were heavily bunched around the most common names (John, Mary, Elizabeth etc.). By contrast, whilst some surnames constituted a large proportion of the whole (such as Smith), the distribution of surnames had a much longer ‘tail’ compared to forenames. Moreover, there were stark differences in the patterns of name distribution between rural England and London.
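The contrast between ‘bunched’ forenames and long-tailed surnames can be made concrete with a simple concentration measure: the share of all records accounted for by the n most common names. The sketch below is only an illustration, not a method presented in the paper; the function name top_share and the two tiny name lists are invented for the example and are not drawn from any parish register.

```python
from collections import Counter


def top_share(names, n):
    """Return the fraction of records covered by the n most common names."""
    if not names:
        return 0.0
    counts = Counter(names)
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(names)


# Invented toy data (NOT real register counts): forenames cluster on a few
# favourites, while surnames are spread over many rarer values.
forenames = ["John"] * 4 + ["Mary"] * 3 + ["Elizabeth"] * 2 + ["Godfrey"]
surnames = ["Smith"] * 2 + ["Farmer", "Sykes", "Newton", "Boulton",
                            "Wrigley", "Brown", "Taylor", "Clark"]

forename_share = top_share(forenames, 2)
surname_share = top_share(surnames, 2)
```

In this toy case the two commonest forenames cover far more of the records than the two commonest surnames, which is the pattern a longer surname ‘tail’ would produce. For linkage, the practical implication is that a match on a rare surname carries more evidential weight than a match on a very common forename.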

Finally, Gill highlighted some of the most important tools for undertaking nominal record linkage, including phonetic matching and surname dictionary examples, as well as the principles of algorithmic record linkage. She offered some extremely useful tips on how to maximise the quality of the linkages created, emphasising that successful matching requires careful attention and a rigorous methodology. In other words, the cautionary mantra with record linkage should be: ‘garbage in, garbage out’.
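To give a flavour of what phonetic matching involves, here is a minimal sketch of Soundex, one long-established phonetic coding scheme. The choice of Soundex is an assumption made for illustration; the report does not say which phonetic algorithm or surname dictionary was discussed. The idea is to reduce each surname to a short code so that spelling variants which sound alike fall into the same group.

```python
# Standard Soundex letter groups; vowels, Y, H and W carry no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character Soundex code for a name ('' if no letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (first + "".join(digits) + "000")[:4]


def same_phonetic_group(a, b):
    """Candidate-match test: do two names share a Soundex code?"""
    return soundex(a) != "" and soundex(a) == soundex(b)
```

For example, same_phonetic_group("Smith", "Smyth") holds because both names reduce to the same code. A shared code should be treated only as a candidate link to be confirmed against other evidence (dates, places, family members), since phonetic codes also group together names that are genuinely different.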

Ciara Breathnach — ‘Irish Records Linkage 1864–1913: Big, Macro and Micro Data’

In the final paper of this first session, Ciara Breathnach from the University of Limerick talked about the approach and some of the findings from the Irish Record Linkage 1864–1913 project, on which she is the principal investigator. Funded by the Irish Research Council, and developed in partnership with the Digital Repository of Ireland, University of Limerick and Insight at NUI Galway, the project aims to provide a comprehensive map of infant and maternal mortality for Dublin from 1864 to 1913. The project will reconstruct family units and create longitudinal histories by linking records of Birth, Marriage and Death, which together include millions of name instances.

Starting with an overview of the Irish Record Linkage project, Ciara then discussed some of the forces which served to shape the recording of census and civil data in nineteenth-century Ireland, before moving on to discuss some of the differing definitions of ‘Big Data’, a term about which there is seemingly little agreement.

Ciara also provided useful information on the ontologies utilised by the Irish Record Linkage project, describing the ways in which the data has been analysed and linked, and noting that, with so many records available, sampling is necessary to make such a project feasible.

Finally, through a case-study of Achill in Dublin, Ciara presented a glimpse of the significant findings already generated by the Irish Record Linkage project. By mapping infant deaths in the parish in the 1890s, Ciara revealed the nature of the relationship between child mortality and the geography of local health care (in the form of doctors and nurses) in late nineteenth-century Ireland. As Ciara concluded, it is through these kinds of detailed micro-level studies, produced by record linkage at the macro level, that we can gain a better understanding of the past.

We are very grateful to all three speakers for providing us with so much food for thought, and so many ideas to follow up!

