What’s in a Name?: Details and Data Linkage

A year into the Digital Panopticon project, we have begun record linkage with some of our key sources relating to Transportation. Thanks to several innovative iterations of initial linkage by Jamie McLaughlin, we have been able to trace more than three quarters of those sentenced to transportation at the Old Bailey, linking them to their voyage details in the British Transportation Registers. For some, we have also been able to link onwards to the Convict Indents compiled for them on board convict ships and on their arrival in Australia. This iterative process has taught us much about the nature of our different record sets, and about the complex job of connecting them.

One of the biggest challenges in the linking process has been differentiating between multiple cases of identical names and trials in the Old Bailey. Over the coming months, our schedule of record linkage will connect not just our transportation datasets but also imprisonment data and, eventually, civil data such as the census and birth, marriage and death records. As it does, deciding with certainty what to link, and how, becomes increasingly difficult.

When confronted with a sea of names, and no consistency between our diverse datasets in how other contextual information is recorded, how are we to make the right choices and ensure that the correct history is connected to the right offender?

Between 1780 and 1900 there was only one Mary Ann Dring convicted at the Old Bailey: she was sentenced to five years’ penal servitude in 1865 for feloniously uttering counterfeit coin. She had appeared at the Old Bailey once before, in 1863, as a witness in the coining trial of another woman, and twenty years later, in 1885, may well have appeared as a witness in a manslaughter case.

From a linkage perspective we are fortunate: in all of our criminal datasets there should be only one Old Bailey Mary Ann Dring. This is very lucky, because the report of her own trial runs to just two lines of text, so the information we start with in order to trace her is minimal:

Name: Mary Ann Dring

Approximate year of birth: 1817

Location: London.

Step one is to link to the next big dataset for those who stayed in England to be imprisoned. In this case, that is the PCOM 4 female licences for parole. Searching with the information on Mary Ann Dring taken from the Old Bailey data, we had no problem locating her licence. Those familiar with the licences will know that these documents give us the opportunity to collect a great deal more information about her. Confident that the right link has been made, we can collect some key contextual details that will allow us to identify Mary Ann Dring in further datasets.

[Image: Licence fields]
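As a rough illustration of this first step, here is a minimal Python sketch, not the project’s actual linkage code. It assumes each record is a plain dictionary with hypothetical field names (`name`, `birth_year` and so on): a licence is treated as a candidate when the normalised name matches exactly and the birth year falls within an assumed two-year tolerance, and the accepted licence’s extra fields are then copied into the offender’s profile along with a note of where each came from. The licence entries are invented for illustration.

```python
def normalise(name: str) -> str:
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def find_licences(seed: dict, licences: list, tolerance: int = 2) -> list:
    """Return licences with the same normalised name and a birth year within `tolerance`."""
    return [
        lic
        for lic in licences
        if normalise(lic["name"]) == normalise(seed["name"])
        and abs(lic["birth_year"] - seed["birth_year"]) <= tolerance
    ]


def enrich(profile: dict, record: dict, source: str) -> dict:
    """Copy fields the profile lacks from a linked record, noting each field's source."""
    merged = dict(profile)
    provenance = dict(profile.get("provenance", {}))
    for key, value in record.items():
        if key not in merged:
            merged[key] = value
            provenance[key] = source
    merged["provenance"] = provenance
    return merged


# Seed details from the Old Bailey record; the licences below are invented examples.
seed = {"name": "Mary Ann Dring", "birth_year": 1817, "location": "London"}
licences = [
    {"name": "Mary Ann  DRING", "birth_year": 1817, "marital_status": "married",
     "children": 2, "occupation": "charwoman"},
    {"name": "Mary Dring", "birth_year": 1830, "occupation": "laundress"},
]

matches = find_licences(seed, licences)
if len(matches) == 1:  # only enrich when the link is unambiguous
    profile = enrich(seed, matches[0], source="PCOM 4 licence")
    print(profile)
```

Filling only the fields the profile lacks, rather than overwriting them, is one cautious choice; a fuller system would also need a policy for values that disagree between sources.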

The future datasets we link to will not, of course, contain most of this information, so we must rely on a few key details that will help us link to new records. For civil data, we could certainly use the fact that Mary Ann Dring was recorded as married with two children in 1865. She worked as a charwoman, and had been resident in London, under her married name, since at least 1863, when she had her first conviction.

In the census nearest to Mary Ann’s 1865 Old Bailey conviction (that of 1861), there are 183 returns for a Mary Ann Dring born in or around 1817. If we make the not unreasonable assumption that our Mary Ann Dring was living in London for the five years before her Old Bailey appearance, we can, rather luckily, reduce that to four viable matches. To most academic researchers or family historians, this is a small and manageable selection from which to choose.

[Image: Mary Ann Dring (MAD) census entries]
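The narrowing from 183 returns to four can be thought of as blocking: cheap rules that discard records which cannot plausibly be the same person before any finer comparison takes place. The sketch below assumes a simplified, hypothetical census row and a birth-year window of two years; the rows are toy data invented for illustration, not the real 1861 returns, and the project’s actual criteria may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CensusEntry:
    # Hypothetical, simplified census row; real returns record far more.
    entry_id: str
    name: str
    birth_year: int
    county: str


def block(entries, birth_year, county, window=2):
    """Keep entries born within `window` years of `birth_year` and living in `county`."""
    return [
        e for e in entries
        if abs(e.birth_year - birth_year) <= window and e.county == county
    ]


# Toy rows invented for illustration only.
census_1861 = [
    CensusEntry("A", "Mary Ann Dring", 1816, "London"),
    CensusEntry("B", "Mary Ann Dring", 1818, "London"),
    CensusEntry("C", "Mary Ann Dring", 1815, "London"),
    CensusEntry("D", "Mary Ann Dring", 1819, "London"),
    CensusEntry("E", "Mary Ann Dring", 1817, "Norfolk"),
    CensusEntry("F", "Mary Ann Dring", 1830, "London"),
]

candidates = block(census_1861, birth_year=1817, county="London")
print([e.entry_id for e in candidates])
```

Widening `window` or dropping the county rule lets more rows through, which is the trade-off between missing the right person and drowning in candidates.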

Yet even though we know she was married with two children, we are faced with four married women (two with two children, two with three), all living in London, and none with an occupation listed, which is not unusual for a census entry with a male head of household. Within the parameters of most automated systems that might be used to make such a match, any of these census entries could be considered valid. Working manually, an individual researcher can reduce the choices to two viable matches, but from a linkage point of view these two are almost indistinguishable. Their dates of birth fall one year either side of 1817. Both are married, both have two children, both are resident in London, and both have identical names.
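A simple weighted agreement score shows why an automated matcher stalls at this point. In this sketch the weights, the one-point penalty per year of birth-year difference, and the four toy candidates (shaped like the London women described above, but with invented values) are all illustrative assumptions. The two strongest candidates score identically, so no choice of threshold can separate them.

```python
# Illustrative weights; a real system would estimate these from data.
WEIGHTS = {"married": 2.0, "children": 2.0}
YEAR_PENALTY = 1.0  # points lost per year of birth-year difference


def score(target: dict, candidate: dict) -> float:
    """Sum the weights of agreeing fields, minus a penalty for birth-year distance."""
    total = sum(w for field, w in WEIGHTS.items() if target[field] == candidate[field])
    total -= YEAR_PENALTY * abs(target["birth_year"] - candidate["birth_year"])
    return total


target = {"birth_year": 1817, "married": True, "children": 2}
# Toy candidates invented for illustration.
candidates = {
    "A": {"birth_year": 1816, "married": True, "children": 2},
    "B": {"birth_year": 1818, "married": True, "children": 2},
    "C": {"birth_year": 1815, "married": True, "children": 3},
    "D": {"birth_year": 1819, "married": True, "children": 3},
}

scores = {cid: score(target, c) for cid, c in candidates.items()}
best = max(scores.values())
tied = sorted(cid for cid, s in scores.items() if s == best)
print(scores, tied)
```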

In the 1871 census, six years after Mary Ann’s conviction and four years after her release from prison, there are no records that directly match either of the 1861 entries. Instead there is a choice of five women whose birth years all fall within five years of the original Mary Ann Dring’s, but whose personal information differs notably. Furthermore, depending on which links are made to census data, and what extra contextual information is added to Mary Ann’s case, potentially relevant death records could come from London and the surrounding counties across a fifteen-year period.

If we searched simply for Mary Dring, without the middle name Ann, the number of possible matches would be several times larger. If we looked for a Mary Smith with the same level of contextual detail, we could well face hundreds of potential matches with no way to choose between them.
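The effect of loosening the name criterion can be sketched as two comparison modes: a strict mode that requires every forename to agree, and a looser mode that compares only the first forename and the surname, so that a "Mary Dring" is also admitted. The function names and the sample index below are hypothetical and invented for illustration.

```python
def name_parts(name: str) -> tuple[list[str], str]:
    """Split a non-empty name into (forenames, surname), lower-cased."""
    parts = name.lower().split()
    return parts[:-1], parts[-1]


def names_compatible(a: str, b: str, strict: bool = True) -> bool:
    """Strict: all forenames must match. Loose: first forename and surname only."""
    fa, sa = name_parts(a)
    fb, sb = name_parts(b)
    if sa != sb:
        return False
    if strict:
        return fa == fb
    return bool(fa) and bool(fb) and fa[0] == fb[0]


# Invented sample of indexed names.
index = ["Mary Ann Dring", "Mary Dring", "Mary Dring", "Mary Jane Dring", "Ann Dring"]
strict_hits = [n for n in index if names_compatible("Mary Ann Dring", n, strict=True)]
loose_hits = [n for n in index if names_compatible("Mary Ann Dring", n, strict=False)]
print(len(strict_hits), len(loose_hits))
```

Even in this tiny invented index the loose mode quadruples the candidates, and with a surname as common as Smith the same rule would admit far more.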

Each individual record linked to a convict has ramifications for future links. At the micro level, this is the dilemma faced by every genealogist or family historian: the difficult decisions that have to be made in matching records to individuals. However, the Digital Panopticon’s task of linking almost 90,000 convicts across multiple datasets is not a micro history, nor a task that can be managed manually. An automated system that can navigate and discern between multiple similar (or even identical) entries in a given dataset is essential. Or perhaps the answer is to rank and display the multiple possible links wherever they conflict?
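One way to rank and display rather than force a choice is to accept a link automatically only when the top candidate clearly beats the runner-up, and otherwise to return every close candidate for review. This is a minimal sketch assuming each candidate already carries a numeric match score from some earlier comparison step; the `threshold` and `margin` values are arbitrary illustrative defaults, not tuned parameters.

```python
from typing import NamedTuple


class Decision(NamedTuple):
    status: str            # "accepted", "ambiguous" or "no_match"
    candidates: list[str]  # ranked IDs still in contention


def decide(scores: dict[str, float], threshold: float = 1.0, margin: float = 1.0) -> Decision:
    """Accept the top candidate only if it clears `threshold` and leads the next by `margin`."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    viable = [(cid, s) for cid, s in ranked if s >= threshold]
    if not viable:
        return Decision("no_match", [])
    top_score = viable[0][1]
    # Everything within `margin` of the leader stays in contention.
    close = [cid for cid, s in viable if top_score - s < margin]
    if len(close) == 1:
        return Decision("accepted", close)
    return Decision("ambiguous", close)


print(decide({"A": 3.0, "B": 3.0, "C": 0.0, "D": 0.0}))
print(decide({"A": 3.0, "B": 1.5}))
```

An "ambiguous" result is not a failure here: it is the list of links a researcher would want to see side by side.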

Our challenge now, it seems, is to develop a suitably sophisticated data linkage system: one that maintains a high rate of matches we can be confident in, while also allowing us to incorporate possible, contradictory and conflicting data. Those with common names will no doubt prove our greatest challenge, but even someone as seemingly unique as Mary Ann Dring raises questions about how we match, what we match, what we keep, and how we store and rank conflicting information across such a wide variety of datasets.
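On the storage side, one possible design is to keep every candidate link, with its source dataset and score, rather than only the winner, so that conflicts stay visible. The sketch below uses hypothetical convict and record identifiers; its `conflicts` method reports any record claimed by more than one convict, which signals that at least one of those links must be wrong.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    # Hypothetical link: one convict, one record in one dataset, one match score.
    convict_id: str
    dataset: str
    record_id: str
    score: float


class LinkStore:
    """Keep every candidate link instead of discarding all but the best one."""

    def __init__(self) -> None:
        # (convict_id, dataset) -> list of candidate links
        self._links = defaultdict(list)

    def add(self, link: Link) -> None:
        self._links[(link.convict_id, link.dataset)].append(link)

    def ranked(self, convict_id: str, dataset: str) -> list:
        """Candidate links for one convict in one dataset, strongest first."""
        links = self._links.get((convict_id, dataset), [])
        return sorted(links, key=lambda lk: lk.score, reverse=True)

    def conflicts(self) -> dict:
        """Map (dataset, record_id) to the convicts claiming it, where more than one does."""
        claimants = defaultdict(set)
        for links in self._links.values():
            for lk in links:
                claimants[(lk.dataset, lk.record_id)].add(lk.convict_id)
        return {rec: ids for rec, ids in claimants.items() if len(ids) > 1}


# Hypothetical identifiers for illustration.
store = LinkStore()
store.add(Link("dring-1865", "census_1861", "A", 3.0))
store.add(Link("dring-1865", "census_1861", "B", 3.0))
store.add(Link("other-convict", "census_1861", "B", 2.0))

print([lk.record_id for lk in store.ranked("dring-1865", "census_1861")])
print(store.conflicts())
```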

 
