The Challenge of Visualising 100,000 Convict Lives

The Digital Panopticon project is linking together a wide variety of criminal justice, genealogical, and biometric records to trace thousands of convict lives from birth to death. Each story will start with a birth date anywhere from the mid-eighteenth to the mid-nineteenth century, and will include a variety of events such as convictions for minor offences, one or more Old Bailey trials and punishments, possible subsequent convictions, marriage, children, census records, and death. We are calling these life archives, though many will only present fragments of lives, depending on the amount of evidence available. One such fragment we have already assembled is that of John Davis, born in about 1817, convicted in 1836 of stealing some clothes and other items from a dwelling house, incarcerated for a month on the hulks, and transported on the ship Moffatt to New South Wales, where he arrived several months later.


Life Archive for John Davis

How do we summarise 100,000 stories like this? How can we find common patterns among all the individual narratives? The project is exploring a variety of visualisation techniques in order to summarise this evidence without, as far as possible, obscuring the complexity of the individual stories. We have already used visualisations to assess levels of missing evidence and detect errors in the Old Bailey Proceedings (Men as Wives: Visualising Errors in the Old Bailey Proceedings Data and Seeing Things Differently: Visualising Patterns of Data from the Old Bailey Proceedings), and to identify patterns in individual datasets (Transportation Under the Macroscope and Open Data and the Digital Panopticon). But how do we use visualisations to document relations between datasets?

There is a bewildering array of visualisation formats available, as this Google Images screenshot indicates. Which one should we choose?

Which visualisation?!


The choice obviously depends on the nature of the information to be displayed. Our most successful record linkage so far is between the records of sentences (from the Old Bailey Proceedings) and the records of punishments actually experienced (primarily execution, transportation, and imprisonment). You may be surprised to read that there was a considerable discrepancy between the punishments judges handed down to convicts in the Old Bailey courtroom and the punishments they actually received. After sentencing, many convicts had their punishments reduced or changed as a result of pardons, other decisions taken by penal officials, or ill health or death.

Most useful to us for representing these patterns are Sankey diagrams, which depict flows in many-to-many relationships. Individual lines trace individual journeys, but where many people follow the same path these are brought together as thicker lines, the thickness of a line denoting the volume of the flow.

Old Bailey sentences vs actual penal outcomes, 1790-99


For example, this diagram traces convicts’ experiences in the 1790s, focusing on the two main sentences of that decade, death and transportation. We can see from this that only a proportion (28%) of those sentenced to death were actually executed, with many others being transported (following a conditional pardon), and a few experiencing other outcomes such as service in the army or navy (during the French wars) or death. Similarly, only around two-thirds of those sentenced to transportation were actually transported, with the remainder ending up in the hulks (and then presumably discharged after a period), or experiencing a small number of other outcomes.
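The aggregation behind a diagram like this can be sketched in a few lines of Python. This is a minimal illustration, not the project’s pipeline: it assumes each linked convict has already been reduced to a (sentence, outcome) pair, the records are invented placeholders rather than Digital Panopticon data, and the `source`/`target`/`value` shape is chosen only because it resembles the link objects that Sankey layouts such as the d3-sankey plugin work with.

```python
from collections import Counter

# Hypothetical individual records: (sentence passed at the Old Bailey,
# punishment actually experienced). Invented placeholders, not project data.
records = [
    ("death", "executed"),
    ("death", "transported"),
    ("death", "transported"),
    ("death", "army/navy"),
    ("transportation", "transported"),
    ("transportation", "transported"),
    ("transportation", "hulks"),
]

def sankey_links(paths):
    """Collapse individual paths into weighted (source, target) links.

    Each distinct path becomes one link whose value is the number of
    people who followed it; a Sankey renderer draws that value as width.
    """
    counts = Counter(paths)
    return [
        {"source": src, "target": dst, "value": n}
        for (src, dst), n in sorted(counts.items())
    ]

def outcome_share(paths, sentence, outcome):
    """Proportion of people given `sentence` who experienced `outcome`."""
    given = [dst for src, dst in paths if src == sentence]
    return given.count(outcome) / len(given) if given else 0.0

for link in sankey_links(records):
    print(link)
print(f"{outcome_share(records, 'death', 'executed'):.0%} of death sentences carried out")
```

Percentages such as the 28% quoted above come from the same counts: the value of the ‘executed’ link divided by the total flowing out of the ‘death’ node.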

The advantage of presenting the information in this way, rather than, for example, in a table, is that it is readily understandable without obscuring the variety of possible outcomes. Moreover, the patterns which stand out pose questions for further research: how and why did so many potential transportees manage to evade this punishment, and what determined which punishments they actually received? These are issues we are currently investigating.

But what happens when the variables become more complex, and the number of stages prisoners might go through multiplies? This is the problem we are working on now. As noted, our multiple datasets include information about many different types of events in convict lives. Sankey diagrams should be able to help, as they can show multiple paths through several stages, which is what we want to do with convict lives. Each life history can be a line in a Sankey diagram which, when thousands of lives are included, would reveal general patterns. But how do we manage the large number of events, taking place at different times? The difficulty is that we want to introduce a time element (the actual dates of events), which makes the data too complicated for a normal Sankey diagram.
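One way to picture the multi-stage version is a minimal Python sketch that treats each life as an ordered list of event types and counts transitions between consecutive events. The event labels and sequences are invented, and tagging each node with its step number is an assumption made here to keep the diagram flowing left to right. It is not the project’s method.

```python
from collections import Counter

# Hypothetical life histories: each is the ordered list of event types
# found for one person. Labels and sequences are invented.
lives = [
    ["birth", "previous conviction", "Old Bailey conviction", "hulks", "transported"],
    ["birth", "Old Bailey conviction", "hulks", "transported"],
    ["birth", "previous conviction", "Old Bailey conviction", "prison", "licence"],
    ["birth", "Old Bailey conviction", "prison", "licence"],
]

def staged_links(histories):
    """Count transitions between consecutive events, tagging nodes with their step.

    Tagging keeps the diagram acyclic: an event at step 2 is a different
    node from the same event at step 3, so every flow runs left to right.
    """
    counts = Counter()
    for events in histories:
        nodes = [f"{step}:{event}" for step, event in enumerate(events)]
        for src, dst in zip(nodes, nodes[1:]):
            counts[(src, dst)] += 1
    return counts

for (src, dst), n in sorted(staged_links(lives).items()):
    print(f"{src} -> {dst}: {n}")
```

Note what this sketch throws away: the dates. Two people can share a node even if their convictions were decades apart, and the same event can land in different columns depending on how many events preceded it. That is exactly the limitation described above.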

There is no off-the-peg solution to this problem. But here is a crude mock-up, made in Excel, of what we hope to achieve. Eventually we will develop visualisations like this using D3, a JavaScript library for producing data visualisations.

Twenty-four convict lives from birth to punishment


This is based on twenty-four convict lives where we currently have eight or more records, including their birth, previous conviction (if any), Old Bailey conviction, and punishment (periods of incarceration in the hulks or a prison and subsequent release, or transportation, or execution).

It is hard to draw conclusions from this rather inelegant presentation, but you can start to see some interesting patterns. A flat line means little time elapsed, while a steep line connotes a longer period. We can see how many convicts had previous convictions, and how these often occurred years before the Old Bailey conviction which led to the punishment displayed. In terms of punishment, we can see significant changes over time in the nineteenth century: crudely, a shift from incarceration in the hulks followed by transportation, to prisons followed by transportation, and then to prisons leading to a prison licence. What will happen when we replicate this format with tens of thousands of cases? Will patterns become clearer, or will it just be a mess?

Convict lives by age at which events occurred


In fact, this visualisation is in some respects already too complicated to interpret easily. If we remove the date variable and just use the age at which events occurred, things become simpler. Here different patterns emerge: the wide age range of previous convictions (many first convictions took place at a young age); the wide age range of those convicted at the Old Bailey; the relatively short time gaps between conviction and commitment to the hulks, and (usually) between incarceration on the hulks and transportation; the longer times spent in prison before transportation or licence; and the older ages of those sentenced to prison.
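Switching from dates to ages is a simple transformation once each event has a date. The minimal Python sketch below computes the age at each event and the gap in days since the previous one, which are the two quantities the age-based chart draws on. All dates are invented, and the mid-year birth date stands in for sources that give only a year. That is an assumption, not a project rule.

```python
from datetime import date

# A hypothetical life archive: event -> date. All dates are invented;
# where a source gives only a birth year, a mid-year date is assumed.
life = {
    "birth": date(1820, 7, 1),
    "Old Bailey conviction": date(1839, 4, 4),
    "hulks": date(1839, 5, 10),
    "transported": date(1839, 9, 1),
}

def age_at(birth, when):
    """Age in completed years on date `when`."""
    had_birthday = (when.month, when.day) >= (birth.month, birth.day)
    return when.year - birth.year - (0 if had_birthday else 1)

def ages_and_gaps(events):
    """Return (event, age at event, days since previous event), in date order."""
    birth = events["birth"]
    rows, previous = [], None
    for name, when in sorted(events.items(), key=lambda item: item[1]):
        gap = (when - previous).days if previous is not None else None
        rows.append((name, age_at(birth, when), gap))
        previous = when
    return rows

for row in ages_and_gaps(life):
    print(row)
```

Plotting the age column instead of the calendar date is what collapses lives from different decades onto a common axis. The gap column captures the short hulks-to-transportation intervals described above.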

Obviously, this is work in progress, and we have a lot more work to do to create accessible and fine-tuned visualisations providing these types of information, while including thousands more cases. We hope that what we come up with will be of use not only to this project, but also to researchers in other fields who want to create visual representations of vast amounts of complex data in accessible formats.

Building Bentham’s Panopticon

This post describes a project that a colleague from the Architecture department at the University of Liverpool, Dr Nick Webb, and I are currently working on: Building Bentham’s Panopticon, which is creating a 3D model of the Panopticon prison that can be explored in virtual reality using the Oculus Rift headset.


Bentham’s Panopticon was imagined as the ‘ideal’ prison; it was designed as a circular building with prisoners’ cells arranged around the outer wall and dominated by an inspection tower. From the tower the prison inspector would be able to gaze upon the prisoners at all times. The central inspection principle, Bentham argued, would result in ‘morals reformed, health preserved, industry invigorated, instruction diffused, public burdens lightened…all by a simple idea in architecture’ (Bentham, 1787).

Owing to the project’s escalating cost, his designs were never put into practice. But the recent digitisation of Bentham’s plans by Transcribe Bentham, alongside advances in virtual reality software, means that we now have the opportunity to digitally construct the Panopticon and venture inside.


This small element of the wider Digital Panopticon project seeks to explore how we can use digital technology to examine and recreate alternative ways of seeing and experiencing in a particular space and place, the Panopticon prison, had it been built. Through the use of 3D modelling and virtual reality technology, we can recreate the perspectives, positions and movements (through sight lines, walking routes, and height and weight records) of the gaolers and of the prisoners who could potentially have been imprisoned within the walls of the Panopticon.

In doing so, this project takes its inspiration from Tim Hitchcock, who is currently modelling the Old Bailey courtroom, and who contends that by ‘building something in three dimensions, with space, physical form and performance, along with new forms of analysis of text; can change how we understand the experience of imprisonment; allow a more fully empathetic engagement with offenders; along with a better understanding of how their experience impacted on the exercise of power and authority’.[1]

Building Bentham’s Panopticon rests upon two lines of enquiry. Firstly, it seeks to rebuild and re-examine the idealised construction of prison discipline at its most ideological: to examine the beginnings of the separate, silent system and the development of modern prison reform through architecture. But it also seeks to contribute to a history from below, examining how, by adding in the height and weight records of offenders, we can rebuild their perspectives and movements, and thereby explore the potential for transgression that could have occurred within a prison like the Panopticon.

We are about halfway through our research, and have yet to add in biometric data on prisoners taken from the Digital Panopticon project. Even so, in building the model using SketchUp, we have already begun to make important findings.


The use of 3D modelling has been essential to visualising Bentham’s process and building the interior of the Panopticon. Bentham’s plans, letters and writings about the Panopticon represent a conversation between himself, architects, managers and politicians that includes a series of changes to the design of the building and its regime. Our findings are at a very early stage, but constructing the Panopticon in SketchUp has demonstrated the value of this technology for investigating different lines of historical enquiry. Bentham never completed the design for the Panopticon, and the debate continued from the 1780s to the 1820s. Plans do exist from 1787 and 1791, and these designs are the sources from which we have built the 3D models. The interior, however, was never fully decided upon, owing to conflicts between, amongst others, Bentham, John Howard, and William Pitt the Younger.[2] As Nick Webb has argued previously, ‘This is important, as inferences have to be made due to representational source data such as architectural drawings almost always being incomplete’.[3] It is therefore necessary to delve into primary and secondary sources to explore the context and fill in the gaps in an informed way. For example, Bentham initially wanted the Panopticon to be made out of glass and cast iron. ‘Architecturally’, according to Janet Semple, ‘the Panopticon foreshadows Paxton’s Crystal Palace rather than Pentonville’.[4] However, despite technological innovation in glass manufacture in the late eighteenth century, the building materials were never decided upon, so Nick and I decided to use London stock brick, as this was the most commonly used material in London at the close of the eighteenth century.

Panopticon build

The models take the form of an idealised, architectural plan, and our current focus is to examine how a series of changes and compromises in the design, seen through the application of 3D modelling, demonstrate the political ideas behind the introduction of the separate, silent system and solitary confinement, but also the relative positions and viewpoints of the different historical actors, in this case, the gaoler, chaplain, and inmates.


What interests us at the moment is lines of vision and mobilities since, for Bentham and Foucault, panopticism as a principle is about the power of the gaze: of observation, regulation and power. But I would argue that both Foucault and Bentham offered simplistic arguments on this point. In terms of sight lines, or what people can see when standing at or walking through a particular point in space, this study builds upon the work of Philip Steadman (UCL). Steadman sketched out two-dimensional axonometric drawings of the Panopticon, but with the use of 3D we are able to build the interior of the Panopticon and so provide a space in which the viewer can walk around the prison and inhabit the potential routes of the gaolers, chaplain, and offenders. Steadman draws upon architectural research to plot the totality of what can be seen from a fixed position, also known as an ‘isovist’ (Steadman, 2012: 16).


In Steadman’s image here, you’ll see that the shaded area shows the warder’s isovist. The warder must circulate continuously to watch all the prisoners on his floor. But Steadman’s method, while highlighting the problems in Bentham’s design, works from a fixed point. Our study builds on this in two crucial ways: firstly, through the use of the Oculus Rift and virtual reality software, we are able to recreate the viewpoints and sight perspectives of the gaoler, chaplain, visitors, and offenders; and secondly, we are able to move beyond fixed isovist points to follow the potential mobilities of both gaoler and offender, had the Panopticon been built.


Bentham designed the process of observation to be one-way: the governor, gaolers, other prison staff, and prison visitors would be able to observe the convicts, but the convicts could only look upon the inspectors’ gallery. This was, in essence, the central inspection principle. The idea was that every prisoner should be under constant apprehension that he might be observed, night and day, even if no one was actually looking in his direction at that very moment. He would thus be constantly fearful of being discovered in any misdemeanour.


The Panopticon was a disciplinary technique for making a new social individual: a social laboratory where new subjects were made. Under Bentham’s design, inmates do not know when they are being watched, and so assume that they are under surveillance at all times. The prisoner is therefore the subject of observation and power, and this is power through observation. By leading prisoners to internalise the system of discipline, to watch themselves, the Panopticon, theoretically at least, aimed to produce reform through the regulation of the self. The aim of this kind of discipline was, according to Foucault, to turn inmates into quiet, orderly, tractable, malleable subjects, or what he provocatively calls ‘Docile Bodies’. As Foucault stated, ‘solitude is the primary condition of total submission’ (Foucault, 1975: 237).

‘Each individual, in his place, is securely confined to a cell from which he is seen from the front by the supervisor; but the side walls prevent him from coming into contact with his companions. He is seen, but he does not see; he is the object of information, never a subject in communication’ (Foucault, 1975: 201). The prisoner is therefore the object of power rather than an agent of power: ‘the object of information’, never a ‘subject in communication’.


And it is this very notion, the power of the gaze and the power relations that manifest through looking, that Building Bentham’s Panopticon seeks to investigate. The use of 3D and virtual reality technology allows us to put Foucault’s theory, and Bentham’s designs, to the test.


NB Please note that the models are incomplete at present, so may contain errors and inconsistencies.

PhD Work in Progress: Emma Watkins & The Case of George Fenby

The Case of George Fenby

You can see a video of Emma’s slides here: The Case of George Fenby

Euryalus Wash room

(Image courtesy of National Archives)

My PhD research explores the lives and criminal careers of nineteenth-century convicts, specifically juveniles aged 7-14 who were sentenced at the Old Bailey to either transportation to Australia or penal servitude at home in the period 1816-1850.

After transcribing the records of all juvenile offenders fitting my criteria, I will trace a sample using data linkage between different digital resources.[1] Then, through both quantitative and qualitative approaches, I will identify the common factors and experiences in the lives of juvenile offenders and create biographies of individuals.

This will allow for a rounded understanding of the context of these offenders, and enable me to approach broader questions such as: (i) what part did social, economic, environmental and familial factors play in the lives of juvenile offenders? And (ii) comparing those who were transported with those who served a penal sentence, which route led to greater desistance from crime, and why?

Centred on transportation, this post combines a case study with some initial trends from the whole transportation dataset. Set against the estimated 72,500 convicts transported to Van Diemen’s Land alone, in my period there were 1,411 juveniles sentenced to transportation to Australia as a whole.[2]


(Image courtesy of

Firstly, it is important to point out the disparity in numbers between males and females. Only 77 females were sentenced to transportation, compared with 1,333 males. This disparity is clearly seen in the graph comparing the age and gender of transportees (Figure 1.0). The graph also shows that the number of transportation sentences rose with age. But the key word here is ‘sentenced’, as not all were actually sent.

Number and Age of Juveniles Sentenced to Transportation


Figure 1.0

The youngest transportee in the sample is George Fenby. According to some historians, juveniles were not usually transported until “they were a suitable age”, that is, 14 or 15 years old.[3] If they were younger upon conviction, they would spend this period of limbo in local gaols or on the hulks (moored prison ships). Clearly, however, there were exceptions, and some juveniles were transported under the age of 14; George Fenby is one example.

 Description List CON18/1/15


Figure 1.1

Parish Birth Records


Figure 1.2

Fenby’s voyage on board the Manlius took four months. While his transportation records suggest that he was 10 years old, it would seem that he was in fact 12 when he first set foot in Van Diemen’s Land (see Figures 1.1 and 1.2). Interestingly, whereas transportation officials believed the youngest male transported was 10, the youngest female in my sample, Mary Ann Oseman, was described as 14 when she arrived in Australia: four years older than the youngest male.

Born in 1818, one of Hannah Fenby’s five children, George Fenby was convicted, together with his mother, of stealing two pairs of shoes from a shop; the court believed him to be 9 years old.[4] Fenby’s mother, aged 43, was also transported, on board the Eliza, for her part in the offence.

Subcategory of Offences


Figure 1.3

Apart from the odd coining or fraud offence, all offences were theft. As the graph above shows (Figure 1.3), the most common offences were Grand and Simple Larceny. Grand Larceny was the theft of goods worth at least 1 shilling in the absence of aggravating circumstances. In 1827, however, this offence was replaced by the new offence of Simple Larceny, which also did away with Petty Larceny and the complication of minimum values. Pickpocketing was also prevalent, but this figure is skewed by male offenders, as becomes clear when we break down offences by gender (see Figure 1.4).

Male Offence Subcategory


Female Offence Subcategory


Figure 1.4

Still, even after breaking down offences by gender, the most common offences for both sexes remain Grand and Simple Larceny, if taken together. Besides some differences between the sexes, it is noteworthy that the offence of shoplifting is not prevalent for either. Yet if we take into account the ‘spatial environment of the crime’, the most common place to steal from, for both sexes, was the shop (see Figure 1.5). The Fenbys are an example of this.


Figure 1.5

The question is: why was Fenby selected from the Euryalus hulk for actual transportation at such a young age, when others were not? Perhaps his Conduct and Appropriation Records may shed some light (see Figure 1.6). After being with his first mistress, Mrs. Humphrey’s, for less than two years, Fenby was reassigned to a John Kerr. There Fenby’s misconduct began relatively unremarkably: for example, he received twelve lashes for being absent in December 1833. Just a year later, however, his misconduct took on a new form. It was reported that Fenby took “liberties” with his master’s 6-year-old daughter, when he himself was only about 14. As a result he was removed to Port Arthur and placed in the worst class of boys at Point Puer (a juvenile penal settlement, 1834-1849). The event even made the Colonial Times (see Figure 1.7). At Point Puer Fenby received a further twelve lashes for “most riotous and improper conduct in the cells”, and this was followed by more absenteeism. It is possible that Fenby’s behaviour in the colony reflects his behaviour while imprisoned before transportation, which may have led to his early selection.


Figure 1.6


Figure 1.7

After receiving a certificate of freedom in 1836, he at some point made his way to Victoria, living for a period in Geelong, where he worked as a sawyer. There he again appeared in the papers in August 1842, having been charged with attempted highway robbery (see Figure 1.8). Unfortunately, I have as yet no details of this charge beyond a newspaper clipping, so I do not know the outcome. However, we can be relatively confident that it was him, because the report gives his alias as Timms. The name Timms is connected with Fenby elsewhere: not only does it appear on his mother’s death certificate, but the connection is also shown in a newspaper notice addressed to George Fenby by his mother, Hannah Timms (see Figure 1.9).


Figure 1.8


Figure 1.9

While there are gaps in parts of this convict’s life, what is clear is that George Fenby lived an eventful 75 years, 15 years longer than the average life span. He married three times, had children with his first and second wives, and died in 1893 in Corryong, Victoria.

While George Fenby was badly behaved, he was not a repeat offender before he was transported. Notably, only 26% of male juveniles had former indictments or convictions acknowledged at court, compared with 49% of females. This suggests a greater reluctance to sentence girls to transportation unless they were known to be repeat offenders. For males, previous convictions suddenly start to be recorded just before 1830. This probably reflects a change in recording practice rather than boys suddenly deciding to commit more than one offence. Even so, it is of interest both because of its sudden importance and because, when we look at females, we can see more of a correlation (see Figure 2.0).


Figure 2.0

This all raises more questions than it answers. On what basis were juveniles sentenced to imprisonment or transportation? And of those sentenced to transportation, how many were actually sent? And how were they selected?

The idea that bad behaviour and previous offences increased the chance that those sentenced to seven years’ transportation were actually transported is supported by the 1812 Parliamentary Paper. In the same paper, however, the superintendent of the hulks claimed that “probable utility” to the colony was not considered. He then goes on to say that he would not send men over 50, would probably not send women over 42, and would definitely not send women over 45, because they would be a “great burthen to the colony”.[5] Yet Hannah Fenby was 44 on her arrival.

At the other end of the scale, Captain Williams, government inspector of prisoners, believed many juveniles were too “diminutive” to be sent. So did officials base selection on the size and strength of juveniles, implying that utility was considered? Or were those they perceived as “hardened” selected? Or were practical factors, such as whether ships were available, influential?

Through tracing how many juveniles sentenced to transportation in my sample were actually transported, and using prosopography, I hope to approach these questions of selection.


Open Data and the Digital Panopticon

Of all historical periods and subjects, crime and justice in eighteenth- and nineteenth-century London is the most extensively digitised. Through the digitisation of countless court records, transportation registers, prison archives, trial reports, criminal biographies, last dying speeches and newspapers (amongst many other things), we can access a wealth of information about crime, policing and punishment in the metropolis, and about the fates of the offenders tried there, all at the click of a mouse.

To our great benefit, much of this data is openly available, a product of the dogged efforts of public bodies, academics, data developers, volunteers and enthusiasts; often (but certainly not always) supported by public funding. In the process it has opened up seemingly boundless possibilities for research.

Indeed, without several of these open datasets the Digital Panopticon could not be realised. In our efforts to trace the life courses and subsequent offending histories of London convicts transported to Australia or imprisoned in Britain in the late eighteenth and nineteenth centuries, we will be reliant on a number of open datasets such as the British Convict Transportation Registers and Female Prison Licences.

It seems timely, therefore, on Open Data Day, to celebrate these fantastic, freely-accessible resources, and to highlight just a couple of ways in which they will be useful to us on the Digital Panopticon. Taking place on 21 February 2015, Open Data Day will involve a series of events and gatherings which seek to develop support for, and to encourage, the adoption of open data policies by the world’s local, regional and national governments.

I have talked in a previous post about the ways in which visualisations of the openly-available British Convict Transportation Registers database can be used to put transportation under the ‘macroscope’ – to chart the complex patterns and interactions of penal transportation in their entirety, spanning the breadth of Australia and the length of a century, taking in the lives of tens of thousands of individuals along the way.

In this post I briefly want to highlight another open dataset which will be at the heart of the project – the prison licence records of females incarcerated in British jails in the nineteenth century, held by the National Archives (under the catalogue reference PCOM 4), the metadata for which is openly available on the Archive’s online catalogue.

The licences almost without exception record the age of the offender on conviction, a potentially useful piece of information for record linkage on the Digital Panopticon. But, as with our other datasets, we want to know how accurately ages were recorded, and once again, visualising the data from the female licences suggests some interesting things for us to think about.

Not least, it again reveals a tendency towards age heaping, the recording of ages at round numbers such as 20, 30 and 40, suggesting that recorded ages were regularly rounded up or down rather than representing the true age of the offender. If ages were recorded accurately, we would expect to see a smooth distribution of recorded ages. As seen in the graph below, however, this was far from the case in the recording of female prisoner ages in the nineteenth century, with spikes at the ages of 20, 30, 40 and 50, and dips at 29, 31, 39 and 41.

Age on Conviction as Recorded in the PCOM4 Female Prison Licences

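Age heaping can also be summarised as a single number. A standard demographic measure is Whipple’s index, which compares how many recorded ages between 23 and 62 end in 0 or 5 with the number expected if final digits were evenly spread. The sketch below is a minimal Python version alongside a simple ‘spike ratio’ for one age against its neighbours. Neither is claimed to be the project’s method, and the example ages are synthetic rather than drawn from PCOM 4.

```python
from collections import Counter

def whipple_index(ages, digits=(0, 5), low=23, high=62):
    """Whipple's index of age heaping on the given final digits.

    Ages 23-62 span 40 years, so exactly len(digits)/10 of them end in
    the chosen digits; a record with no heaping scores 100. With
    digits=(0, 5) the maximum (every age heaped) is 500.
    """
    in_range = [a for a in ages if low <= a <= high]
    if not in_range:
        raise ValueError("no ages fall in the reference range")
    heaped = sum(1 for a in in_range if a % 10 in digits)
    return 100 * heaped * 10 / (len(digits) * len(in_range))

def spike_ratio(ages, age):
    """Count at `age` divided by the mean count at its two neighbours."""
    counts = Counter(ages)
    neighbours = (counts[age - 1] + counts[age + 1]) / 2
    return counts[age] / neighbours if neighbours else float("inf")

# Synthetic example: one person at each age 23-62, then ten extra
# people recorded as exactly 30 (a crude stand-in for rounding).
smooth = list(range(23, 63))
heaped = smooth + [30] * 10
print(whipple_index(smooth), whipple_index(heaped), spike_ratio(heaped, 30))
```

The 0-and-5 form is the classic index. Passing `digits=(0,)` measures heaping on round decades only, which is closer to the spikes at 30, 40 and 50 described above, although the spike at 20 falls outside the standard 23-62 window.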

Does this mean, therefore, that we should disregard recorded ages as entirely inaccurate? Not necessarily – as the graph below demonstrates, when we compare the distribution of ages across different sets of records, it suggests that recorded ages were perhaps broadly reflective of age patterns. The distribution of offender ages is typically younger in the Old Bailey Proceedings (OBP) and in the Convict Indents (CIN – the records of those transported to Australia) compared to that of females imprisoned in Britain (PCOM4) – certainly what we would expect, given the nature of criminal justice policy at the time.

Ages of Female Offenders as Recorded across each Dataset
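Comparing datasets of very different sizes on one chart usually means converting counts to proportions first. The minimal Python sketch below does this with ten-year age bands, which also blunt the effect of the age heaping described earlier. The band width is an assumption chosen for illustration, and the sample ages are invented; they show the mechanics, not the findings.

```python
from statistics import median

def age_band_shares(ages, width=10):
    """Proportion of offenders in each age band, keyed by band start.

    Converting counts to shares lets datasets of very different sizes be
    compared directly; wide bands also blunt the effect of age heaping.
    """
    if not ages:
        return {}
    counts = {}
    for age in ages:
        band = (age // width) * width
        counts[band] = counts.get(band, 0) + 1
    return {band: n / len(ages) for band, n in sorted(counts.items())}

# Invented ages standing in for each source; they show the mechanics only.
datasets = {
    "OBP": [17, 19, 22, 24, 25, 30],
    "CIN": [18, 20, 21, 23, 27, 35],
    "PCOM4": [24, 30, 30, 38, 40, 45],
}

for name, ages in datasets.items():
    print(name, median(ages), age_band_shares(ages))
```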

These are just a couple of ways in which the Digital Panopticon will be drawing upon the wealth of open data available to criminal justice historians. We are indebted to the hard work of all those who have contributed to the creation and dissemination of this embarrassment of riches which, in combination with the powerful digital technologies now at our fingertips, is opening up a whole new realm of research opportunities.


Record Linkage Workshop Report, Part 2

The second half of the workshop was devoted to work in progress from the Digital Panopticon, summaries of which have already appeared (or will soon appear) on this blog, so watch this space! As such, I’ll say less about these papers than about those from Session 1.

Jamie McLaughlin, ‘How to Disappear Completely: Linking Transportation Records in the Digital Panopticon’

Jamie McLaughlin presented some of the insights gained from our recent (and still very early) explorations in linking records of the trial and transportation of convicts in eighteenth- and nineteenth-century London. Uncertainty ‘plagues the records’, and Jamie discussed some of the ways in which we have tried to maximise the quality of the name matches made across the records. These include allowing for spelling and date variances, creating control scenarios, and using variant lists rather than general algorithms, all ultimately with an eye on computational performance. Performance is an issue we cannot simply disregard, however much we might desire ‘perfect’ matching techniques. In short, we need to find an optimal, complementary balance of automated and manual work, allowing computers and humans each to do what they’re good at, an ideal strategy reflected in the case of the ‘robot butler’.
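A minimal Python sketch can show how a variant list, a date tolerance and attention to performance fit together. Everything here is hypothetical: the variant entries, the field names, the two-year tolerance, and the use of surname ‘blocking’ are assumptions for illustration. Blocking is a common way of avoiding all-against-all comparisons, but this is not a description of the code Jamie presented.

```python
from collections import defaultdict

# Hypothetical variant list mapping spellings to one canonical surname.
# A working list would be much longer and curated by hand.
SURNAME_VARIANTS = {
    "smyth": "smith",
    "smythe": "smith",
    "davies": "davis",
}

def canonical(surname):
    """Lower-case a surname and map it to its canonical variant, if listed."""
    key = surname.strip().lower()
    return SURNAME_VARIANTS.get(key, key)

def implied_birth_year(record):
    """Year of the record minus the age stated in it."""
    return record["year"] - record["age"]

def find_candidates(trials, transports, year_tolerance=2):
    """Pair trial and transportation records that may describe one person.

    Records are grouped ('blocked') by canonical surname, so each trial is
    compared only with transportation records in the same block rather
    than with every record. Within a block, forenames must agree and
    implied birth years may differ by up to `year_tolerance` years.
    """
    blocks = defaultdict(list)
    for rec in transports:
        blocks[canonical(rec["surname"])].append(rec)
    pairs = []
    for trial in trials:
        for rec in blocks.get(canonical(trial["surname"]), []):
            same_forename = trial["forename"].lower() == rec["forename"].lower()
            gap = abs(implied_birth_year(trial) - implied_birth_year(rec))
            if same_forename and gap <= year_tolerance:
                pairs.append((trial["id"], rec["id"]))
    return pairs

# Invented records for illustration.
trials = [
    {"id": "t1", "forename": "Thomas", "surname": "Smyth", "age": 19, "year": 1830},
    {"id": "t2", "forename": "Mary", "surname": "Jones", "age": 30, "year": 1830},
]
transports = [
    {"id": "x1", "forename": "Thomas", "surname": "Smith", "age": 21, "year": 1831},
    {"id": "x2", "forename": "Mary", "surname": "Jones", "age": 40, "year": 1831},
]
print(find_candidates(trials, transports))
```

The variant list handles spelling, the tolerance handles inconsistently stated ages, and blocking keeps the number of comparisons manageable. Widening the tolerance trades precision for recall, which is one of the balances the talk describes.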

Lucy Williams, ‘What’s in a name? Convicts, Context and Multiple Record Linkage’

Lucy Williams talked about her recent work manually checking the automated linkage process undertaken by Jamie, particularly in identifying why good matches have failed to be made. One reason for this is simple name variance: variable spellings of the same surname are notoriously prevalent in eighteenth- and nineteenth-century records. Nor is the data from one record set (such as the Old Bailey Proceedings) carried over consistently to other records. But there is also the problem of “John Smith”: how do we prise apart and correctly link individuals tried at the same session of the Old Bailey who have the same name, spelled in exactly the same way? We can keep adding in information from other sources in order to try and verify these kinds of multiple name matches, but that isn’t necessarily always the answer, particularly for automated processes. Adding in all the John Smiths from the census, for instance, can simply lead to even more candidate links. The crucial question for us, then, is at what point do we draw a line under things and stop adding in contextual data?
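The ‘John Smith’ problem can be made concrete with a small triage step that separates clean one-to-one links from ambiguous ones. The rule below is an assumption for illustration, not the project’s procedure: accept a link automatically only if neither side has any other candidate, and queue the rest for manual checking. The record identifiers are invented.

```python
from collections import defaultdict

def split_by_ambiguity(candidate_pairs):
    """Separate one-to-one links from ambiguous ones.

    A source record is accepted automatically only if it has exactly one
    candidate and no other source claims that candidate; everything else
    is queued for manual checking, e.g. two John Smiths at one session
    who each match both of two transportation entries.
    """
    by_source = defaultdict(set)
    by_target = defaultdict(set)
    for src, dst in candidate_pairs:
        by_source[src].add(dst)
        by_target[dst].add(src)
    accepted, review = [], []
    for src, targets in sorted(by_source.items()):
        if len(targets) == 1:
            (dst,) = targets
            if len(by_target[dst]) == 1:
                accepted.append((src, dst))
                continue
        review.append((src, sorted(targets)))
    return accepted, review

# Invented candidate pairs: two same-named defendants and one unambiguous one.
pairs = [
    ("smith_1", "x1"), ("smith_1", "x2"),
    ("smith_2", "x1"), ("smith_2", "x2"),
    ("cole", "x3"),
]
accepted, review = split_by_ambiguity(pairs)
print(accepted)
print(review)
```

Under this rule, adding more contextual records, such as every John Smith in a census, only enlarges the candidate sets and pushes more links into the review queue. That is the trade-off raised in the question above.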

Record Linkage Workshop Report, Part 1

In the first half of this workshop on record linkage we had three fantastic papers from guest speakers who were invited to talk about their own experiences of conducting record linkage in historical research. Each speaker offered a different perspective on the subject, allowing us to think about a wide range of issues relating to record linkage and generating ideas which will be extremely useful to us on the Digital Panopticon.

Jeremy Boulton — ‘Place, Mobility and Class Barriers: The Perils and Possibilities of Nominal Linkage in the Metropolis’

Jeremy Boulton of the University of Newcastle got the event off to a fantastic start with a fascinating and thought-provoking window into his self-confessed ‘gruesome fascination’ with nominal record linkage. Reflecting on his experiences as part of the Pauper Lives in Georgian London and Manchester project, Jeremy spoke about the broader methodological (rather than strictly technical) issues associated with record linkage, highlighting both the benefits, but also the inherent dangers, of linking individuals across multiple historical records.

On the one hand, when carried out successfully, nominal record linkage can be an effective means of checking the accuracy of our historical records. Perfect accuracy is unattainable in historical record linkage, as E. A. Wrigley observed many years ago, and this still holds true today. Nevertheless, the creation and collation of successful links allows us to identify the (otherwise imperceptible) lies and concealments of the people being recorded.

On the other hand, of course, the difficulties associated with nominal record linkage make the successful creation of links (and thus the exposure of the ‘fiction in the archives’) a problematic task. Transcription errors (by both the original scribes and present-day transcribers) will defeat even the most sophisticated linkage methodologies, and confirming information cannot always be obtained.
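Linked records can expose discrepancies in a simple way: by comparing the ages people gave at different dates. The sketch below illustrates this only. It is not a method described in the talk, and the function names, the two-year tolerance and the example records are all hypothetical. Some tolerance is needed because an age can be out by a year simply because of when in the year a record was made.

```python
def implied_birth_years(linked_records):
    """Convert each (record_year, stated_age) pair into an implied birth year."""
    return [year - age for year, age in linked_records]


def ages_inconsistent(linked_records, tolerance=2):
    """True if the implied birth years spread by more than `tolerance` years.
    A flag does not say who is wrong: it may mean a mis-link, a transcription
    error, or a misstatement by the person being recorded."""
    years = implied_birth_years(linked_records)
    if not years:
        return False
    return max(years) - min(years) > tolerance


# Invented example: three records linked to the same person.
linked = [(1836, 19), (1851, 34), (1861, 38)]
print(implied_birth_years(linked))  # [1817, 1817, 1823]
print(ages_inconsistent(linked))    # True
```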

In the latter part of his paper, Jeremy presented an absorbing case-study of the nominal record linkage of Godfrey Sykes, widely documented in sources such as pollbooks, newspapers, the London electoral database and charity subscriber registers — an apparently respectable Georgian businessman who, it turns out from further digging into the historical sources, fathered four bastards with a woman named Ann Farmer.

Gill Newton — ‘Urban Record Linkage before 1754’

Next, Gill Newton of the University of Cambridge shifted the focus onto the nuts and bolts of record linkage — a paper rich in technical detail which provided the audience with a valuable toolkit for undertaking record linkage, even for the particularly challenging context of creating re-constituted families from eighteenth-century London.

Starting with an informative background on the contents of an eighteenth-century parish register and what is meant by a re-constituted family, Gill then noted some of the key challenges which face any researcher looking to undertake urban record linkage. These include a high level of population turnover; rapid growth from migration; blurred parish and administrative boundaries; and a high risk of mistaken identities. There are, however, advantages to linking urban records, such as more detailed registers; a more diverse name base; the ability to sample viably; and the further information generated by civic administration.

Gill then treated us to a fascinating discussion of name distribution in eighteenth-century parish registers. Forenames were heavily bunched around the most common names (John, Mary, Elizabeth etc.). By contrast, whilst some surnames constituted a large proportion of the whole (such as Smith), the distribution of surnames had a much longer ‘tail’ compared to forenames. Moreover, there were stark differences in the patterns of name distribution between rural England and London.
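The contrast between bunched forenames and long-tailed surnames can be quantified by asking what share of all entries the few most common names account for. This sketch uses an invented ten-entry sample, not the registers Gill analysed, and `top_k_share` is a hypothetical helper. For linkage, the practical point is that agreement on a very common forename is weak evidence, while agreement on a rare surname is much stronger.

```python
from collections import Counter


def top_k_share(names, k):
    """Fraction of all entries accounted for by the k most common names."""
    if not names:
        return 0.0
    counts = Counter(names)
    return sum(count for _, count in counts.most_common(k)) / len(names)


# Invented toy sample of ten entries, not real register data.
forenames = ["John"] * 4 + ["Mary"] * 3 + ["Elizabeth"] * 2 + ["Sarah"]
surnames = ["Smith"] * 2 + ["Jones", "Dring", "Farmer", "Sykes",
                            "Davis", "Wood", "Cole", "Hart"]
print(top_k_share(forenames, 3))  # 0.9
print(top_k_share(surnames, 3))   # 0.4
```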

Finally, Gill highlighted some of the most important tools for undertaking nominal record linkage, including phonetic matching and surname dictionary examples, as well as the principles of algorithmic record linkage. She offered some extremely useful tips on how to maximise the quality of the linkages created, emphasising that successful matching requires careful attention and a rigorous methodology. In other words, the cautionary mantra for record linkage should be ‘garbage in, garbage out’.
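Gill did not tie phonetic matching to any one algorithm, and Soundex is simply the best-known example. The following is a minimal sketch of the standard American Soundex code, written for illustration rather than taken from any project's software.

```python
# Standard American Soundex digit groups; vowels, y, h and w have no digit.
SOUNDEX_DIGITS = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        SOUNDEX_DIGITS[letter] = digit


def soundex(name: str) -> str:
    """Return the four-character Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = SOUNDEX_DIGITS.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "hw":
            # h and w are skipped without separating letters with the same digit
            continue
        digit = SOUNDEX_DIGITS.get(letter, "")
        if digit and digit != previous:
            code += digit
            if len(code) == 4:
                break
        # A vowel resets `previous`, so a repeated digit after a vowel is kept
        previous = digit
    return code.ljust(4, "0")


print(soundex("Dring"), soundex("Dryng"))  # D652 D652
print(soundex("Smith"), soundex("Smyth"))  # S530 S530
```

Soundex keeps the first letter as it is, so it will not bring together variants that differ in their initial letter. This is one reason curated variant lists and surname dictionaries remain useful alongside it.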

Ciara Breathnach — ‘Irish Records Linkage 1864–1913: Big, Macro and Micro Data’

In the final paper of this first session, Ciara Breathnach from the University of Limerick talked about the approach and some of the findings from the Irish Record Linkage 1864–1913 project, on which she is the principal investigator. Funded by the Irish Research Council, and developed in partnership with the Digital Repository of Ireland, University of Limerick and Insight at NUI Galway, the project aims to provide a comprehensive map of infant and maternal mortality for Dublin from 1864 to 1913. The project will reconstruct family units and create longitudinal histories by linking records of Birth, Marriage and Death, which together include millions of name instances.

Starting with an overview of the Irish Record Linkage project, Ciara then discussed some of the forces which served to shape the recording of census and civil data in nineteenth-century Ireland, before moving on to discuss some of the differing definitions of ‘Big Data’, a term about which there is seemingly little agreement.

Ciara also provided useful information on the ontologies utilised by the Irish Record Linkage project, describing the ways in which the data has been analysed and linked, noting the necessity (in the case of such extensive numbers of available records) to sample in order to make such a project feasible.

Finally, through a case study of Achill in Dublin, Ciara presented a glimpse of the significant findings already generated by the Irish Record Linkage project. By mapping infant deaths in the parish in the 1890s, Ciara revealed the nature of the relationship between child mortality and the geography of local health care (in the form of doctors and nurses) in late nineteenth-century Ireland. As Ciara concluded, it is through these kinds of detailed micro-level studies, produced by record linkage at the macro level, that we can gain a better understanding of the past.

We are very grateful to all three speakers for providing us with so much food for thought, and so many ideas to follow up!

What’s in a Name?: Details and Data Linkage

A year into the Digital Panopticon project, we have begun record linkage with some of our key sources relating to Transportation. With several innovative iterations of initial linkage completed, thanks to Jamie McLaughlin, we have been able to trace more than three quarters of those sent for transportation from the Old Bailey, linking them to their voyage details in the British Transportation Registers. For some, we have also been able to link onwards to the Convict Indents compiled for them on board convict ships and once they arrived in Australia. This iterative process has taught us much about the nature of our different record sets, and about the complex job of connecting them together.

One of the biggest challenges in the linking process has been differentiating between multiple cases of identical names and trials in the Old Bailey. In the coming months, our schedule of record linkage will connect not just our transportation datasets but also imprisonment data and, eventually, civil data such as the census and birth, marriage and death records. As it does, deciding with certainty what to link, and how, becomes increasingly difficult.

When confronted with a sea of names, and no consistency in the recording of other contextual information between our diverse datasets, how are we to make the right choices and make sure that the correct history is connected to the right offender?

Between 1780 and 1900 only one Mary Ann Dring was convicted at the Old Bailey. She was sentenced to five years’ penal servitude in 1865 for feloniously uttering counterfeit coin. She had appeared at the Old Bailey once before, in 1863, as a witness in the coining trial of another woman, and twenty years later, in 1885, she might well have acted as a witness in a manslaughter case.

From a linkage perspective we are fortunate. In all of our criminal datasets there should be only one Old Bailey Mary Ann Dring. This is just as well: the report of her own trial runs to just two lines of text, so the information we start off with in order to trace her is minimal:

Name: Mary Ann Dring

Approximate year of birth: 1817

Location: London

Step one is to link to the next big dataset for those who stayed in England to be imprisoned. In this case that is the PCOM 4 female licences for parole. Searching with the information on Mary Ann Dring taken from the Old Bailey data, we have no problem in locating her licence. Those familiar with the licences will know that these documents give us the opportunity to collect a great deal more information about her. Confident that the right link has been made, we can collect some key contextual details that will allow us to identify Mary Ann Dring in further datasets.

Licence fields

The future datasets we link to will not, of course, contain most of this information, so we must use a few key details that will help us link to new records. For civil data we could certainly use the fact that Mary Ann Dring was recorded as married with two children in 1865. She worked as a charwoman, and had been resident in London, under her married name, since at least 1863, when she had her first conviction.

The census nearest to Mary Ann’s Old Bailey conviction in 1865 is that of 1861, and it has 183 returns for a Mary Ann Dring born in or around 1817. If we make the not unreasonable assumption that our Mary Ann Dring was living in London for the five years before her Old Bailey appearance, we can, rather luckily, reduce that to four viable matches. To most academic researchers or family historians, this is a small and manageable selection from which to choose.

MAD census entries
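Narrowing 183 returns to a handful is essentially a filtering step with three conditions: exact name agreement, a window around the estimated birth year, and an assumed place of residence. Below is a minimal sketch. The `CensusEntry` type, the `filter_candidates` helper, the two-year window and the six entries are all hypothetical, and the entries are not the real 1861 returns.

```python
from dataclasses import dataclass


@dataclass
class CensusEntry:
    forename: str
    surname: str
    birth_year: int
    place: str


def filter_candidates(entries, forename, surname, birth_year, tolerance, place):
    """Keep entries with the same name, a birth year within `tolerance`
    years of the target, and residence in `place`."""
    return [
        e for e in entries
        if e.forename.lower() == forename.lower()
        and e.surname.lower() == surname.lower()
        and abs(e.birth_year - birth_year) <= tolerance
        and e.place == place
    ]


# Invented entries for illustration; these are NOT the real 1861 returns.
entries = [
    CensusEntry("Mary Ann", "Dring", 1816, "London"),
    CensusEntry("Mary Ann", "Dring", 1818, "London"),
    CensusEntry("Mary Ann", "Dring", 1815, "London"),
    CensusEntry("Mary Ann", "Dring", 1819, "London"),
    CensusEntry("Mary Ann", "Dring", 1817, "Norfolk"),
    CensusEntry("Mary Ann", "Dring", 1840, "London"),
]
shortlist = filter_candidates(entries, "Mary Ann", "Dring", 1817, 2, "London")
print(len(shortlist))  # 4
```

Hard filters like these only produce a shortlist. Choosing among the survivors still means weighing attributes such as marital status and number of children.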

Yet even though we know she was married with two children, we are faced with four married women (two with two children, two with three), all living in London. None has any occupation listed, which is not unusual for a census entry with a male head of household. Given the parameters of most automated systems that might be used to make such a match, any of these census entries could be considered a valid match. Manually, an individual researcher can reduce the choices to two viable matches, but these are, from a linkage point of view, almost indistinguishable. The birth years of the two most likely candidates fall one year either side of 1817. Both are married, both have two children, both are resident in London, and both have identical names.

In the 1871 census, six years after Mary Ann’s conviction and four years after her release from prison, there are no records that directly match either of the entries from the 1861 census. Instead there is a choice of five women, all within five years of the original Mary Ann Dring’s birth year, but with notable differences in their personal information. Furthermore, depending on which links are made to census data, and what extra contextual information is added to Mary Ann’s case, there are potentially relevant death records from London and the surrounding counties spanning a fifteen-year period.

If we searched simply for Mary Dring, without the middle name Ann, we would face several times as many choices. If we looked for a Mary Smith with the same level of contextual detail, we could well be faced with hundreds of potential matches and no way to choose between them.

Each individual record linked to a convict has ramifications for future links. At the micro level this is the dilemma faced by every genealogist or family historian: the difficult decisions that have to be made in matching records to individuals. However, the Digital Panopticon’s task of linking almost 90,000 convicts across multiple datasets is not a micro history, nor a task that can be managed manually. An automated system that can navigate and discern between multiple similar (or even identical) entries in a given dataset is essential. Or perhaps it is a question of ranking and displaying the multiple possible links in case of conflict?
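One way to approach the ranking question is to score each candidate against what we already know, then keep every candidate that scores close to the best rather than silently picking one. The sketch below illustrates that idea only. The weights, the `tie_margin` and the four candidates are hypothetical, loosely modelled on the kind of choice described above for the 1861 census.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    birth_year: int
    married: bool
    children: int


@dataclass
class Candidate:
    record_id: str
    birth_year: int
    married: bool
    children: int


def score(profile, cand):
    """Hypothetical additive score: 2 points each for agreeing marital status
    and number of children, minus 0.5 per year of birth-year difference."""
    s = 0.0
    if cand.married == profile.married:
        s += 2.0
    if cand.children == profile.children:
        s += 2.0
    s -= 0.5 * abs(cand.birth_year - profile.birth_year)
    return s


def rank_candidates(profile, candidates, tie_margin=0.25):
    """Sort candidates best-first and return the IDs of all candidates scoring
    within tie_margin of the best; more than one ID means an unresolved
    conflict to be stored and displayed rather than decided automatically."""
    ranked = sorted(candidates, key=lambda c: score(profile, c), reverse=True)
    if not ranked:
        return [], []
    best = score(profile, ranked[0])
    tied = [c.record_id for c in ranked if best - score(profile, c) <= tie_margin]
    return ranked, tied


profile = Profile(birth_year=1817, married=True, children=2)
candidates = [
    Candidate("A", 1816, True, 2),
    Candidate("B", 1818, True, 2),
    Candidate("C", 1815, True, 3),
    Candidate("D", 1819, True, 3),
]
ranked, tied = rank_candidates(profile, candidates)
print([c.record_id for c in ranked])  # ['A', 'B', 'C', 'D']
print(tied)                            # ['A', 'B']
```

Returning the tied set, rather than only the top candidate, is the design choice that matters here. It lets conflicting possibilities be stored and shown to a researcher instead of being resolved arbitrarily.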

Our challenge now, it would seem, is to develop a suitably complex data linkage system: one that maintains a high rate of matches we can be confident in, while also allowing us to incorporate possible, contradictory and conflicting data. Those with common names will no doubt prove our greatest challenge, but even someone as seemingly unique as Mary Ann Dring poses challenges about how we match, what we match, what we keep, and how to store and rank conflicting information across such a wide variety of datasets.


BCHS4 presentation: Visualising Digital Panopticon Data


The Digital Panopticon will assemble a larger collection of datasets than any other crime history project to date (including, amongst many others, the Old Bailey Proceedings, convict transportation registers and prison records), covering hundreds of thousands of individuals. To effectively bring together this information to reconstruct the lives of offenders, we need to develop a detailed understanding of our datasets – of what information is and isn’t recorded on offenders, and how this varied both over time and across different sets of records. Traditional methods of data analysis and representation such as manual counting and tables are inadequate to this end. This paper instead highlights the power of digital technologies in identifying previously unrecognised (and otherwise unrecognisable) patterns. The techniques of data visualisation in particular have been invaluable in uncovering how extensively, and in what manner, information on offender age, occupation and crime location was recorded within our sources. By using digital technologies to step back from our datasets, and see them in their entirety, we can develop a much fuller and more systematic understanding of the sources we are working with.


Seeing things Differently: Visualising Data on Crime and Punishment

We’re delighted to be able to announce our second project workshop.

It’s another afternoon workshop, this time in Sheffield, and the subject is Record Linkage (part of the Epistemologies research theme). We’re particularly interested in the challenges and rewards of applying automated (and semi-automated) nominal record linkage to very large-scale historical datasets, with all their variability, fuzziness and uncertainties; our work on the project starts from these questions:

How can we improve current record-linkage processes to maximise both the number of individuals linked across different datasets and the amount of information obtained about each individual? What is the minimum amount of contextual information needed in order to conduct successful large-scale record linkage of data pertaining to specific individuals?

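In addition to presentations about our work from project team members, we have three guest speakers who will bring extensive experience of historical record linkage projects:
Jeremy Boulton (University of Newcastle)
Gill Newton (University of Cambridge)
Ciara Breathnach (University of Limerick)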

We think this will add up to a stimulating programme and discussion that will be of interest to many historians who need to link information about large numbers of individuals, using data that is continually growing in diversity and scale.


Workshop Information

When: 2-5.30pm, Tuesday 4 November 2014
Where: Humanities Research Institute, Gell Street, Sheffield

Attendance is free but numbers may be limited so you will need to register in advance: email Sharon Howard (

Transportation Under the Macroscope

Computers are brilliant microscopes. They make it easy to find needles in haystacks. Want to find references to the famous lawyer William Garrow amongst the millions of words in the printed reports of trials held at the Old Bailey, for instance? A keyword search produces the results in less than a second. Without computers it would take months. Likewise, as I explained in a recent post, through the techniques of data visualisation computers can be used to spot (what would otherwise be largely imperceptible) errors within the massive datasets that we are drawing upon in the Digital Panopticon project.

But computers are also fantastic macroscopes: today’s powerful digital technologies allow us to stand back from our sources and view them in their entirety. We can see the big picture, presenting complex and large-scale patterns in simple but effective ways. Microscopes allow us to see the infinitely small. Telescopes reveal the infinitely great. Macroscopes, meanwhile, peer into the infinitely complex, allowing us to explore combinations, relationships and interactions between multiple elements.

By visualising the information recorded in the British Convict Transportation Registers, I’ve recently put penal transportation to Australia in the eighteenth and nineteenth centuries under the macroscope. This has produced some interesting insights into the relationship between Australian penal colonies, terms of transportation and how these changed and interacted over time.

BCTR

The British Convict Transportation Registers database provides information on more than 123,000 offenders who were transported to Australia between 1787 and 1867. It’s a fantastic resource, and it will be at the heart of the Digital Panopticon project’s efforts to chart the criminal lives of London convicts sent to Australia. In charting these lives, we need to address some overarching starting questions. How many London convicts were actually transported to Australia for their crimes? Which parts of Australia were they sent to? How many years abroad did they face according to their sentences? Did this change over time, and what was the relationship between these different elements? Visualisations can help us to explore these questions over the long term and on a large scale.

The total number of London convicts transported to Australia fluctuated greatly over the late eighteenth and nineteenth centuries, as Graph 1 below demonstrates. Relatively few convicts were transported in the years 1793–1804, when the Revolutionary War monopolised Britain’s shipping resources. With the end of the Napoleonic War in 1815, however, there were large and rapid increases in the numbers of London convicts sent to Australia, reaching a massive peak in the 1830s. Thereafter, numbers gradually fell until the eventual abandonment of penal transportation in the 1860s. Interestingly, this pattern reflects a wider inverse relationship throughout the eighteenth and nineteenth centuries: fewer convicts were transported in the years when Britain was at war.

Graph 1
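Behind a graph like Graph 1 sits a simple aggregation: count the convicts per year or per decade. Below is a minimal sketch. It assumes each register entry has already been reduced to a year and a place of trial; the real database's fields may be organised differently, and the records below are invented. The helper `counts_by_decade` is hypothetical.

```python
from collections import Counter


def counts_by_decade(records, place="London"):
    """Count transported convicts tried in `place`, grouped by decade.
    Each record is assumed to be a (year, place_of_trial) pair."""
    return Counter((year // 10) * 10 for year, where in records if where == place)


# Invented records for illustration only.
records = [(1787, "London"), (1789, "London"), (1795, "Kent"),
           (1832, "London"), (1835, "London"), (1838, "London"),
           (1861, "London")]
by_decade = counts_by_decade(records)
for decade in sorted(by_decade):
    print(decade, by_decade[decade])
```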

What Graph 1 doesn’t reveal is that the places in Australia to which convicts were sent changed over time: the individual penal colonies that received London convicts operated at different times. As Graph 2 below shows, New South Wales was the first penal colony in Australia, and was later used alongside the penal colony of Van Diemen’s Land between the late 1820s and 1840, when transportation to Australia was at its peak. Following this, Van Diemen’s Land was used almost exclusively until the 1850s, when Western Australia became the sole transportation location in Australia.

Graph 2

If the locations of penal transportation to Australia changed over time, then so too did the lengths of the terms offenders were sentenced to serve abroad. As Graph 3 highlights, between 1787 and the virtual abandonment of New South Wales as a penal colony in the late 1830s, offenders were sentenced almost without exception to a term of 7 years, 14 years or life. Between 1840 and 1850, when Van Diemen’s Land was used exclusively, terms became more varied, with greater use of 10- and 15-year sentences. And especially after 1853, when Western Australia became the sole destination for transportees, an even greater variety of terms was put to use. This more nuanced tariff of transportation sentences was probably introduced to make transportation more acceptable to penal reformers, who increasingly viewed the practice with concern.

Graph 3

These changes in penal colony and terms of transportation were intimately linked, and Graph 4 clearly captures the interaction between the two. The colonies operated at different times, and the law which underpinned them changed accordingly, as did the terms of transportation which could be imposed. In short, the convicts who found themselves on the shores of New South Wales were primarily of two kinds: those sentenced to seven years’ transportation, and those sentenced to a whole life abroad. By contrast, London convicts landing some 2,000 miles away on the shores of Western Australia, on the eve of transportation’s demise in the 1850s, would each have had subtly different terms to serve out.

Graph 4
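A chart like Graph 4 is drawn from a cross-tabulation of colony against term of transportation. The sketch below is a minimal stdlib version. The `crosstab` helper and the records are hypothetical, and the terms are stored as strings so that "life" can sit alongside numbers of years.

```python
from collections import Counter, defaultdict


def crosstab(records):
    """Build a colony -> Counter(term -> number of convicts) table from
    (colony, term) pairs."""
    table = defaultdict(Counter)
    for colony, term in records:
        table[colony][term] += 1
    return table


# Invented records for illustration only.
records = [
    ("New South Wales", "7"), ("New South Wales", "7"),
    ("New South Wales", "life"), ("Van Diemen's Land", "10"),
    ("Van Diemen's Land", "7"), ("Western Australia", "6"),
    ("Western Australia", "12"),
]
table = crosstab(records)
for colony in sorted(table):
    print(colony, dict(table[colony]))
```

With pandas, `pandas.crosstab` produces the same kind of table directly from two columns.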

Through the macroscope of computer-generated visualisations, we can see these complex patterns and interactions in their entirety, spanning the breadth of Australia and the length of a century, taking in the lives of tens of thousands of individuals along the way.