
Open Data and the Digital Panopticon

Of all historical periods and subjects, crime and justice in eighteenth- and nineteenth-century London is the most extensively digitised. Through the digitisation of countless numbers of court records, transportation registers, prison archives, trial reports, criminal biographies, last dying speeches and newspapers (amongst many other things), we can access a wealth of information about crime, policing and punishment in the metropolis, and about the fates of the offenders tried there, all at the click of a mouse.

To our great benefit, much of this data is openly available, a product of the dogged efforts of public bodies, academics, data developers, volunteers and enthusiasts, often (but certainly not always) supported by public funding. In the process, this work has opened up seemingly boundless possibilities for research.

Indeed, without several of these open datasets the Digital Panopticon could not be realised. In our efforts to trace the life courses and subsequent offending histories of London convicts transported to Australia or imprisoned in Britain in the late eighteenth and nineteenth centuries, we will be reliant on a number of open datasets such as the British Convict Transportation Registers and Female Prison Licences.

It seems timely, therefore, on Open Data Day, to celebrate these fantastic, freely-accessible resources, and to highlight just a couple of ways in which they will be useful to us on the Digital Panopticon. Taking place on 21 February 2015, Open Data Day will involve a series of events and gatherings which seek to develop support for, and to encourage, the adoption of open data policies by the world’s local, regional and national governments.

I have talked in a previous post about the ways in which visualisations of the openly-available British Convict Transportation Registers database can be used to put transportation under the ‘macroscope’ – to chart the complex patterns and interactions of penal transportation in their entirety, spanning the breadth of Australia and the length of a century, taking in the lives of tens of thousands of individuals along the way.

In this post I want to highlight briefly another open dataset which will be at the heart of the project – the prison licence records of females incarcerated in British jails in the nineteenth century, held by the National Archives (under the catalogue reference PCOM 4), the metadata for which is openly available on the Archives’ online catalogue.

The licences almost without exception record the age of the offender on conviction, a potentially useful piece of information for record linkage on the Digital Panopticon. But, as with our other datasets, we want to know how accurately ages were recorded, and here again visualising the data suggests some interesting things for us to think about.

Not least, it again reveals the tendency towards age heaping: recorded ages cluster at round numbers such as 20, 30 and 40, suggesting that ages were regularly rounded up or down rather than representing the true age of the offender. If ages were recorded accurately, we would expect to see a smooth distribution of recorded ages. As seen in the graph below, however, this was far from the case in the recording of female prisoner ages in the nineteenth century, with spikes at the ages of 20, 30, 40 and 50, and dips at the ages of 29, 31, 39 and 41.

Age on Conviction as Recorded in the PCOM4 Female Prison Licences


Does this mean, therefore, that we should disregard recorded ages as entirely inaccurate? Not necessarily – as the graph below demonstrates, comparing the distribution of ages across different sets of records suggests that recorded ages were perhaps broadly reflective of age patterns. The distribution of offender ages is typically younger in the Old Bailey Proceedings (OBP) and in the Convict Indents (CIN – the records of those transported to Australia) compared to that of females imprisoned in Britain (PCOM4) – certainly what we would expect, given the nature of criminal justice policy at the time.

Ages of Female Offenders as Recorded across each Dataset

These are just a couple of ways in which the Digital Panopticon will be drawing upon the wealth of open data available to criminal justice historians. We are indebted to the hard work of all those who have contributed to the creation and dissemination of this embarrassment of riches which, in combination with the powerful digital technologies now at our fingertips, is opening up a whole new realm of research opportunities.


Seeing things differently: Visualizing patterns of data from the Old Bailey Proceedings


An edition of the Old Bailey Proceedings

The Old Bailey Proceedings are a rich historical resource, almost unimaginably so. They constitute the largest body of texts detailing the lives of non-elite people ever published. Words alone can’t quite do justice to the magnitude of the Proceedings – 197,745 accounts of trials covering 239 years (1674-1913); some 127 million words of text (at an average reading rate of 250 words per minute, this would take eight hours’ solid reading every single day for nearly three years to get through!); details of some 253,382 defendants, including name, gender, age and occupation, as well as details of 223,246 verdicts passed by the juries and 169,243 punishments imposed by the judges.

The Proceedings clearly contain a huge amount of information, but they don’t record everything – like any historical source, they are selective in what they document. The amount of information that was recorded in the Proceedings on crimes, verdicts, punishments, defendants and so on also varied over time. And whilst the digitization of the Proceedings by The Old Bailey Online has revolutionised the way in which we search and use this rich historical resource, this also has its limits. The marking-up of the text of the Proceedings (assigning tags to particular pieces of information in the text – such as name or crime – so that this information can be systematically searched) makes it possible to undertake sophisticated statistical analysis. Crimes, verdicts, punishments, defendant age and defendant gender can all be counted at the click of a mouse. Nevertheless, marking-up inevitably involves choices (about what information to tag and the level of detail that is tagged), and those choices limit the ways in which the Proceedings can be studied using computers.

Statistical searches of the Proceedings can be carried out through The Old Bailey Online


The question that we might ask, then, is what are the limitations of the Proceedings as a source of data on such things as punishments, defendant age and gender? Taking the Proceedings in their entirety, what are the limits in terms of the information that was recorded in the original trial reports? How frequently, for example, was the age of the defendant recorded? And what are the limits in terms of what we can actually search for systematically using digital technologies? Can we, for instance, systematically determine the lengths of imprisonment which offenders were sentenced to?

These are crucial questions for us because the Digital Panopticon will rely so heavily on the Proceedings as a source: in our effort to trace the life histories of offenders who were sentenced to transportation or imprisonment at the Old Bailey between 1787 and 1875, the Proceedings will obviously be a vital source of information. After identifying those who were sentenced to transportation or imprisonment recorded in the Proceedings we will then try to trace such individuals both before and after their conviction by linking the Proceedings with other sets of records.

In trying to better understand the limitations of the Proceedings as a source of data for the Digital Panopticon project, I have recently been making use of data visualization (‘dataviz’) – using computers to create visual representations of numbers. This includes the traditional graphs and pie charts that we are all familiar with, and which I will be talking about here. But it also includes more complex forms of visualization which I will be looking at in future posts (watch this space!).

Because the Proceedings contain such a vast amount of information, manual counting and tables are inadequate for making sense of the data. Turning the raw numbers into a visual form makes it much easier to see overall patterns in the data. Here I give just a brief example of how dataviz has helped me to see the Proceedings differently, to appreciate the limits of this immense historical resource, and to think about how information from the Proceedings can be used most effectively in the Digital Panopticon project.


A data visualization of the length of trial reports in the Proceedings over time, created by William J. Turkel as part of the Datamining with Criminal Intent project (created using Mathematica 8)

One of the key things we want to know on the Digital Panopticon is how useful age data might be in helping us to link offenders recorded in the Proceedings with individuals documented in other sets of records (such as the convict transportation registers or census records). In the first instance, links will be made through name searches of the different types of records. But how can we be sure that the John Smith recorded in the Proceedings is the same individual as the John Smith recorded in the prison parole registers, for example? Age data might help us here. If John Smith is recorded as being 24 years old in the Proceedings at the time of his sentence to two years’ imprisonment at the Old Bailey, and the John Smith recorded in the parole registers is stated to be 26 years old, then we can be reasonably confident that this is indeed the same person. By the same token, if the John Smith recorded in the parole registers is said to be 60 years old, this would suggest not.
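The John Smith comparison can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the project's linkage method: the function name, the trial and register years, and the two-year tolerance are all assumptions made up for the example.

```python
def ages_compatible(age_first, year_first, age_second, year_second, tolerance=2):
    """Return True if two recorded ages could plausibly belong to one person.

    The expected second age is the first age plus the years elapsed
    between the two records. `tolerance` is an assumed allowance (in
    years) for imprecise recording, not a value taken from the sources.
    """
    expected = age_first + (year_second - year_first)
    return abs(age_second - expected) <= tolerance


# Hypothetical records: a John Smith aged 24 at a trial in 1850, compared
# with two John Smiths in a register dated 1852 (the years are invented).
print(ages_compatible(24, 1850, 26, 1852))  # True: 26 is what we would expect
print(ages_compatible(24, 1850, 60, 1852))  # False: 60 is far outside the range
```

A wider tolerance catches more true matches where ages were recorded loosely, at the cost of more false positives for common names.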

Ages could then be extremely useful, but it depends on how extensively, and how accurately, age data is recorded in the Proceedings (and our other sets of records). By visualizing the results of quantitative searches of the Proceedings we can get a clear sense of this, far more so than through the use of text-heavy tables which can be hard to “read” for patterns. A statistical search using The Old Bailey Online reveals that 171,168 defendants are recorded in the Proceedings in the years 1755-1870. Of these, age is recorded for 101,364 (59.2%). So for the entire period of our study, we have age data for almost three-fifths of all the defendants at the Old Bailey.

Further digging into the data and visualisation of the findings reveals some of the deeper patterns in the age data. In the first instance, the recording of ages only began in the year 1790 for defendants found guilty, and from the 1860s for those found not guilty, as shown in the graph below. In the 1790s, we have age data for 65% of guilty defendants, increasing to 90% and above thereafter. By contrast, age data for the not guilty is missing until at least the 1850s, and is not recorded in earnest until the 1860s.

Visualisation demonstrating the extent of age recording over time and by verdict


This gives a sense of how extensively ages are recorded in the Proceedings over time, and according to which categories of offenders. By visualizing the patterns of recorded ages we can also get a feel for how ages were actually recorded. The graph below, for instance, suggests that there was a tendency to revise the defendant’s recorded age up or down slightly to match a round figure. The numbers of defendants whose ages are recorded as 30, 40, 50 and (to a lesser extent) 60 are all significantly above the number we might expect according to the moving average (in other words, when the yellow bar goes above the green line in the graph). By contrast, ages just either side of these figures (such as 29, 31, 39, 41 and 51) are systematically below the average (when the yellow bar is below the green line). It may well also have been the tendency for those in their early twenties to have their recorded ages revised down to 18 or 19, since these two ages are also well above the expected number. In short, many more defendants were recorded as being 30 rather than 31, or 40 rather than 41, and the scale of the difference suggests that this resulted from a deliberate policy of revising the defendant’s age up or down to match the nearest round figure.

Visualisation demonstrating the “bunching” of recorded ages at 30, 40, 50 and 60
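For anyone wanting to try this kind of comparison themselves, here is a minimal Python sketch of checking each age's count against a centred moving average. It is not the calculation behind the graph: the five-age window is an assumption, the function name is hypothetical, and the sample ages are invented purely to show the effect.

```python
from collections import Counter


def heaping_ratios(ages, window=5):
    """Compare the count at each age with a centred moving average.

    Returns {age: ratio}. A ratio well above 1 marks a spike and one
    well below 1 marks a dip. `window` (odd, assumed to be 5) is the
    number of adjacent ages averaged, including the age itself.
    """
    counts = Counter(ages)
    half = window // 2
    ratios = {}
    # Skip the ages at either end, where a full window is not available.
    for age in range(min(counts) + half, max(counts) - half + 1):
        neighbours = [counts.get(a, 0) for a in range(age - half, age + half + 1)]
        average = sum(neighbours) / window
        if average > 0:
            ratios[age] = counts[age] / average
    return ratios


# Invented sample: 10 people at most ages, a spike at 30, dips at 29 and 31.
sample = []
for age in range(25, 36):
    n = 30 if age == 30 else 5 if age in (29, 31) else 10
    sample.extend([age] * n)

ratios = heaping_ratios(sample)
for age in (29, 30, 31):
    print(age, round(ratios[age], 2))
```

Because the spike itself is included in the window, it also pulls the average up for its neighbours, which makes the dips either side look slightly deeper than they are; a wider window softens that effect.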


Together this suggests that age data in the Proceedings will be of much use to us in the Digital Panopticon, particularly for the defendants found guilty and subsequently sentenced to transportation or imprisonment, for whom we have extensive age data from 1790 onwards. In the case of our not guilty control group, however, we have no age data available in the Proceedings to work with before the 1860s, so we will be reliant on other categories of information to link the not guilty defendants across datasets. And the seeming tendency for recorded ages to be rounded up or down suggests that when we use age data to link individuals across datasets it would be more effective to work within age ranges rather than trying to compare specific numbers.

From these early explorations it seems clear that visualization will be invaluable in helping us to identify the overall patterns in the data of the Proceedings. The first step in this is identifying some of the limitations in terms of the information recorded in the Proceedings. Traditional forms of visualization are useful to this end. But there are also potential benefits in going beyond this, by using more complex forms of visualization to uncover deeper patterns in the data – patterns that would be difficult to detect through simple graphs or charts. This is what I will be turning to next.

Thinking about Dates and Data

Our headline dates (1780-1925) are far from being the whole story when it comes to thinking about data collection and record linkage. One of our stated objectives in our original application elaborates:

to chart the fortunes of all Londoners convicted at the Old Bailey between the departure of the First Fleet to Australia (1787) through to the death of the last transported Londoner in Australia in the early 1920s

But in order to do this, we need to look at data from significantly earlier than 1787, or even 1780. Our interest in convicts doesn’t start at the moment of the Old Bailey trial that sent them on their journeys to Australia. For 18th-century offenders, we don’t have census or civil registration records that we can use, so our focus will be on attempting to trace earliest contacts with the criminal justice system. But if we go too far back, we’ll spend a lot of time and computing resources processing data we don’t need, which will increase problems with noise and false positives (especially when we’re looking for needles in haystacks of unstructured data like newspapers or sessions papers).

Still, it seemed worth checking a simpler question initially. We knew some of the convicts transported in 1787 would have been held in the hulks for several years, as authorities sought a replacement for the American colonies (those pesky Revolutionaries). How long exactly? We wanted to pin down a more precise date than 1780.


The First Fleet entering Port Jackson, January 26, 1788 (State Library of New South Wales)

The Old Bailey Online isn’t a very useful source for this question, however convenient it might be (a few moments with the stats search tells me, for example, that 1258 people were sentenced to transportation between 1781 and 1786), because sentences given after trials don’t necessarily reflect actual outcomes: not everyone who was sentenced to transportation was actually transported; and not everyone who was transported had been given that sentence in court (a significant proportion of death sentences was subsequently commuted to transportation). In addition, between the collapse of transportation to the American colonies and the establishment of Australia as the primary recipient of transported convicts, there were experiments with transportation to other colonies.

I needed different sources, based on the actual transportation records, so it was a chance for me to start learning about the transportation and Australian datasets I’m not familiar with. In fact, there is plenty of source material: many of the transportation records routinely included information about the convicts’ trials – offence, court, and date convicted. Moreover, a number of projects have already produced readily usable and accessible datasets based on these sources.

I started with the State Library of Queensland British Convict Transportation Registers database (BCTR), created from Home Office registers (TNA HO11, for those who’re interested). We’ve already indexed this data in Connected Histories. The CH version wasn’t designed for this kind of data analysis, however, and to run individual searches would have been a long slow job, so I downloaded the full dataset and played with it (using OpenRefine) until I got the information I wanted. The earliest trial in there, it seemed, was that of John Martin, in July 1782.

The second relevant and easily accessible dataset was the First Fleet database (FF-DB), which is also available to download. This is a smaller dataset, containing the 780 or so convicts transported on the First Fleet, of whom 327 had been sentenced at the Old Bailey. Unlike the BCTR, it’s been compiled from a number of different primary and secondary sources. In FF-DB, the earliest Old Bailey trials were from 1781. The earliest trial of all was that of Samuel Woodham and John Ruglass, at the sessions of 30 May 1781.

Why hadn’t I found these in BCTR? Because, it transpired on reading the entries, in each case their journey to Australia was actually their second convict voyage. They’d escaped from their first convict destination and had been convicted of returning from transportation around 1784-5. BCTR only gave the date of the second conviction that actually put them on the ships to Australia, whereas FF-DB records both. Most of the 14 FF-DB convicts from 1782 trials had also returned from transportation (several had been involved in the Mercury mutiny) and been re-sentenced at a later date.

Don’t ya just love the way a ‘simple’ historical question is never so simple after all?

A different question I decided to ask the data: setting aside the 1781-2 outliers, what was the more normal interval between conviction and departure for Australia for the Old Bailey First Fleeters? The following table is taken from the FF-DB data (without taking the “re”-transported into account): 213 (65%) were originally tried in 1784 or earlier. Those who’d spent less than 3 years in the hulks could presumably consider themselves the lucky ones.

Year of conviction    Number of convictions
1781                  4
1782                  14
1783                  48
1784                  147
1785                  37
1786                  49
1787                  28
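As a quick check on the arithmetic, the figures in the table can be tallied in Python. The variable names are just for illustration; the numbers are copied from the table.

```python
# Convictions per year for the Old Bailey First Fleeters, copied from the table.
convictions = {1781: 4, 1782: 14, 1783: 48, 1784: 147, 1785: 37, 1786: 49, 1787: 28}

total = sum(convictions.values())
# Those originally tried in 1784 or earlier.
early = sum(n for year, n in convictions.items() if year <= 1784)

print(total, early, round(100 * early / total))  # 327 213 65
```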

Now I needed to investigate the age range of the First Fleet convicts, which would help me to work out the likely earliest dates of contact with the justice system. Both the transportation and Old Bailey Online data contain at least some information about ages, although 18th-century information on this is often imprecise and not always accurate. I wasn’t too worried about this, since the ages didn’t need to be exact for this purpose.


What are the recorded ages of the First Fleet convicts in FF-DB? There is age information for 309 out of the OB sample of 327 (bearing in mind these are recorded as ages at the time of departure, so they’d have generally been a few years younger at the time of trial). I think it will hardly come as a major surprise to 18th-century crime historians that the majority (64%) were between 20 and 30 years old, and the vast majority (95%) were over 15 and under 40.

That age data could be skewed in various ways, though: it’s conceivable that those selecting prisoners for the First Fleet tended to choose younger people who’d be more likely to survive the passage, and be stronger workers at the other end; on the other hand, though, we might reasonably speculate that very young offenders would be less likely to be transported.

Age data is available for only about 3% of Old Bailey Online defendants between 1740 and 1780 (contrasting sharply with the later 19th-century Proceedings – which in itself tells us a lot about changes in record-keeping generally and surveillance of the criminal elements in society in particular). We have no idea how representative that 3% was so I’m wary of taking any hard numbers from it. (And again, I can imagine that very young offenders might be slightly less likely to appear at the Old Bailey than at lower courts.) But it does show a reasonably similar profile to FF-DB, with very, very few defendants under 15, though rather more between 40 and 50 – which might (if we could really trust it) back up my notion that the First Fleet convicts tended to be selected from younger prisoners.

Using the age of 45 (in 1787) as an upper limit would give a birth year c. 1742 – let’s round that down to 1740 for convenience. So, if they were unlikely to appear in criminal justice records much before the age of 15, that takes us to 1755. That too will not be quite the final word: we’ll probably do manual searches in earlier records for the handful of First Fleeters aged over 45, and for individuals who appear to have exceptionally rich stories. But in terms of data collection for automated searching/processing, that is likely to be close to our “real” starting date.
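The back-of-the-envelope calculation above can be written out as a small Python function, which makes it easy to see how the starting date shifts if the assumptions change. The function name and its parameters are hypothetical, and rounding the birth year down to the decade is simply the convenience described above.

```python
def earliest_search_year(reference_year, max_age, min_offending_age, round_to=10):
    """Estimate the earliest year worth searching for criminal justice records.

    Takes the oldest convict's birth year, rounds it down to the nearest
    `round_to` years, then adds the youngest age at which someone is
    assumed likely to appear in the records.
    """
    birth_year = reference_year - max_age
    rounded_birth = birth_year - birth_year % round_to  # 1742 becomes 1740
    return rounded_birth + min_offending_age


# Upper age limit of 45 in 1787, first contact assumed at about 15.
print(earliest_search_year(1787, 45, 15))  # 1755
```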