
Men as Wives: Visualising Errors in the Old Bailey Proceedings Data

In a recent post I talked about some of the ways in which data visualisations have helped me to see patterns in the information recorded in the Old Bailey Proceedings on crimes, verdicts, punishments and the ages of defendants, patterns that might have been missed using traditional methods of representing data such as tables. Here I want to give a brief update on my analysis of the Proceedings, focusing on how defendant occupations and social status were recorded in the eighteenth and nineteenth centuries. Again, visualisations have been extremely useful, especially in identifying errors in the data.

As with the recording of defendant ages, information on the occupation/social status of those tried at the Old Bailey in the eighteenth and nineteenth centuries could well be useful to us on the Digital Panopticon project in tracing offenders across different sets of records. Just as an age or a birth date might allow us to establish whether the “John Smith” tried at the Old Bailey and the “John Smith” transported to Australia were indeed the same person, information on occupation or social status can help us to confirm or rule out such name matches across records. But as with ages, this depends on how extensively, and in what manner, occupation/social status is recorded in our sources. Here, as with defendant age, the techniques of data visualisation can be useful.

Searches of the Proceedings for defendant occupation/social status can be carried out using the “custom search” page of the Old Bailey Proceedings Online.


However, whereas with defendant ages I was able to use the “statistics search” function of the Old Bailey Proceedings Online to generate numbers for analysis, this wasn’t possible in the case of defendant occupation/social status. In the process of digitising the original trial reports, defendant occupation was indeed tagged as a distinct category of information, and thus it can be searched for systematically in the “custom search” page of the Old Bailey Proceedings Online. But this can’t be used to quantitatively analyse the recording of defendant occupations in the Proceedings. In order to do this I needed to look at the website’s underlying data file of defendant information.

This is a large file which includes numerous fields of tagged information relating to all the defendants tried at the Old Bailey and reported in the Proceedings. Since much of this information is in the form of text rather than numbers, software such as Excel isn’t very useful for analysing the data. Instead I turned to Tableau Public, a free, web-based tool that is powerful but still easy to use. There are numerous other data visualisation tools available which are ideal for novices. All need to be used with caution, but used carefully they can be invaluable. (I’m going to talk in more detail about the actual process of using tools such as Tableau to research the history of crime in my next post, so watch this space.)

By running our file on Old Bailey defendant information through Tableau I’ve been able to create some fairly simple but nonetheless useful visualisations. For the data on defendant occupation and social status this has revealed two things in particular.

Pie chart demonstrating frequency of recording defendant occupation


First of all, it has highlighted how little information we actually have on the occupation and social status of Old Bailey defendants. Across the entire publication history of the Proceedings, from 1674 to 1913, occupation or social status is recorded for only 11% of all the defendants put on trial. In the years 1755 to 1834 the figure is 15%, but between 1834 and 1906 virtually no defendants’ occupations were recorded. On the whole, therefore, we have occupation information for only a small proportion of defendants, and virtually none for the post-1834 part of our specific period (c. 1787-1875).
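Percentages like these are straightforward to reproduce once the defendant file is loaded. The sketch below is a minimal illustration rather than the method used here (which was Tableau): it assumes each defendant is a dictionary with hypothetical `year` and `occupation` keys (the real data file has its own field names) and computes the share of defendants with a non-empty occupation in each period.

```python
def share_recorded(defendants, field, periods):
    """Return the share of defendants with a non-empty `field` in each period.

    `periods` is a list of (label, first_year, last_year) tuples, inclusive.
    """
    shares = {}
    for label, first, last in periods:
        in_period = [d for d in defendants if first <= d["year"] <= last]
        if not in_period:
            continue  # skip empty periods rather than divide by zero
        recorded = sum(1 for d in in_period if d.get(field))
        shares[label] = recorded / len(in_period)
    return shares


# Hypothetical toy records, not real Old Bailey data.
sample = [
    {"year": 1760, "occupation": "labourer"},
    {"year": 1770, "occupation": ""},
    {"year": 1850, "occupation": None},
    {"year": 1860, "occupation": ""},
]
print(share_recorded(sample, "occupation",
                     [("1755-1834", 1755, 1834), ("1835-1906", 1835, 1906)]))
```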

The visualisations also made clear the sheer variety of occupations recorded in the Proceedings. The bubble chart below, for example, gives an indication of this, and of the relative frequency with which different categories are recorded. One problem is that the same occupations were recorded in slightly different ways (“servant” and “servants”, for example) or with variant spellings (such as “taylor” and “tailor”). If we wanted to use occupation or social status labels to verify name matches across sets of records, this suggests that we would need more sophisticated forms of keyword searching, able to treat such variants as the same label.
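One way of collapsing such variants before counting is sketched below. It is a minimal sketch under stated assumptions: the spelling map is a hypothetical, hand-curated table, and a trailing “s” is stripped only when the singular form also occurs in the data, so that labels such as “glass” are left alone.

```python
from collections import Counter

# Hypothetical, hand-curated map of variant spellings to a standard form.
SPELLING_VARIANTS = {"taylor": "tailor"}


def normalise_labels(labels):
    """Count occupation labels after merging case, spelling and plural variants."""
    # Lower-case, trim and apply the spelling map.
    cleaned = [SPELLING_VARIANTS.get(label.strip().lower(), label.strip().lower())
               for label in labels]
    vocabulary = set(cleaned)

    def singular(word):
        # Strip a trailing "s" only when the singular form is also present.
        if word.endswith("s") and word[:-1] in vocabulary:
            return word[:-1]
        return word

    return Counter(singular(w) for w in cleaned)


print(normalise_labels(["Servant", "servants", "Taylor", "tailor", "glass"]))
```

A real list of variants would need to be built up by inspecting the data; this only shows where such a table would plug in.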

Bubble chart showing categories of defendant occupation


Bubble chart of defendant occupations by gender


But visualisations have been especially useful in highlighting errors in the recording of occupations within the Old Bailey Proceedings data. I wanted to find out how occupation labels varied according to the gender of the defendants tried at the Old Bailey. To do this I used Tableau to create a bubble chart of the most common recorded occupations/social status labels for male and female defendants in the years for which we have significant amounts of information. What really struck me in this chart was the number of men whose occupation label is recorded in our Proceedings dataset as “wife”. This clearly seemed to be an error in the data, but I wanted to know the source of the problem, so I went back to the original data file and filtered it for male defendants with the occupation/social status label “wife”. I then looked at the trial reports in the Old Bailey Proceedings for these cases.
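The same tally and filter can be expressed in a few lines of Python as an alternative to a visualisation tool. This is a minimal sketch: the field names (`id`, `gender`, `occupation`) and the set of labels treated as implying a woman are hypothetical assumptions, not the structure of the real data file.

```python
from collections import Counter

# Hypothetical set of labels that should only ever apply to women.
FEMALE_ONLY_LABELS = {"wife", "widow", "spinster"}


def occupation_counts_by_gender(defendants):
    """Tally occupation labels separately for each gender."""
    counts = {}
    for d in defendants:
        counts.setdefault(d["gender"], Counter())[d["occupation"]] += 1
    return counts


def suspicious_records(defendants):
    """Male defendants carrying a label that implies a woman: candidates for checking."""
    return [d for d in defendants
            if d["gender"] == "male" and d["occupation"] in FEMALE_ONLY_LABELS]


sample = [
    {"id": "t1", "gender": "female", "occupation": "wife"},
    {"id": "t1", "gender": "male", "occupation": "wife"},  # a mis-tagged husband
    {"id": "t2", "gender": "male", "occupation": "labourer"},
]
print(occupation_counts_by_gender(sample))
print([d["id"] for d in suspicious_records(sample)])
```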

Trial report in the Old Bailey Proceedings in which the husband of a female defendant has been tagged with the social status of “wife”


It turns out that many of these cases were due to errors in the digitisation process arising from the unusual form of the trial reports themselves. In these cases (such as the example shown above), a female defendant had been named in the trial report as the wife of her husband, and the automated tagging process used to digitise the Proceedings had consequently recorded both the husband and the wife as defendants and assigned both of them the label “wife”. This practice of naming a female defendant as the wife of her husband largely disappeared from the Proceedings in the nineteenth century, so most of these errors in the data file come from the eighteenth century. By identifying anomalies of this kind, visualisations allow us to find errors in the data, which can then be corrected. This leaves us with a much “cleaner” dataset, thereby increasing the chances of successful record linkage.

Historians of crime (particularly of crime in Britain) have been quick to exploit the plethora of digitised criminal justice (and associated) records that are now available online. We all make use of resources such as the Old Bailey Proceedings Online, Eighteenth-Century Collections Online and digitised newspapers. But whilst we have been quick to take advantage of the benefits offered by these digitised records – such as keyword searching to find needles in haystacks – we have been less ready to consider the full effects of the digitisation process on how we study our sources and on the information that we extract from them. By using data visualisations we can better understand the implications of digitisation, including the ways in which the process of turning a paper record into a digital format might introduce errors (relatively rare, it should be said, in the case of the Old Bailey Proceedings Online) into the information we compile.

Adventures with Data Linkage

The British Convict Transportation Registers is a database detailing the journeys of over 123,000 people transported to Australia in the 18th and 19th centuries. Compiled from British Home Office records, it contains information such as the name of each person being transported, the date they departed, and their final destination.

The early stages of the Digital Panopticon have allowed us to perform some preliminary data linkage between these registers and people sentenced to transportation in the Old Bailey Proceedings. We’ve made the links primarily by name, with a degree of tolerance for variant spellings. Many names matched exactly, suggesting that in some cases names may have been copied directly from one record to another. A further 7% of names could be matched via Soundex, an algorithm which encodes names according to how they sound, so that names which sound alike when spoken but are (perhaps accidentally) spelt differently receive the same code. A remaining handful were matched by virtue of having a small Levenshtein distance. Levenshtein distance is a simple metric that quantifies the difference between two text strings as the minimum number of single-character insertions, deletions or substitutions needed to turn one into the other. Including matches with a very small Levenshtein distance, where perhaps only a single letter is different or omitted, helps take account of minor clerical errors.
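The three tiers of matching described above can be sketched in Python. This is a minimal illustration, not the project’s actual linkage code: it uses the standard American Soundex rules, compares Soundex codes word by word, and treats a Levenshtein distance of at most one across the whole name (an assumed threshold) as a match.

```python
# Standard (American) Soundex digit groups.
SOUNDEX_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        SOUNDEX_CODES[ch] = digit


def soundex(word):
    """Four-character Soundex code, e.g. 'Robert' and 'Rupert' -> 'R163'."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same digit
        digit = SOUNDEX_CODES.get(ch, "")  # vowels (and y) have no digit
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # a vowel resets prev, so a repeated digit after it is kept
    return code.ljust(4, "0")


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def match_type(name_a, name_b, max_distance=1):
    """Classify a pair of names as 'exact', 'soundex', 'levenshtein' or None."""
    a, b = name_a.lower().split(), name_b.lower().split()
    if a == b:
        return "exact"
    if len(a) == len(b) and all(soundex(x) == soundex(y) for x, y in zip(a, b)):
        return "soundex"
    if levenshtein(" ".join(a), " ".join(b)) <= max_distance:
        return "levenshtein"
    return None


for pair in [("John Smith", "John Smith"), ("Jon Smyth", "John Smith"),
             ("Catherine Jones", "Katherine Jones"), ("Thomas Smith", "John Smith")]:
    print(pair, match_type(*pair))
```

The order of the checks mirrors the tiers in the text: exact matches first, then Soundex, then near-misses such as “Catherine”/“Katherine”, whose Soundex codes differ only because the first letter is kept.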

Percentages of names matched between the British Transportation Records and Old Bailey Proceedings, under various conditions.

Results of attempted name matching between the British Transportation Records and Old Bailey Proceedings.

In total, about 70% of the people sentenced to transportation in the Proceedings appear in the transportation records. We can be quite confident of about half of these links, because in those cases the date of conviction is given in the transportation record. If both the date and the name match, it is very likely that we’re dealing with the same individual. For transportation records where no conviction date is given, we have to examine five or six years’ worth of Old Bailey records to make sure we don’t miss a possible match. This greatly increases the possibility of a false positive, so we can be less sure about these links.
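The difference between the two situations can be sketched as a search window. In this minimal sketch, the record structure (`name`, `date`), the six-year default and the simple case-insensitive name comparison are all assumptions; a fuzzier matcher could be passed in as `names_match`.

```python
from datetime import date, timedelta


def candidate_trials(btr_name, departure, trials, conviction_date=None,
                     max_years=6, names_match=None):
    """Old Bailey trials that could correspond to one transportation record.

    If the transportation record gives a conviction date, only trials on that
    date are considered; otherwise every trial in the `max_years` before
    departure is searched, which is where false positives creep in.
    """
    def same_name(a, b):
        return a.strip().lower() == b.strip().lower()

    names_match = names_match or same_name
    earliest = departure - timedelta(days=round(365.25 * max_years))

    def in_window(d):
        if conviction_date is not None:
            return d == conviction_date
        return earliest <= d <= departure

    return [t for t in trials
            if in_window(t["date"]) and names_match(btr_name, t["name"])]


# Invented records for illustration only.
trials = [
    {"name": "John Smith", "date": date(1785, 4, 20)},
    {"name": "John Smith", "date": date(1779, 1, 10)},
    {"name": "John Smith", "date": date(1786, 9, 1)},
    {"name": "Mary Jones", "date": date(1786, 1, 11)},
]
departure = date(1787, 5, 13)
print(len(candidate_trials("John Smith", departure, trials)))
print(len(candidate_trials("John Smith", departure, trials, conviction_date=date(1786, 9, 1))))
```

Without a conviction date the window admits two candidate John Smiths; with one, only a single trial survives, which is why the dated links are the more reliable.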

One interesting trend is that the number of exact links decreases significantly in cases where the conviction date is not given. A greater proportion of these links had to be made with Soundex or Levenshtein Distance. This suggests that the links made without a conviction date are less reliable, as we might expect. Therefore, for the time being we will discard these.

With our most reliable links in hand, we can begin looking for patterns between the details of conviction and transportation. One of the most interesting pieces of information contained in the transportation records is the destination of convict ships. An obvious question is whether convicts were directed to particular destinations based upon their offence, gender or age. One might imagine colonies having a need for people with particular skills or attributes at particular times, and the system might have attempted to address these needs. Occupation, which might indicate such skills, is at least sporadically recorded in the Old Bailey Proceedings.

In fact, the data shows that the overwhelming factor in deciding where a convict was sent was the particular year in which they left England. Transportation was almost exclusively to New South Wales before 1831, and overwhelmingly to Van Diemen’s Land after 1838. There is a brief period from 1832 to 1835 in which roughly equal numbers of convicts were sent to both destinations. However, even during that period, there doesn’t appear to be any correlation between the characteristics of a convict and their destination: none of gender, age, crime or occupation seems to have made any difference. Once a person was in the transportation system, their final destination appears to have been arbitrary; there was no easily identifiable tendency to send people with particular attributes to particular destinations.
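A simple way to check for such a correlation is to compare the share of each destination within each group of convicts during the 1832-1835 window. The sketch below assumes hypothetical fields (`departure_year`, `destination`, and an attribute such as `age_group`); broadly similar shares across groups would be consistent with destination not depending on that attribute, and a formal test (for example chi-squared) could follow.

```python
from collections import Counter, defaultdict


def destination_shares(convicts, attribute, start=1832, end=1835):
    """Share of each destination within each group of `attribute`, for departures in [start, end]."""
    counts = defaultdict(Counter)
    for c in convicts:
        if start <= c["departure_year"] <= end:
            counts[c[attribute]][c["destination"]] += 1
    shares = {}
    for group, dests in counts.items():
        total = sum(dests.values())
        shares[group] = {dest: n / total for dest, n in dests.items()}
    return shares


# Invented records for illustration only.
convicts = [
    {"departure_year": 1833, "destination": "NSW", "age_group": "under 20"},
    {"departure_year": 1833, "destination": "VDL", "age_group": "under 20"},
    {"departure_year": 1834, "destination": "NSW", "age_group": "20+"},
    {"departure_year": 1820, "destination": "NSW", "age_group": "20+"},  # outside window
]
print(destination_shares(convicts, "age_group"))
```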


Sankey diagram, showing proportions of different age groups transported to different destinations between 1832 and 1835, including where the destination is unknown because a link between records could not be made.

If we cannot find a pattern in where people were sent, perhaps we can find a pattern in how long it took them to be sent there. For every convict there is a period of time between when they were convicted and when they actually set sail aboard a ship. The interval between conviction and transportation is hugely variable. A few people were transported in little over a month. Some people, as we have noted, spent six years waiting to be transported.
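The minimum, maximum and average intervals plotted in the graph that follows can be computed directly from linked pairs of dates. This minimal sketch assumes each link is a (conviction date, departure date) pair and groups by year of conviction; grouping by year of departure would be an equally plausible choice.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def interval_summary(links):
    """Min, max and mean days from conviction to departure, grouped by conviction year."""
    by_year = defaultdict(list)
    for convicted, departed in links:
        by_year[convicted.year].append((departed - convicted).days)
    return {year: (min(days), max(days), mean(days))
            for year, days in sorted(by_year.items())}


# Invented dates for illustration only.
links = [
    (date(1820, 1, 10), date(1820, 3, 1)),
    (date(1820, 6, 1), date(1822, 6, 1)),
    (date(1821, 2, 1), date(1821, 4, 1)),
]
for year, (shortest, longest, average) in interval_summary(links).items():
    print(year, shortest, longest, round(average, 1))
```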

Line graph showing the minimum, maximum and average intervals between conviction and transportation over time, 1787 - 1852.


The data shows that again, time was a very important factor. Transportation almost halted between 1835 and 1844, as did sentences of transportation. In contrast, the system seems to have been at peak efficiency between about 1814 and 1834, but even then there are a few outliers (represented by the green line) who still had to wait a very long time to be transported.

Detail of a scatterplot variation showing every interval between Proceedings conviction and BTR transportation, represented by horizontal bars running from conviction date to transportation date. Females are blue, males are orange.


If we look at the data in more detail, we can see that a great many of those sentenced to transportation, at least early in the period, are simply waiting for the next boat to depart. Convicts sentenced at multiple sessions are held until, presumably, there are enough of them to justify a voyage. Nevertheless, there are people who seem to miss multiple voyages: people convicted at the same session as those who depart on the next boat, but who are, for whatever reason, left behind. Can we detect any common characteristics among these people?

It is not at all easy to find a pattern, but there may be one: male prisoners below the age of 15 appear to be held for longer, on average, than those who are older. It is worth noting that the minimum and maximum intervals show no such trend; there are still people under fifteen who are transported very quickly, and people over fifteen who are held for a very long time. But the average shows a definite increase which begins abruptly at the age of fifteen and then accelerates as prisoners get younger. In fact, male prisoners under fifteen are held, on average, for twice as long as those over fifteen.
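The “twice as long” comparison amounts to a ratio of two means. A minimal sketch, assuming hypothetical record fields (`sex`, `age`, `days` waited) and treating 15 as the cutoff:

```python
from statistics import mean


def mean_wait_by_band(records, cutoff=15, sex="male"):
    """Mean days waited below and at/above an age cutoff, plus their ratio."""
    selected = [r for r in records if r["sex"] == sex]
    younger = [r["days"] for r in selected if r["age"] < cutoff]
    older = [r["days"] for r in selected if r["age"] >= cutoff]
    if not younger or not older:
        return None  # cannot compare if one band is empty
    return mean(younger), mean(older), mean(younger) / mean(older)


# Invented records for illustration only.
records = [
    {"sex": "male", "age": 13, "days": 600},
    {"sex": "male", "age": 14, "days": 400},
    {"sex": "male", "age": 20, "days": 250},
    {"sex": "male", "age": 30, "days": 250},
    {"sex": "female", "age": 13, "days": 100},  # excluded by the sex filter
]
print(mean_wait_by_band(records))
```

Because averages are sensitive to a few very long waits, it would be worth checking the medians as well before reading much into the ratio.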

Age plotted against minimum, maximum and average days between conviction and transportation, for males sentenced at the Old Bailey 1787-1852.


This is a finding which we can begin to investigate and verify. Certainly, the pattern is not repeated for female prisoners, whose average transportation time remains remarkably consistent regardless of age. As the project gathers more data and continues its initial investigations, we hope to be able to explore this possible trend in more detail.

This is the very first linking exercise we have done, and there is undoubtedly scope to refine the process. Every dataset we add will help us to evaluate our findings more thoroughly and to ask more detailed questions. The next step may be to try to link the Old Bailey Proceedings and the Transportation Registers to the Convict Database, which contains information such as prisoners’ height and health. These may well have been important factors in determining the treatment of prisoners, and could provide further clues as to the nature of a journey through the eighteenth-century criminal justice system.

Visualising Life-Grids and Narrating the Lives of Convicts

One of the great opportunities presented by the Digital Panopticon project (and one of the most exciting in my opinion) is in uncovering more about the processes of crime and punishment by placing thousands of offenders, and their offences, back within the context of their own lives.

Tracing offenders through the records has been a preoccupation of several groups of historians and criminologists in the last decade (for example Barry Godfrey, Heather Shore, Pam Cox, David Cox, Helen Johnston, Zoe Alker, Joanne Turner, and Stephen Farrall). Because record linkage is so laborious, those studies which have focussed on tracing groups of offenders through civil as well as criminal datasets have been able to examine only a few hundred offenders at a time. Those pioneering this methodology have sorted the collected information into ‘lifegrids’ which chart the life events and changes of each individual. Lifegrids might typically include details of birth, marriage and death, family evolution, employment and residential addresses, and offending and punishment history. Of course, the depth and breadth of the documents and information available on a particular group of offenders, or on an individual offender, dictates how much material can be recorded in each lifegrid.

Beyond the lifegrid format, there are a number of ways in which this information can be presented and communicated. Even the simplest visualisations can show the role that offending played in a person’s life, for instance by indicating what proportion of an individual’s life was spent in custody, or how many offences were recorded against them at each stage of their life. It is possible to chart how someone’s offending accelerated and decelerated. From an institutional perspective, it is possible to show how an individual’s weight and health changed over time, or how their behaviour and privileges affected their experience of punishment. The myriad ways in which this complex data can be presented hold exciting potential for how others see, interrogate and engage with this fantastically rich material.
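As an example of the simplest kind of measure mentioned here, the proportion of a life spent in custody can be derived from dated custody spells in a lifegrid. This is a minimal sketch with invented dates, assuming each spell is a (start, end) pair; overlapping spells are merged so that days are not counted twice.

```python
from datetime import date


def share_of_life_in_custody(birth, death, spells):
    """Fraction of a life spent in custody, given (start, end) custody spells.

    Overlapping or touching spells are merged so that days are not
    double-counted, and each spell is clipped to the lifespan.
    """
    merged = []
    for start, end in sorted(spells):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    custody_days = sum((min(end, death) - max(start, birth)).days
                       for start, end in merged if end > birth and start < death)
    return custody_days / (death - birth).days


# Invented life and custody spells for illustration only.
birth, death = date(1850, 1, 1), date(1860, 1, 1)
spells = [(date(1855, 1, 1), date(1857, 1, 1)), (date(1856, 1, 1), date(1858, 1, 1))]
print(round(share_of_life_in_custody(birth, death, spells), 3))
```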

To begin to explore these possibilities, we have been working with an example offender: Patrick Madden (one of a number of offenders included in Johnston, Godfrey and Cox’s ESRC-funded research on ‘The costs of imprisonment’).

P Madden

Born and raised in Sheffield, Patrick began offending around the age of sixteen. Although often motivated by property, Patrick’s offences were primarily violent in nature. Madden had 15 offences recorded against him over a period of almost thirty years. Each of these was committed either in Sheffield or in nearby northern towns such as Wakefield and Doncaster. It was in these locations that he was incarcerated, except on one occasion when he served seven years of penal servitude in London and the south of England. It does not appear that Patrick ever married or had children, nor that he ever managed, for any length of time, to establish a life for himself that did not involve repeat offending, before his death at the age of 52.

Patrick Madden’s lifegrid, of course, contains much more information than this brief overview might suggest. His civil and penal records tell us about many aspects of his life, right down to his familial relationships and sexual preferences. However, even if we take the most ‘bare bones’ approach to Patrick’s life narrative, it is possible to start creating some interesting visualisations based on his experiences and offending history.

DataHero visualisation: Patrick Madden’s years of imprisonment over his life course

DataHero visualisation: Patrick Madden’s type of offending over his life course

DataHero visualisation: weight over period of imprisonment (line chart)

DataHero visualisation: penal class over time of imprisonment

Yet the size and scale of the research being undertaken by the Digital Panopticon mean that we are faced with presenting not just Patrick Madden’s life, but the lives of all the ‘Patricks’ who went through the Old Bailey between the late 18th and early 20th centuries. This poses two distinct challenges in presenting the mass of information traditionally held in lifegrids. The first is that the range of records being linked together for each offender is unprecedented. Some records are well known to our researchers and relatively straightforward to visualise, such as criminal registers, which allow us to examine the date, place and type of an offence. Others, such as the changing picture of family life that might emerge from three successive census entries, or the seemingly random personal or professional information that can be carried in a newspaper report, are far more difficult to quantify and visualise. This first problem will become clearer, and hopefully less significant, as more records are collected and linked. It should be fairly straightforward to identify the information which can be presented easily, and to adapt that which cannot.

The second challenge we must meet is that of presenting tens of thousands of individual life and offending histories to other researchers and the public. We need to find a way of presenting a range of information about our offenders, both individually and in aggregate, so that users can access information about an individual they are interested in, but can also see how that individual compares and contrasts with others in the study – something which enables researchers to identify how typical an individual’s experience was.

BG offered some initial ideas on how we might best achieve this when we met in Oxford. By creating ‘strand’ visualisations which present a mass of offenders by a few ‘key values’ (for example the year of their first recorded offence, the nature of the offence, or the length of their offending career), and then allowing users to further restrict which strands are shown to them by other values (for example sex and location), it would be possible to access information about a single individual whilst getting a sense of how they compare with their contemporaries.
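The data side of such a ‘strand’ view is essentially filter-then-sort. The sketch below is a hypothetical illustration of that idea, not a design the project has adopted: each offender is a dictionary with invented fields, the key value orders the strands, and any keyword arguments act as the user’s restrictions.

```python
def strands(offenders, key, **filters):
    """Offenders matching every filter (e.g. sex="female"), ordered by a key value."""
    selected = [o for o in offenders
                if all(o.get(field) == value for field, value in filters.items())]
    return sorted(selected, key=lambda o: o[key])


# Invented offenders for illustration only.
offenders = [
    {"name": "A", "sex": "male", "location": "London", "first_offence_year": 1801},
    {"name": "B", "sex": "female", "location": "London", "first_offence_year": 1795},
    {"name": "C", "sex": "female", "location": "Kent", "first_offence_year": 1790},
    {"name": "D", "sex": "female", "location": "London", "first_offence_year": 1788},
]
print([o["name"] for o in strands(offenders, "first_offence_year", sex="female", location="London")])
```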

BG visualisation

We hope that this will prove an excellent starting point as we work to develop future visualisations and methods of presentation which will allow the Digital Panopticon team, fellow researchers, and members of the public to explore, understand, and get the most from the fantastic wealth of data at our fingertips.

Seeing things differently: Visualizing patterns of data from the Old Bailey Proceedings


An edition of the Old Bailey Proceedings

The Old Bailey Proceedings are a rich historical resource, almost unimaginably so. They constitute the largest body of texts detailing the lives of non-elite people ever published. Words alone can’t quite do justice to the magnitude of the Proceedings – 197,745 accounts of trials covering 239 years (1674-1913); some 127 million words of text (at an average reading rate of 250 words per minute, this would take eight hours’ solid reading every single day for nearly three years to get through!); details of some 253,382 defendants, including name, gender, age and occupation, as well as details of 223,246 verdicts passed by the juries and 169,243 punishments sentenced by the judges.
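The reading-time estimate is easy to check. Using the figures given in the text (127 million words, 250 words per minute, eight hours a day):

```python
words = 127_000_000
words_per_minute = 250
hours_per_day = 8

minutes = words / words_per_minute   # 508,000 minutes of reading
days = minutes / 60 / hours_per_day  # about 1,058 eight-hour days
years = days / 365                   # about 2.9 years
print(f"{days:,.0f} days, or about {years:.1f} years")
```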

The Proceedings clearly contain a huge amount of information, but they don’t record everything – like any historical source, they are selective in what they document. The amount of information that was recorded in the Proceedings on crimes, verdicts, punishments, defendants and so on also varied over time. And whilst the digitization of the Proceedings by The Old Bailey Online has revolutionised the way in which we search and use this rich historical resource, this also has its limits. The marking-up of the text of the Proceedings (assigning tags to particular pieces of information in the text – such as name or crime – so that this information can be systematically searched) makes it possible to undertake sophisticated statistical analysis. Crimes, verdicts, punishments, defendant age and defendant gender can all be counted at the click of a mouse. Nevertheless, marking-up inevitably involves choices (about what information to tag and the level of detail that is tagged), and those choices limit the ways in which the Proceedings can be studied using computers.

Statistical searches of the Proceedings can be carried out through The Old Bailey Online


The question that we might ask, then, is what are the limitations of the Proceedings as a source of data on such things as punishments, defendant age and gender? Taking the Proceedings in their entirety, what are the limits in terms of the information that was recorded in the original trial reports? How frequently, for example, was the age of the defendant recorded? And what are the limits in terms of what we can actually search for systematically using digital technologies? Can we, for instance, systematically determine the lengths of imprisonment which offenders were sentenced to?

These are crucial questions for us because the Digital Panopticon will rely so heavily on the Proceedings: in our effort to trace the life histories of offenders sentenced to transportation or imprisonment at the Old Bailey between 1787 and 1875, the Proceedings will obviously be a vital source of information. After identifying in the Proceedings those who were sentenced to transportation or imprisonment, we will then try to trace such individuals both before and after their conviction by linking the Proceedings with other sets of records.

In trying to better understand the limitations of the Proceedings as a source of data for the Digital Panopticon project, I have recently been making use of data visualization (‘dataviz’) – using computers to create visual representations of numbers. This includes the traditional graphs and pie charts that we are all familiar with, and which I will be talking about here. But it also includes more complex forms of visualization which I will be looking at in future posts (watch this space!).

Since the Proceedings contain such a vast amount of information, manual counting and tables are inadequate for making sense of the data. Turning the raw numbers into a visual form makes it much easier to see overall patterns in the data. Here I give just a brief example of how dataviz has helped me to see the Proceedings differently, to appreciate the limits of this immense historical resource, and to think about how information from the Proceedings can be used most effectively in the Digital Panopticon project.


A data visualization of the length of trial reports in the Proceedings over time, created by William J. Turkel as part of the Datamining with Criminal Intent project (created using Mathematica 8)

One of the key things we want to know on the Digital Panopticon is how useful age data might be in helping us to link offenders recorded in the Proceedings with individuals documented in other sets of records (such as the convict transportation registers or census records). In the first instance, links will be made through name searches of the different types of records. But how can we be sure that the John Smith recorded in the Proceedings is the same individual as the John Smith recorded in the prison parole registers, for example? Age data might help us here. If John Smith is recorded as being 24 years old in the Proceedings at the time of his sentence to two years’ imprisonment at the Old Bailey, and the John Smith recorded in the parole registers is stated to be 26 years old, as we would expect two years later, then we can be much more confident that this is indeed the same person. By the same token, if the John Smith recorded in the parole registers is said to be 60 years old, this would suggest not.
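The reasoning in the John Smith example can be expressed as a simple consistency check: project the first recorded age forward by the time elapsed and see whether the second recorded age falls within a tolerance. A minimal sketch; the two-year tolerance is an assumed value, and given the apparent rounding of recorded ages a wider range may be safer.

```python
from datetime import date


def ages_consistent(age_a, date_a, age_b, date_b, tolerance=2):
    """True if age_b, recorded on date_b, fits age_a recorded on date_a, within tolerance years."""
    elapsed_years = (date_b - date_a).days / 365.25
    expected = age_a + elapsed_years
    return abs(age_b - expected) <= tolerance


# Hypothetical dates: sentenced at 24, next recorded two years later.
sentenced, paroled = date(1850, 3, 1), date(1852, 3, 1)
print(ages_consistent(24, sentenced, 26, paroled))  # plausible match
print(ages_consistent(24, sentenced, 60, paroled))  # almost certainly not the same person
```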

Ages could therefore be extremely useful, but this depends on how extensively, and how accurately, ages are recorded in the Proceedings (and in our other sets of records). By visualizing the results of quantitative searches of the Proceedings we can get a clear sense of this, far more so than through text-heavy tables which can be hard to “read” for patterns. A statistical search using The Old Bailey Online reveals that 171,168 defendants are recorded in the Proceedings in the years 1755-1870. Of these, age is recorded for 101,364 (59.2%). So for the entire period of our study, we have age data for well over half of all the defendants at the Old Bailey.

Further digging into the data, and visualisation of the findings, reveals some of the deeper patterns in the age data. First, the recording of ages began only in 1790 for defendants found guilty, and only in the 1860s for those found not guilty, as shown in the graph below. In the 1790s we have age data for 65% of guilty defendants, rising to 90% and above thereafter. By contrast, age data for the not guilty is absent until at least the 1850s, and is not recorded in earnest until the 1860s.

Visualisation demonstrating the extent of age recording over time and by verdict


This gives a sense of how extensively ages are recorded in the Proceedings over time, and for which categories of offenders. By visualizing the patterns of recorded ages we can also get a feel for how ages were actually recorded. The graph below, for instance, suggests that there was a tendency to revise the defendant’s recorded age up or down slightly to match a round figure. The numbers of defendants whose ages are recorded as 30, 40, 50 and (to a lesser extent) 60 are all significantly above the number we might expect according to the moving average (in other words, when the yellow bar goes above the green line in the graph). By contrast, ages just either side of these figures (such as 29, 31, 39, 41 and 51) are systematically below the average (when the yellow bar is below the green line). Those in their early twenties may also have tended to have their recorded ages revised down to 18 or 19, since these two ages are also well above the expected number. In short, many more defendants were recorded as being 30 rather than 31, or 40 rather than 41, and the scale of the difference suggests that this resulted from a deliberate policy of revising the defendant’s age up or down to match the nearest round figure.
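The comparison with the moving average can be sketched as a ratio for each age: the count at that age divided by a centred moving average of the counts around it. This is a minimal sketch under assumptions: a five-year centred window (which includes the age itself) and a threshold of 1.2 for flagging an age as “heaped” are illustrative choices, and ages near the edges of the range are skipped because their window is incomplete.

```python
def heaping_ratios(counts, window=5):
    """Ratio of each age's count to a centred moving average of the counts around it."""
    half = window // 2
    lo, hi = min(counts), max(counts)
    ratios = {}
    for age in range(lo + half, hi - half + 1):  # only ages with a full window
        avg = sum(counts.get(a, 0) for a in range(age - half, age + half + 1)) / window
        if avg > 0:
            ratios[age] = counts.get(age, 0) / avg
    return ratios


# Invented counts: flat at 100, with 30 inflated at the expense of 29 and 31.
counts = {age: 100 for age in range(20, 46)}
counts.update({29: 80, 30: 200, 31: 80})
ratios = heaping_ratios(counts)
print([age for age, r in ratios.items() if r >= 1.2])  # ages well above the average
print([age for age, r in ratios.items() if r <= 0.8])  # ages well below it
```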

Visualisation demonstrating the “bunching” of recorded ages at 30, 40, 50 and 60


Together, these findings suggest that age data in the Proceedings will be of much use to us in the Digital Panopticon, particularly for defendants found guilty and subsequently sentenced to transportation or imprisonment, for whom we have extensive age data from 1790 onwards. For our not guilty control group, however, the Proceedings provide no age data to work with before the 1860s, so we will be reliant on other categories of information to link the not guilty defendants across datasets. And given the apparent tendency for recorded ages to be rounded up or down, when we use age data to link individuals across datasets it would be more effective to work within age ranges than to compare specific numbers.

From these early explorations it seems clear that visualization will be invaluable in helping us to identify the overall patterns in the data of the Proceedings. The first step in this is identifying some of the limitations in terms of the information recorded in the Proceedings. Traditional forms of visualization are useful to this end. But there are also potential benefits in going beyond this, by using more complex forms of visualization to uncover deeper patterns in the data – patterns that would be difficult to detect through simple graphs or charts. This is what I will be turning to next.

Thinking about Dates and Data

Our headline dates (1780-1925) are far from being the whole story when it comes to thinking about data collection and record linkage. As one of the objectives stated in our original application puts it, we aim:

to chart the fortunes of all Londoners convicted at the Old Bailey between the departure of the First Fleet to Australia (1787) through to the death of the last transported Londoner in Australia in the early 1920s

But in order to do this, we need to look at data from significantly earlier than 1787, or even 1780. Our interest in convicts doesn't start at the moment of the Old Bailey trial that sent them on their journeys to Australia. For 18th-century offenders, we don't have census or civil registration records that we can use, so our focus will be on attempting to trace their earliest contacts with the criminal justice system. But if we go too far back, we'll spend a lot of time and computing resources processing data we don't need, which will increase problems with noise and false positives (especially when we're looking for needles in haystacks of unstructured data like newspapers or sessions papers).

Still, it seemed worth checking a simpler question first. We knew some of the convicts transported in 1787 would have been held in the hulks for several years, as the authorities sought a replacement for the American colonies (those pesky Revolutionaries). How long exactly? We wanted to pin down a more precise date than 1780.

The First Fleet entering Port Jackson, January 26, 1788 (State Library of New South Wales)

The Old Bailey Online isn't a very useful source for this question, however convenient it might be (a few moments with the stats search tells me, for example, that 1258 people were sentenced to transportation between 1781 and 1786), because sentences given after trials don't necessarily reflect actual outcomes. Not everyone who was sentenced to transportation was actually transported, and not everyone who was transported had been given that sentence in court (a significant proportion of death sentences were subsequently commuted to transportation). In addition, between the collapse of transportation to the American colonies and the establishment of Australia as the primary destination for transported convicts, there were experiments with transportation to other colonies.

I needed different sources, based on the actual transportation records, so it was a chance for me to start learning about the transportation and Australian datasets I’m not familiar with. In fact, there is plenty of source material: many of the transportation records routinely included information about the convicts’ trials – offence, court, and date convicted. Moreover, a number of projects have already produced readily usable and accessible datasets based on these sources.

I started with the State Library of Queensland British Convict Transportation Registers database (BCTR), created from Home Office registers (TNA HO11, for those who're interested). We've already indexed this data in Connected Histories (CH). The CH version wasn't designed for this kind of data analysis, however, and running individual searches would have been a long, slow job, so I downloaded the full dataset and played with it (using OpenRefine) until I got the information I wanted. The earliest trial in there, it seemed, was that of John Martin, in July 1782.
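The work was done in OpenRefine, but the same "earliest Old Bailey trial" query can be sketched in Python for anyone who prefers scripting it. This assumes a CSV export with `name`, `court` and `trial_date` (ISO format) columns. Those column names and the sample rows are invented for illustration, and the real BCTR download has its own layout.

```python
import csv
import io
from datetime import date

def earliest_trial(rows, court="Old Bailey"):
    """Return (trial date, name) for the earliest trial at `court`, or None.
    Rows with a blank trial date are skipped."""
    dated = [
        (date.fromisoformat(r["trial_date"]), r["name"])
        for r in rows
        if r["court"] == court and r["trial_date"]
    ]
    return min(dated) if dated else None

# Invented sample rows with hypothetical column names (not real BCTR data).
SAMPLE = """name,court,trial_date
Convict A,Old Bailey,1784-09-15
Convict B,Kent Assizes,1781-03-01
Convict C,Old Bailey,1783-01-15
Convict D,Old Bailey,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(earliest_trial(rows))  # (datetime.date(1783, 1, 15), 'Convict C')
```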

The second relevant and easily accessible dataset was the First Fleet database (FF-DB), which is also available to download. This is a smaller dataset, containing the 780 or so convicts transported on the First Fleet, of whom 327 had been sentenced at the Old Bailey. Unlike the BCTR, it’s been compiled from a number of different primary and secondary sources. In FF-DB, the earliest Old Bailey trials were from 1781. The earliest trial of all was that of Samuel Woodham and John Ruglass, at the sessions of 30 May 1781.

Why hadn’t I found these in BCTR? Because, it transpired on reading the entries, in each case their journey to Australia was actually their second convict voyage. They’d escaped from their first convict destination and had been convicted of returning from transportation around 1784-5. BCTR only gave the date of the second conviction that actually put them on the ships to Australia, whereas FF-DB records both. Most of the 14 FF-DB convicts from 1782 trials had also returned from transportation (several had been involved in the Mercury mutiny) and been re-sentenced at a later date.
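The lesson for automated processing is that "earliest contact" has to be taken across every source, not from whichever dataset happens to be consulted first. A minimal sketch, assuming each source can be reduced to a mapping from a person identifier to a list of conviction dates. The identifiers and dates below are hypothetical and do not correspond to any real convict.

```python
from datetime import date

def earliest_conviction(*sources):
    """Combine several {person_id: [conviction dates]} mappings and return
    the earliest known conviction date for each person."""
    earliest = {}
    for source in sources:
        for person, dates in source.items():
            for d in dates:
                if person not in earliest or d < earliest[person]:
                    earliest[person] = d
    return earliest

# Hypothetical records: one source keeps only the conviction that put the
# convict on the Australian ships; the other also keeps the original one.
latest_only = {"convict_1": [date(1785, 2, 23)]}
all_convictions = {"convict_1": [date(1782, 7, 10), date(1785, 2, 23)]}
print(earliest_conviction(latest_only, all_convictions))
# {'convict_1': datetime.date(1782, 7, 10)}
```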

Don’t ya just love the way a ‘simple’ historical question is never so simple after all?

Another question I decided to ask of the data: setting aside the 1781-2 outliers, what was the more typical interval between conviction and departure for Australia for the Old Bailey First Fleeters? The following table, taken from the FF-DB data (and not taking the “re”-transported into account), shows that 213 (65%) had originally been tried in 1784 or earlier. Those who'd spent less than 3 years in the hulks could presumably consider themselves the lucky ones.

Year of conviction | Number of convictions
1781 | 4
1782 | 14
1783 | 48
1784 | 147
1785 | 37
1786 | 49
1787 | 28
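The 65% figure can be checked directly against the table above. The counts below are copied from it, and the snippet simply totals them and works out the share convicted in 1784 or earlier.

```python
# Old Bailey First Fleeters by year of conviction, copied from the FF-DB table above.
convictions_by_year = {1781: 4, 1782: 14, 1783: 48, 1784: 147,
                       1785: 37, 1786: 49, 1787: 28}

total = sum(convictions_by_year.values())
by_1784 = sum(n for year, n in convictions_by_year.items() if year <= 1784)
print(total, by_1784, f"{by_1784 / total:.0%}")  # 327 213 65%
```

The total matches the 327 Old Bailey convicts in FF-DB, so every one of them is accounted for in the table.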

Now I needed to investigate the age range of the First Fleet convicts, which would help me to work out the likely earliest dates of contact with the justice system. Both the transportation and Old Bailey Online data contain at least some information about ages, although 18th-century information on this is often imprecise and not always accurate. I wasn't too worried about this, since the ages didn't need to be exact for this purpose.

Recorded ages of Old Bailey First Fleet convicts

What are the recorded ages of the First Fleet convicts in FF-DB? There is age information for 309 out of the OB sample of 327 (bearing in mind these are recorded as ages at the time of departure, so they’d have generally been a few years younger at the time of trial). I think it will hardly come as a major surprise to 18th-century crime historians that the majority (64%) were between 20 and 30 years old, and the vast majority (95%) were over 15 and under 40.
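Two small calculations sit behind this paragraph: shifting an age recorded at departure back to the year of trial, and measuring the share of ages within a band. A minimal sketch follows. The function names are hypothetical, the bands are treated as inclusive (an assumption, since the text doesn't say), and the ages in the example are invented rather than taken from FF-DB.

```python
DEPARTURE_YEAR = 1787  # the First Fleet sailed from England in May 1787

def age_at_trial(age_at_departure, trial_year, departure_year=DEPARTURE_YEAR):
    """Rough age at trial: subtract the years spent waiting between trial and departure."""
    return age_at_departure - (departure_year - trial_year)

def share_in_band(ages, low, high):
    """Fraction of recorded ages with low <= age <= high (inclusive bounds assumed)."""
    return sum(low <= a <= high for a in ages) / len(ages)

# Invented example ages, not FF-DB data.
example_ages = [18, 22, 25, 28, 30, 35, 41, 52]
print(age_at_trial(25, 1784))               # 22
print(share_in_band(example_ages, 20, 30))  # 0.5
```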

That age data could be skewed in various ways, though. It's conceivable that those selecting prisoners for the First Fleet tended to choose younger people who'd be more likely to survive the passage and be stronger workers at the other end; on the other hand, we might reasonably speculate that very young offenders would be less likely to be transported.

Age data is available for only about 3% of Old Bailey Online defendants between 1740 and 1780 (contrasting sharply with the later 19th-century Proceedings – which in itself tells us a lot about changes in record-keeping generally, and about surveillance of the criminal elements in society in particular). We have no idea how representative that 3% was, so I'm wary of taking any hard numbers from it. (And again, I can imagine that very young offenders might be slightly less likely to appear at the Old Bailey than at lower courts.) But it does show a reasonably similar profile to FF-DB, with very, very few defendants under 15, though rather more between 40 and 50 – which might (if we could really trust it) back up my notion that the First Fleet convicts tended to be selected from younger prisoners.
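A figure like "about 3%" is a coverage rate: the share of defendants with any recorded age in a given period. Here is a minimal sketch of computing such rates by decade. It assumes the defendant data has been reduced to (trial year, age or None) pairs, which is a hypothetical layout, and the sample records are invented.

```python
from collections import defaultdict

def age_coverage_by_decade(defendants):
    """Share of defendants with a recorded age, grouped by decade of trial.
    `defendants` is an iterable of (trial_year, age_or_None) pairs."""
    totals = defaultdict(int)
    with_age = defaultdict(int)
    for year, age in defendants:
        decade = year - year % 10
        totals[decade] += 1
        if age is not None:
            with_age[decade] += 1
    return {d: with_age[d] / totals[d] for d in sorted(totals)}

# Invented sample records, not Old Bailey data.
sample = [(1745, None), (1748, 30), (1752, None), (1759, None), (1791, 22), (1796, 19)]
print(age_coverage_by_decade(sample))  # {1740: 0.5, 1750: 0.0, 1790: 1.0}
```

Splitting the same calculation by verdict would show the kind of gap noted earlier for the not guilty control group.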

Using the age of 45 (in 1787) as an upper limit would give a birth year c. 1742 – let’s round that down to 1740 for convenience. So, if they were unlikely to appear in criminal justice records much before the age of 15, that takes us to 1755. That too will not be quite the final word: we’ll probably do manual searches in earlier records for the handful of First Fleeters aged over 45, and for individuals who appear to have exceptionally rich stories. But in terms of data collection for automated searching/processing, that is likely to be close to our “real” starting date.
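The arithmetic behind the 1755 starting date can be written as a small function, so the cut-off is easy to recompute if the assumptions change. The parameter names are hypothetical. The default values (1787, an upper age of 45, rounding down to the decade, first contact at 15) are the ones used in the paragraph above.

```python
def earliest_search_year(reference_year=1787, max_age=45, min_contact_age=15, round_to=10):
    """Earliest year worth searching for a convict's first recorded contact
    with the justice system."""
    birth_year = reference_year - max_age   # with defaults: 1787 - 45 = 1742
    birth_year -= birth_year % round_to     # round down to the decade: 1740
    return birth_year + min_contact_age     # first likely contact: 1740 + 15 = 1755

print(earliest_search_year())             # 1755
print(earliest_search_year(max_age=50))   # 1745
```

The second call shows how sensitive the cut-off is. Raising the upper age limit to 50 pushes the starting date back a full decade.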