Author Archives: Richard Ward

Open Data and the Digital Panopticon

Of all historical periods and subjects, crime and justice in eighteenth- and nineteenth-century London is the most extensively digitised. Through the digitisation of countless numbers of court records, transportation registers, prison archives, trial reports, criminal biographies, last dying speeches and newspapers (amongst many other things), we can access a wealth of information about crime, policing and punishment in the metropolis, and about the fates of the offenders tried there, all at the click of a mouse.

To our great benefit, much of this data is openly available, a product of the dogged efforts of public bodies, academics, data developers, volunteers and enthusiasts; often (but certainly not always) supported by public funding. In the process it has opened up seemingly boundless possibilities for research.

Indeed, without several of these open datasets the Digital Panopticon could not be realised. In our efforts to trace the life courses and subsequent offending histories of London convicts transported to Australia or imprisoned in Britain in the late eighteenth and nineteenth centuries, we will be reliant on a number of open datasets such as the British Convict Transportation Registers and Female Prison Licences.

It seems timely, therefore, on Open Data Day, to celebrate these fantastic, freely-accessible resources, and to highlight just a couple of ways in which they will be useful to us on the Digital Panopticon. Taking place on 21 February 2015, Open Data Day will involve a series of events and gatherings which seek to develop support for, and to encourage, the adoption of open data policies by the world’s local, regional and national governments.

I have talked in a previous post about the ways in which visualisations of the openly-available British Convict Transportation Registers database can be used to put transportation under the ‘macroscope’ – to chart the complex patterns and interactions of penal transportation in their entirety, spanning the breadth of Australia and the length of a century, taking in the lives of tens of thousands of individuals along the way.

In this post I briefly want to highlight another open dataset which will be at the heart of the project – the prison licence records of females incarcerated in British jails in the nineteenth century, held by the National Archives (under the catalogue reference PCOM 4), the metadata for which is openly available on the Archive’s online catalogue.

The licences almost without exception record the age of the offender on conviction, a potentially useful piece of information for us on the Digital Panopticon in terms of record linkage. But, as with our other datasets, we want to know how accurately ages were recorded, and once again visualising the data from the female licences suggests some interesting things for us to think about.

Not least, it again reveals the tendency towards age heaping at round numbers such as 20, 30 and 40, suggesting that recorded ages were regularly rounded up or down rather than representing the true age of the offender. If ages were recorded accurately, we would expect to see a smooth distribution of recorded ages. As seen in the graph below, however, this was far from the case in the recording of female prisoner ages in the nineteenth century, with spikes at the ages of 20, 30, 40 and 50, and dips at the ages of 29, 31, 39 and 41.
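One simple way to put a number on the heaping described above is to compare the count of records at a round age with the average count at the two adjacent ages. The sketch below is illustrative only: the `heaping_ratio` helper and the sample ages are invented for the example, and are not drawn from the PCOM 4 data or from the project's own method.

```python
from collections import Counter


def heaping_ratio(ages, age):
    """Count at `age` divided by the mean count at the two adjacent ages.

    A ratio close to 1 suggests a smooth distribution; a ratio well
    above 1 suggests that ages were being rounded to `age`.
    """
    counts = Counter(ages)
    neighbours = (counts[age - 1] + counts[age + 1]) / 2
    if neighbours == 0:
        return None  # no neighbouring records to compare against
    return counts[age] / neighbours


# Invented sample: a spike at 30 with dips at 29 and 31.
sample_ages = [28] * 6 + [29] * 4 + [30] * 12 + [31] * 4 + [32] * 6

print(30, heaping_ratio(sample_ages, 30))
```

Run over every age ending in 0, a measure of this kind would show at a glance which round numbers attracted the most rounding.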

Age on Conviction as Recorded in the PCOM4 Female Prison Licences

Does this mean, therefore, that we should disregard recorded ages as entirely inaccurate? Not necessarily – as the graph below demonstrates, when we compare the distribution of ages across different sets of records, it suggests that recorded ages were perhaps broadly reflective of age patterns. The distribution of offender ages is typically younger in the Old Bailey Proceedings (OBP) and in the Convict Indents (CIN – the records of those transported to Australia) compared to that of females imprisoned in Britain (PCOM4) – certainly what we would expect, given the nature of criminal justice policy at the time.

Ages of Female Offenders as Recorded across each Dataset

These are just a couple of ways in which the Digital Panopticon will be drawing upon the wealth of open data available to criminal justice historians. We are indebted to the hard work of all those who have contributed to the creation and dissemination of this embarrassment of riches which, in combination with the powerful digital technologies now at our fingertips, is opening up a whole new realm of research opportunities.

 

Record Linkage Workshop Report, Part 2

The second half of the workshop was devoted to work in progress from the Digital Panopticon – summaries of which have already appeared (or will soon be appearing) on this blog, so watch this space! As such, I’ll say less about these papers than those from Session 1.

Jamie McLaughlin – ‘How to Disappear Completely: Linking Transportation Records in the Digital Panopticon’

Jamie McLaughlin presented some of the insights gained from our recent (and still very early) explorations in linking records of the trial and transportation of convicts in eighteenth- and nineteenth-century London. Uncertainty ‘plagues the records’, and Jamie discussed some of the ways in which we have tried to maximize the quality of the name matches made across the records, such as allowing for spelling and date variances, creating control scenarios, and using variant lists in preference to general algorithms, all ultimately with an eye on computational performance, an issue which we cannot simply disregard, however much we might desire ‘perfect’ matching techniques. In short, we need to find an optimal, complementary balance of automated and manual work, allowing computers and humans each to do what they’re good at: an ideal strategy reflected in the case of the ‘robot butler’.
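As a rough illustration of the ideas in this paper (not the project's actual code), the sketch below links a trial record to transportation records using a small surname variant list and a tolerance on dates, accepts a link automatically only when there is exactly one candidate, and passes everything else to a person. The `Record` fields, the variant list and the sample records are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    forename: str
    surname: str
    year: int


# Hypothetical variant list mapping spellings to one agreed form.
SURNAME_VARIANTS = {"smyth": "smith", "smithe": "smith", "browne": "brown"}


def canonical(surname):
    """Lower-case a surname and collapse known variant spellings."""
    key = surname.strip().lower()
    return SURNAME_VARIANTS.get(key, key)


def candidate_links(trial, transports, max_year_gap=1):
    """Return transportation records that could refer to the tried person."""
    return [
        t for t in transports
        if t.forename.lower() == trial.forename.lower()
        and canonical(t.surname) == canonical(trial.surname)
        and 0 <= t.year - trial.year <= max_year_gap
    ]


def classify(trial, transports):
    """Accept a unique candidate automatically; send the rest to a person."""
    matches = candidate_links(trial, transports)
    if len(matches) == 1:
        return "auto", matches
    return ("manual" if matches else "unlinked"), matches


# Invented records for illustration only.
transports = [
    Record("John", "Smyth", 1821),
    Record("John", "Smith", 1821),
    Record("Mary", "Browne", 1822),
]
print(classify(Record("Mary", "Brown", 1821), transports)[0])
print(classify(Record("John", "Smith", 1820), transports)[0])
```

A dictionary lookup of this kind is cheap compared with computing a general string-similarity score for every pair of records, which is one reason a variant list can be attractive when performance matters.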

Lucy Williams – ‘What’s in a name? Convicts, Context and Multiple Record Linkage’

Lucy Williams talked about her recent work in manually checking the automated linkage process undertaken by Jamie, particularly in identifying why good matches have failed to be made. One reason for this is simple name variance: variable spellings of the same surname are notoriously prevalent in eighteenth- and nineteenth-century records. Nor is the data from one record set (such as the Old Bailey Proceedings) carried over consistently to other records. But there is also the problem of “John Smith”: how do we prise apart and correctly link individuals tried at the same session of the Old Bailey who have the same name, spelled in exactly the same way? We can keep adding in information from other sources in order to try to verify these kinds of multiple name matches, but that isn’t necessarily always the answer, particularly in terms of automated processes. Adding in all the John Smiths from the census, for instance, can simply lead to even more links. The crucial question for us then is, at what point do we draw a line under things and stop adding in contextual data?

Record Linkage Workshop Report, Part 1

In the first half of this workshop on record linkage we had three fantastic papers from guest speakers who were invited to talk about their own experiences of conducting record linkage in historical research. Each speaker offered a different perspective on the subject, allowing us to think about a wide range of issues relating to record linkage and generating ideas which will be extremely useful to us on the Digital Panopticon.

Jeremy Boulton — ‘Place, Mobility and Class Barriers: The Perils and Possibilities of Nominal Linkage in the Metropolis’

Jeremy Boulton of the University of Newcastle got the event off to a fantastic start with a fascinating and thought-provoking window into his self-confessed ‘gruesome fascination’ with nominal record linkage. Reflecting on his experiences as part of the Pauper Lives in Georgian London and Manchester project, Jeremy spoke about the broader methodological (rather than strictly technical) issues associated with record linkage, highlighting both the benefits and the inherent dangers of linking individuals across multiple historical records.

On the one hand, when carried out successfully, nominal record linkage can be an effective means by which to check the accuracy of our historical records. Whilst perfect accuracy is beyond attainment in historical record linkage (as E. A. Wrigley said many years ago, and which still holds true today), nevertheless the creation and collation of successful links allows us to identify the (otherwise imperceptible) lies and concealments of the people being recorded.

On the other hand, of course, the difficulties associated with nominal record linkage make the successful creation of links (and thus exposing the ‘fiction in the archives’) a problematic task. Transcription errors (by both the original scribes and present-day transcribers) will defeat even the most sophisticated linkage methodologies, and confirming information can’t always be obtained.

In the latter part of his paper, Jeremy presented an absorbing case-study of the nominal record linkage of Godfrey Sykes, widely documented in sources such as pollbooks, newspapers, the London electoral database and charity subscriber registers — an apparently respectable Georgian businessman who, it turns out from further digging into the historical sources, fathered four bastards with a woman named Ann Farmer.

Gill Newton — ‘Urban Record Linkage before 1754’

Next, Gill Newton of the University of Cambridge shifted the focus onto the nuts and bolts of record linkage — a paper rich in technical detail which provided the audience with a valuable toolkit for undertaking record linkage, even for the particularly challenging context of creating re-constituted families from eighteenth-century London.

Starting with an informative background on the contents of an eighteenth-century parish register and what is meant by a re-constituted family, Gill then noted some of the key challenges which face any researcher looking to undertake urban record linkage. These include a high level of population turnover; rapid growth from migration; blurred parish and administrative boundaries; and a high risk of mistaken identities. There are, however, advantages to linking urban records, such as more detailed registers; a more diverse name base; the ability to sample viably; and the further information generated by civic administration.

Gill then treated us to a fascinating discussion of name distribution in eighteenth-century parish registers. Forenames were heavily bunched around the most common names (John, Mary, Elizabeth etc.). By contrast, whilst some surnames constituted a large proportion of the whole (such as Smith), the distribution of surnames had a much longer ‘tail’ compared to forenames. Moreover, there were stark differences in the patterns of name distribution between rural England and London.

Finally, Gill highlighted some of the most important tools for undertaking nominal record linkage, including phonetic matching and surname dictionary examples, as well as the principles of algorithmic record linkage. She offered some extremely useful tips on how to maximize the quality of the linkages created, emphasising that successful matching requires careful attention and a rigorous methodology — in other words, the cautionary mantra with record linkage should be: ‘garbage in, garbage out’.
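To give a flavour of what phonetic matching involves, here is a minimal sketch of Soundex, one widely used phonetic code. The report does not say which scheme Gill discussed, so the choice of Soundex is an assumption made purely for illustration.

```python
# Letter groups of standard American Soundex.
CODES = {}
for letters, digit in (
    ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
    ("l", "4"), ("mn", "5"), ("r", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character Soundex code for a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        # Skip vowels and repeats of the code that was just emitted.
        if code and code != prev:
            result += code
        # 'h' and 'w' do not separate letters that share a code.
        if ch not in "hw":
            prev = code
    return (result + "000")[:4]


print(soundex("Smith"), soundex("Smyth"))
```

Names that sound alike but are spelled differently receive the same code, so they can be grouped as candidate matches before any closer checking is done. The same property produces false positives, which is where the ‘garbage in, garbage out’ warning applies.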

Ciara Breathnach — ‘Irish Records Linkage 1864–1913: Big, Macro and Micro Data’

In the final paper of this first session, Ciara Breathnach from the University of Limerick talked about the approach and some of the findings from the Irish Record Linkage 1864–1913 project, on which she is the principal investigator. Funded by the Irish Research Council, and developed in partnership with the Digital Repository of Ireland, University of Limerick and Insight at NUI Galway, the project aims to provide a comprehensive map of infant and maternal mortality for Dublin from 1864 to 1913. The project will reconstruct family units and create longitudinal histories by linking records of Birth, Marriage and Death, which together include millions of name instances.

Starting with an overview of the Irish Record Linkage project, Ciara then discussed some of the forces which served to shape the recording of census and civil data in nineteenth-century Ireland, before moving on to discuss some of the differing definitions of ‘Big Data’, a term about which there is seemingly little agreement.

Ciara also provided useful information on the ontologies utilised by the Irish Record Linkage project, describing the ways in which the data has been analysed and linked, noting the necessity (in the case of such extensive numbers of available records) to sample in order to make such a project feasible.

Finally, through a case-study of Achill in Dublin, Ciara presented a glimpse of the significant findings already generated by the Irish Record Linkage project. By mapping infant deaths in the parish in the 1890s, Ciara revealed the nature of the relationship between child mortality and the geography of local health care (in the form of doctors and nurses) in late nineteenth-century Ireland. As Ciara concluded, it is through these kinds of detailed micro-level studies, produced by record linkage at the macro level, that we can gain a better understanding of the past.

We are very grateful to all three speakers for providing us with so much food for thought, and so many ideas to follow up!

BCHS4 presentation: Visualising Digital Panopticon Data

Abstract:

The Digital Panopticon will assemble a larger collection of datasets than any other crime history project to date (including, amongst many others, the Old Bailey Proceedings, convict transportation registers and prison records), covering hundreds of thousands of individuals. To effectively bring together this information to reconstruct the lives of offenders, we need to develop a detailed understanding of our datasets – of what information is and isn’t recorded on offenders, and how this varied both over time and across different sets of records. Traditional methods of data analysis and representation such as manual counting and tables are inadequate to this end. This paper instead highlights the power of digital technologies in identifying previously unrecognised (and otherwise unrecognisable) patterns. The techniques of data visualisation in particular have been invaluable in uncovering how extensively, and in what manner, information on offender age, occupation and crime location was recorded within our sources. By using digital technologies to step back from our datasets, and see them in their entirety, we can develop a much fuller and more systematic understanding of the sources we are working with.

Slides:

Seeing things Differently: Visualising Data on Crime and Punishment

Transportation Under the Macroscope

Computers are brilliant microscopes. They make it easy to find needles in haystacks. Want to find references to the famous lawyer William Garrow amongst the millions of words in the printed reports of trials held at the Old Bailey, for instance? A keyword search produces the results in less than a second. Without computers it would take months. Likewise, as I explained in a recent post, through the techniques of data visualisation computers can be used to spot (what would otherwise be largely imperceptible) errors within the massive datasets that we are drawing upon in the Digital Panopticon project.

But computers are also fantastic macroscopes: today’s powerful digital technologies allow us to stand back from our sources and view them in their entirety. We can see the big picture, presenting complex and large-scale patterns in simple but effective ways. Microscopes allow us to see the infinitely small. Telescopes reveal the infinitely great. Macroscopes, meanwhile, peer into the infinitely complex, allowing us to explore combinations, relationships and interactions between multiple elements.

By visualising the information recorded in the British Convict Transportation Registers, I’ve recently put penal transportation to Australia in the eighteenth and nineteenth centuries under the macroscope. This has produced some interesting insights into the relationship between Australian penal colonies, terms of transportation and how these changed and interacted over time.

The British Convict Transportation Registers database provides information on more than 123,000 offenders who were transported to Australia between 1787 and 1867. It’s a fantastic resource, and it will be at the heart of the Digital Panopticon project’s efforts to chart the criminal lives of London convicts sent to Australia. In charting these lives, we need to address some overarching starting questions. How many London convicts were actually transported to Australia for their crimes? Which parts of Australia were they sent to? How many years abroad did they face according to their sentences? Did this change over time, and what was the relationship between these different elements? Visualisations can help us to explore these questions across the long term and a large scale.

The total number of London convicts transported to Australia fluctuated greatly over the late eighteenth and nineteenth centuries, as Graph 1 below demonstrates. Relatively few convicts were transported in the years 1793–1804 when the Revolutionary War monopolised Britain’s shipping resources. With the end of the Napoleonic War in 1815 there were, however, large and rapid increases in the numbers of London convicts sent to Australia, reaching a massive peak in the 1830s. Thereafter, numbers gradually fell until the eventual abandonment of penal transportation in the 1860s. Interestingly, this pattern reflects a wider inverse relationship between the numbers of convicts transported and the years in which Britain was engaged in war throughout the eighteenth and nineteenth centuries.

Graph 1

What Graph 1 doesn’t reveal is that the places in Australia to which convicts were sent changed over time. The individual penal colonies to which London convicts were sent operated at different times. As Graph 2 below shows, New South Wales was the first penal colony in Australia, and was later used alongside the penal colony of Van Diemen’s Land between the late 1820s and 1840, when transportation to Australia was at its peak. Following this, Van Diemen’s Land was used almost exclusively, until the 1850s, when Western Australia was the sole transportation location in Australia.

Graph 2

If the locations of penal transportation to Australia changed over time, then so too did the lengths of time which offenders were sentenced to serve abroad. Between 1787 and the virtual abandonment of New South Wales as a penal colony in the late 1830s, as Graph 3 highlights, offenders were sentenced almost without exception to a term of 7 years, 14 years or life. Between 1840 and 1850, when Van Diemen’s Land was used exclusively, terms became more varied, with greater use of 10- and 15-year sentences. And especially after 1853, when Western Australia became the sole destination for transportees, an even greater variety of terms was put to use. This more nuanced tariff in transportation sentences was likely introduced to make transportation more acceptable to penal reformers who increasingly viewed the practice with concern.

Graph 3

These changes in penal colony and terms of transportation were intimately linked, and the interaction between the two is clearly captured in Graph 4. The colonies operated at different times, and the law which underpinned them and the terms of transportation which could be imposed also changed accordingly. In short, the convicts who found themselves on the shores of New South Wales were primarily one of two kinds: either those sentenced to 7 years’ transportation; or those sentenced to a whole life abroad. By contrast, London convicts landing some 2,000 miles away on the shores of Western Australia and on the eve of transportation’s demise in the 1850s would each have had subtly different terms to serve out.

Graph 4

Through the macroscope of computer-generated visualisations, we can see these complex patterns and interactions in their entirety, spanning the breadth of Australia and the length of a century, taking in the lives of tens of thousands of individuals along the way.

Six PhD Studentships: Liverpool, Sheffield and Tasmania

The Digital Panopticon Project is delighted to announce the availability of six PhD studentships, funded by both the AHRC and the participating Universities.  These are exciting opportunities to exploit the rich resources collected by the Project while working within a large team of interdisciplinary experts in both the UK and Australia.

In each case, applications must be made to the institution at which the studentship will be held. Deadlines are as follows (please note update to Liverpool and Sheffield deadlines):

  • Sheffield: 28 July 
  • Liverpool: 28 July 
  • Tasmania: 31 July

Sheffield/Liverpool interviews will be held 11-12 August. The AHRC-funded studentship (Impact of digital history resources) is open to UK/EU students only. The other Liverpool and Sheffield studentships are also open to international students, but please note that only UK/EU-level tuition fees can be covered, and you would need to make up the fees shortfall. The studentships will also include a maintenance grant (currently around £13,000 p.a.). Please contact UTAS for more details about eligibility/funding levels for the Tasmania studentship.

University of Liverpool

Longitudinal studies of the health of the poor

Using prison data (from both local prisons and national penitentiaries), this studentship will examine the height/weight and the health histories of working-class men and women over the course of their lives. We have access to a huge and detailed database on the chronic and acute illnesses of thousands of prisoners in the British convict system, which will allow the PhD researcher to examine what illnesses were prevalent, how they were treated, what impact they had over the lifetime of the prisoner, the prisoner’s longevity, and a range of other possible issues. This studentship will appeal to students of the history of medicine, social historians, and crime historians; and the student will be supported by an experienced team of interdisciplinary researchers and experts in convict/health history.

The lives and criminal careers of convicts in the 19th century

This studentship will follow, chart, and analyse the lives of offenders tried at the Old Bailey: before their appearance at court, during their sentence, and after their release. The PhD will examine the reasons why offenders began their criminal career, the impact that punishment in the British convict prison system had on them, and how that legacy carried over into their lives after they re-entered society. This is an exciting opportunity to study criminal careers using historical data, working with experts in the field. The studentship will appeal to researchers in nineteenth-century social history, history of crime, criminal careers, and/or desistance studies.

For more information on either of the Liverpool studentships:

  • Academic queries about the project and studentships should be addressed to Prof. Barry Godfrey, Barry.Godfrey@liverpool.ac.uk.
  • For information about applications contact Rebekah Hughes, slsjpgr@liv.ac.uk.

 

University of Sheffield

The Social and Spatial Worlds of Old Bailey Convicts, 1785-1875

The studentship will investigate the social and geographical origins and destinations of men and women convicted at the Old Bailey between 1785 and 1875, in order to shed light on patterns of mobility, the causes of crime, and understandings of identity in early industrial Britain.  Using evidence of origins from judicial records, the project will trace convicts from their places of origin, through residence and work in London before their arrests, to (if imprisoned) places of imprisonment and subsequent life histories.  Analysis of the language used in trial testimonies can provide an indication of how identities were shaped by complex backgrounds, and evidence of criminal and convict mobility has the potential to contribute to our understanding of geographical mobility and social integration before and after the introduction of the railroads.   This is an exciting opportunity to use newly assembled data to study the lives of non-elite people. The studentship will appeal to researchers interested in eighteenth- and nineteenth-century social history, the history of crime, and geographical and social mobility.

For more information, and to apply, go to http://www.sheffield.ac.uk/postgraduate/research/scholarships/projects/oldbaileyconvicts

The Impact of Digital Resources in the History of Crime

This project will examine the impact of the widespread availability of digital resources on attitudes towards crime and its history. Core case studies will include the Old Bailey Proceedings Online, Founders and Survivors (records of the 73,000 men, women and children who were transported to Tasmania), and, following its launch, the Digital Panopticon website. This project will investigate both academic and non-academic uses of internet information provided in the UK and Australia, using a combination of quantitative and qualitative methodologies. A wide range of sources can be used to measure the extent to which these sites have shaped how the history of crime has been written, and to assess their impact on users’ perceptions of the crimes and punishments, including individual criminal lives, documented on these websites. It will also be possible to investigate how using these resources has shaped wider attitudes towards crime and punishment in contemporary society. The studentship will appeal to researchers interested in the history of crime, public history, and the digital humanities. AHRC-funded.

For more information, and to apply, go to http://www.sheffield.ac.uk/postgraduate/research/scholarships/projects/digitalresources

Criminal Recidivism in 18th and 19th-Century London

The eighteenth and nineteenth centuries witnessed the development of the concepts of habitual  offending and the criminal class.  Taking advantage of the extensive records of both petty and serious crime digitised and linked together by the Digital Panopticon project, this studentship will investigate these phenomena from the perspective of the judicial records, by tracing the incidence and character of repeat offending.  The project will seek to understand the extent to which multiple arrests were a product of policing and/or underlying criminal activity, to identify the social and cultural factors which made some Londoners prone to reoffending and rearrest, and to examine the relationship between the chronology of recidivism and the evolution of contemporary thought about reoffending.  This research will allow the student to draw some conclusions about both the causes of crime and the background to nineteenth-century thought about crime.  It will appeal to researchers interested in the history of crime and policing, and the social history of eighteenth- and nineteenth-century England more generally.

For more information, and to apply, go to http://www.sheffield.ac.uk/postgraduate/research/scholarships/projects/criminalrecidivism

University of Tasmania

Labour Markets and Convict Offending

Who amongst the convicts sent to Britain’s nineteenth-century penal colony in Van Diemen’s Land were put to hard labour or ordered to work in irons? Did these patterns change over time, and if so, were they driven by convict behaviour, changes in penal administration, or the performance of the wider colonial economy?  This project will provide an outstanding opportunity for a student with a background in history, economics or sociology to explore these questions while working as part of an international team of researchers. As well as conducting their own archival research the successful applicant will be given access to an extensive existing database of convicts and associated records.

Applications for the Tasmania studentship close on 31 July.

For more information contact Trevor Scaife, Trevor.Scaife@utas.edu.au

 

Men as Wives: Visualising Errors in the Old Bailey Proceedings Data

In a recent post I talked about some of the ways in which data visualisations have helped me to see patterns in the information recorded in the Old Bailey Proceedings on things such as crimes, verdicts, punishments and the ages of defendants, patterns that might otherwise have been missed if using traditional methods of representing data such as tables. Here I just want to give a brief update on my analysis of the Proceedings, particularly the recording of defendant occupations and social status in the Proceedings in the eighteenth and nineteenth centuries. Again, visualisations have been extremely useful, especially in identifying errors in the data.

As with the recording of defendant ages, it might well be the case that information on the occupation/social status of those tried at the Old Bailey in the eighteenth and nineteenth centuries could be useful to us on the Digital Panopticon project in tracing offenders across different sets of records. Just as an age or a birth date might allow us to establish whether the “John Smith” tried at the Old Bailey and the “John Smith” transported to Australia were indeed the same person, likewise information on occupation or social status can help us to prove/disprove such name matches across records. But as with ages it depends on how extensively, and in what manner, such information on occupation/social status is recorded in our sources. And to this end, as with information on defendant age, the techniques of data visualisation can be useful.

Searches of the Proceedings for defendant occupation/social status can be carried out using the “custom search” page of the Old Bailey Proceedings Online.

However, whereas with defendant ages I was able to use the “statistics search” function of the Old Bailey Proceedings Online to generate numbers for analysis, this wasn’t possible in the case of defendant occupation/social status. In the process of digitising the original trial reports, defendant occupation was indeed tagged as a distinct category of information, and thus it can be searched for systematically in the “custom search” page of the Old Bailey Proceedings Online. But this can’t be used to quantitatively analyse the recording of defendant occupations in the Proceedings. In order to do this I needed to look at the website’s underlying data file of defendant information.

This is a large file which includes numerous fields of tagged information relating to all the defendants tried at the Old Bailey and reported in the Proceedings. Since much of this information is in the form of text rather than numbers, software such as Excel isn’t very useful in analysing the data. Instead I turned to Tableau Public, a free, web-based tool that is powerful but still easy to use. There are numerous other data visualisation tools available which are ideal for novices. All need to be used with caution, but used carefully they can be invaluable. (I’m going to talk in more detail about the actual process of using tools such as Tableau to undertake crime history in my next post, so watch this space.)

By running our file on Old Bailey defendant information through Tableau I’ve been able to create some fairly simple but nonetheless useful visualisations. For the data on defendant occupation and social status this has revealed two things in particular.

Pie chart demonstrating frequency of recording defendant occupation

First of all, it has highlighted how little information we actually have on the occupational and social status of Old Bailey defendants from the seventeenth to the twentieth centuries. Across the entire publication history of the Proceedings between 1674 and 1913, occupation or social status is recorded for only 11% of all the defendants put on trial. In the years 1755 to 1834, occupation/social status is recorded for 15% of defendants, but between 1834 and 1906 virtually no defendants’ occupations were recorded. On the whole, therefore, we have occupation information for only a small proportion of defendants, and virtually none for the years after 1834, which make up roughly half of our specific period c. 1787-1875.

The sheer variety of occupations recorded in the Proceedings was also made clear by visualising the data. The bubble chart below, for example, gives an indication of this, and of the relative frequency with which different categories are recorded. One of the problems is that the same occupations were recorded in the Proceedings in slightly different ways (“servant” and “servants”, for example) or with variant spellings (such as “taylor” and “tailor”). If we wanted to use occupation or social status labels to verify name matches across sets of records, this suggests that we would need to use sophisticated forms of keyword searching.
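For readers who would rather script this than search by keyword, the variant-label problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the spelling map and the plural rule are hypothetical and deliberately crude, and a real mapping would have to be built by inspecting the actual labels in the data file.

```python
from collections import Counter

# Hypothetical variant-spelling map; a real one would be compiled from the data.
VARIANT_SPELLINGS = {"taylor": "tailor"}


def normalise_occupation(label: str) -> str:
    """Lower-case, tidy whitespace, strip a simple plural, map known variants."""
    cleaned = " ".join(label.lower().split())
    # Naive plural rule ("servants" -> "servant"). It would mangle labels
    # that legitimately end in a single "s", so treat it as a placeholder.
    if cleaned.endswith("s") and not cleaned.endswith("ss"):
        cleaned = cleaned[:-1]
    return VARIANT_SPELLINGS.get(cleaned, cleaned)


def count_occupations(labels):
    """Count labels after normalisation so that variants are grouped together."""
    return Counter(normalise_occupation(label) for label in labels)


# Made-up sample labels, for illustration only.
sample = ["Servant", "servants", "taylor", "Tailor", "labourer"]
counts = count_occupations(sample)
print(counts)
```

Grouping labels in this way before comparing records would mean that “taylor” in one source and “tailor” in another are treated as the same occupation.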

Bubble chart showing categories of defendant occupation

Bubble chart of defendant occupations by gender

But visualisations have been especially useful in highlighting some of the errors in the recording of occupations within the Old Bailey Proceedings data. One of the things that I wanted to find out was how occupation labels varied according to the gender of the defendants tried at the Old Bailey. In order to do this I used Tableau to create a bubble chart of the most common forms of recorded occupations/social status for male and female defendants in the years when we have significant amounts of information on this. One of the things that really struck me in this bubble chart was the number of men whose occupation label is recorded in our Proceedings dataset as “wife”. This clearly seemed to be an error in the data, but I wanted to know what the source of the problem was, so I went back to the original data file and filtered it for male defendants with the occupation/social status label of “wife”. I then looked at the trial reports in the Old Bailey Proceedings for these cases.
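The same kind of filter can be expressed outside Tableau. The sketch below is a minimal Python version, assuming the defendant file has been loaded as a list of dictionaries; the field names (`trial_id`, `gender`, `occupation`) and the rows themselves are invented for illustration and are not the real column names or data.

```python
# Invented example rows; the field names are assumptions, not the real schema.
defendants = [
    {"trial_id": "example-1", "gender": "male", "occupation": "wife"},
    {"trial_id": "example-1", "gender": "female", "occupation": "wife"},
    {"trial_id": "example-2", "gender": "male", "occupation": "servant"},
    {"trial_id": "example-3", "gender": "male", "occupation": None},
]


def find_suspect_labels(rows, gender, label):
    """Return rows where the given gender carries the given occupation label."""
    return [
        row
        for row in rows
        if row["gender"] == gender
        and row["occupation"] is not None
        and row["occupation"].strip().lower() == label
    ]


suspects = find_suspect_labels(defendants, gender="male", label="wife")
for row in suspects:
    # Each hit is a candidate for checking against the original trial report.
    print(row["trial_id"], row["occupation"])
```

The output is simply a list of trial identifiers to check by hand; the filter flags possible tagging errors but cannot decide by itself whether a label is wrong.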

Trial report in the Old Bailey Proceedings in which the husband of a female defendant has been tagged with the social status of “wife”

It turns out that many of these cases were due to errors in the digitisation process which resulted from the unusual nature of the trial reports themselves. The cases were actually ones (such as the example pictured here) in which a female defendant had been named in the trial report as the wife of her husband, and thus the automated tagging process used to digitise the Proceedings had recorded both the husband and the wife as defendants and assigned them both the role of “wife”. This practice in the Proceedings of naming the female defendant as the wife of her husband largely disappeared in the nineteenth century, and therefore most of these errors in the data file come from the eighteenth century. By identifying these kinds of anomalies, visualisations allow us to find errors in the data. Such errors can then be rectified. This leaves us with a much “cleaner” dataset, thereby increasing the chances of successful record linkage.

Historians of crime (particularly the history of crime in Britain) have been quick to exploit the plethora of digitised criminal justice (and associated) records that are now available online. We all make use of resources such as the Old Bailey Proceedings Online, Eighteenth-Century Collections Online and digitised newspapers. But whilst we have been quick to take advantage of the benefits offered by these digitised records – such as keyword searching to find needles in haystacks – we have been less ready to understand the full effects of the digitisation process for how we study our sources and the information that we extract from them. By using data visualisations we can better understand the implications of digitisation, including the ways in which the actual process of turning a paper record into a digital format might result in errors (relatively rare, it should be said, in the case of the Old Bailey Proceedings Online) in the information we compile.

Seeing things differently: Visualizing patterns of data from the Old Bailey Proceedings

An edition of the Old Bailey Proceedings

The Old Bailey Proceedings are a rich historical resource, almost unimaginably so. They constitute the largest body of texts detailing the lives of non-elite people ever published. Words alone can’t quite do justice to the magnitude of the Proceedings – 197,745 accounts of trials covering 239 years (1674-1913); some 127 million words of text (at an average reading rate of 250 words per minute, this would take eight hours’ solid reading every single day for nearly three years to get through!); details of some 253,382 defendants, including name, gender, age and occupation, as well as details of 223,246 verdicts passed by the juries and 169,243 punishments sentenced by the judges.
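The reading-time estimate can be checked with a few lines of arithmetic. The Python sketch below uses only the figures quoted above (127 million words, 250 words per minute, eight hours a day); the 365-day year is an assumption made here for the conversion.

```python
TOTAL_WORDS = 127_000_000   # approximate word count quoted above
WORDS_PER_MINUTE = 250      # reading rate quoted above
HOURS_PER_DAY = 8           # hours of reading per day quoted above
DAYS_PER_YEAR = 365         # assumption: no days off

minutes = TOTAL_WORDS / WORDS_PER_MINUTE
hours = minutes / 60
days = hours / HOURS_PER_DAY
years = days / DAYS_PER_YEAR

print(f"{minutes:,.0f} minutes, {hours:,.0f} hours, {days:,.0f} days, {years:.1f} years")
```

On these assumptions the total comes to a little under three years, in line with the estimate in the text.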

The Proceedings clearly contain a huge amount of information, but they don’t record everything – like any historical source, they are selective in what they document. The amount of information that was recorded in the Proceedings on crimes, verdicts, punishments, defendants and so on also varied over time. And whilst the digitization of the Proceedings by The Old Bailey Online has revolutionised the way in which we search and use this rich historical resource, this also has its limits. The marking-up of the text of the Proceedings (assigning tags to particular pieces of information in the text – such as name or crime – so that this information can be systematically searched) makes it possible to undertake sophisticated statistical analysis. Crimes, verdicts, punishments, defendant age and defendant gender can all be counted at the click of a mouse. Nevertheless, marking-up inevitably involves choices (about what information to tag and the level of detail that is tagged), and those choices limit the ways in which the Proceedings can be studied using computers.

Statistical searches of the Proceedings can be carried out through The Old Bailey Online

The question that we might ask, then, is what are the limitations of the Proceedings as a source of data on such things as punishments, defendant age and gender? Taking the Proceedings in their entirety, what are the limits in terms of the information that was recorded in the original trial reports? How frequently, for example, was the age of the defendant recorded? And what are the limits in terms of what we can actually search for systematically using digital technologies? Can we, for instance, systematically determine the lengths of imprisonment which offenders were sentenced to?

These are crucial questions for us because the Digital Panopticon will rely so heavily on the Proceedings as a source: in our effort to trace the life histories of offenders who were sentenced to transportation or imprisonment at the Old Bailey between 1787 and 1875, the Proceedings will obviously be a vital source of information. After identifying those who were sentenced to transportation or imprisonment recorded in the Proceedings we will then try to trace such individuals both before and after their conviction by linking the Proceedings with other sets of records.

In trying to better understand the limitations of the Proceedings as a source of data for the Digital Panopticon project, I have recently been making use of data visualization (‘dataviz’) – using computers to create visual representations of numbers. This includes the traditional graphs and pie charts that we are all familiar with, and which I will be talking about here. But it also includes more complex forms of visualization which I will be looking at in future posts (watch this space!).

Since the Proceedings contain such a vast amount of information, manual counting and tables are inadequate for making sense of the data. Turning the raw numbers into a visual form makes it much easier to see overall patterns in the data. Here I give just a brief example of how dataviz has helped me to see the Proceedings differently, to appreciate the limits of this immense historical resource, and to think about how information from the Proceedings can be used most effectively in the Digital Panopticon project.

A data visualization of the length of trial reports in the Proceedings over time, created by William J. Turkel as part of the Datamining with Criminal Intent project (created using Mathematica 8)

One of the key things we want to know on the Digital Panopticon is how useful age data might be in helping us to link offenders recorded in the Proceedings with individuals documented in other sets of records (such as the convict transportation registers or census records). In the first instance, links will be made through name searches of the different types of records. But how can we be sure that the John Smith recorded in the Proceedings is the same individual as the John Smith recorded in the prison parole registers, for example? Age data might help us here. If John Smith is recorded as being 24 years old in the Proceedings at the time of his sentence to two years’ imprisonment at the Old Bailey, and the John Smith recorded in the parole registers is stated to be 26 years old, then we can be more confident that this is indeed the same person. By the same token, if the John Smith recorded in the parole registers is said to be 60 years old, this would suggest not.

Ages could then be extremely useful, but it depends on how extensively, and how accurately, age data is recorded in the Proceedings (and our other sets of records). By visualizing the results of quantitative searches of the Proceedings we can get a clear sense of this, far more so than through the use of text-heavy tables which can be hard to “read” for patterns. A statistical search using The Old Bailey Online reveals that 171,168 defendants are recorded in the Proceedings in the years 1755-1870. Age is recorded for 101,364 (59.2%) of them. So for the entire period of our study, we have age data for just over half of all the defendants at the Old Bailey.

Further digging into the data and visualisation of the findings reveals some of the deeper patterns in the age data. In the first instance, the recording of ages only began in the year 1790 for defendants found guilty, and from the 1860s for those found not guilty, as shown in the graph below. In the 1790s, we have age data for 65% of guilty defendants, increasing to 90% and above thereafter. By contrast, age data for the not guilty is missing until at least the 1850s, and does not appear in earnest until the 1860s.
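A breakdown of this kind (the share of defendants with a recorded age, by decade and verdict) can also be computed directly from a table of defendants. The Python sketch below is one possible approach, not the method used for the graph; the field names (`year`, `verdict`, `age`) and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict


def age_coverage(rows):
    """Return {(decade, verdict): share of defendants with a recorded age}."""
    totals = defaultdict(int)
    with_age = defaultdict(int)
    for row in rows:
        decade = row["year"] // 10 * 10
        key = (decade, row["verdict"])
        totals[key] += 1
        if row["age"] is not None:
            with_age[key] += 1
    return {key: with_age[key] / totals[key] for key in totals}


# Invented rows, for illustration only.
rows = [
    {"year": 1795, "verdict": "guilty", "age": 24},
    {"year": 1798, "verdict": "guilty", "age": None},
    {"year": 1795, "verdict": "not guilty", "age": None},
    {"year": 1865, "verdict": "not guilty", "age": 31},
]

coverage = age_coverage(rows)
for (decade, verdict), share in sorted(coverage.items()):
    print(f"{decade}s {verdict}: {share:.0%}")
```

Plotting these shares over time, one line per verdict, would give a chart of the same general shape as the one described here.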

Visualization demonstrating the extent of age recording over time and by verdict

This gives a sense of how extensively ages are recorded in the Proceedings over time, and according to which categories of offenders. By visualizing the patterns of recorded ages we can also get a feel for how ages were actually recorded. The graph below, for instance, suggests that there was a tendency to revise the defendant’s recorded age up or down slightly to match a round figure. The numbers of defendants whose ages are recorded as 30, 40, 50 and (to a lesser extent) 60 are all significantly above the number we might expect according to the moving average (in other words, when the yellow bar goes above the green line in the graph). By contrast, ages just either side of these figures (such as 29, 31, 39, 41 and 51) are systematically below the average (when the yellow bar is below the green line). It may well also have been the tendency for those in their early twenties to have their recorded ages revised down to 18 or 19, since these two ages are also well above the expected number. In short, many more defendants were recorded as being 30 rather than 31, or 40 rather than 41, and the scale of the difference suggests that this resulted from a deliberate policy of revising the defendant’s age up or down to match the nearest round figure.
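The comparison between each age's count and a moving average can be sketched in Python as follows. This is an illustration rather than a reconstruction of the graph: the window size, the 25% threshold and the counts are all assumptions chosen for the example, and the text does not say how the moving average in the graph was calculated.

```python
def centred_moving_average(counts, age, window=5):
    """Mean count over the ages present in a window centred on `age`."""
    half = window // 2
    values = [counts[a] for a in range(age - half, age + half + 1) if a in counts]
    return sum(values) / len(values)


def heaped_ages(counts, window=5, ratio=1.25):
    """Ages whose count exceeds the local moving average by the given ratio."""
    return [
        age
        for age in sorted(counts)
        if counts[age] > ratio * centred_moving_average(counts, age, window)
    ]


# Invented counts of defendants by recorded age, for illustration only.
counts = {27: 100, 28: 98, 29: 80, 30: 150, 31: 78, 32: 92, 33: 90}

print(heaped_ages(counts))
```

With these invented counts only age 30 stands out, while 29 and 31 sit below their neighbours, which is the “bunching” pattern described above.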

Visualization demonstrating the “bunching” of recorded ages at 30, 40, 50 and 60

Together this suggests that age data in the Proceedings will be of much use to us in the Digital Panopticon, particularly for the defendants found guilty and subsequently sentenced to transportation or imprisonment. In this instance we have extensive amounts of age data from 1790 onwards. In the case of our not guilty control group, however, we have no age data available in the Proceedings to work with before the 1860s. In this instance we will be reliant on other categories of information to link the not guilty defendants across datasets. And in light of the seeming tendency for recorded ages to be rounded up or down, this suggests that when we use age data to link individuals across datasets it would be more effective to work within age ranges rather than trying to compare specific numbers.
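Matching on age ranges rather than exact figures could be sketched as below. This is a hypothetical Python helper, not the project's linkage method: it compares the birth years implied by two records and accepts a match if they fall within a tolerance. The two-year tolerance and the years used in the example are assumptions for illustration.

```python
def ages_compatible(age_a, year_a, age_b, year_b, tolerance=2):
    """True if two recorded ages could plausibly belong to the same person.

    Each record implies a birth year (record year minus recorded age). The
    tolerance allows for rounded ages and for birthdays falling either side
    of the record date.
    """
    implied_birth_a = year_a - age_a
    implied_birth_b = year_b - age_b
    return abs(implied_birth_a - implied_birth_b) <= tolerance


# The John Smith example, with hypothetical years added for illustration:
# aged 24 at trial in 1840, then aged 26 (or 60) in a register of 1842.
print(ages_compatible(24, 1840, 26, 1842))  # plausible match
print(ages_compatible(24, 1840, 60, 1842))  # implausible match
```

A wider tolerance would catch more rounded ages at the cost of more false matches, so the value would need tuning against records where the link is already known.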

From these early explorations it seems clear that visualization will be invaluable in helping us to identify the overall patterns in the data of the Proceedings. The first step in this is identifying some of the limitations in terms of the information recorded in the Proceedings. Traditional forms of visualization are useful to this end. But there are also potential benefits in going beyond this, by using more complex forms of visualization to uncover deeper patterns in the data – patterns that would be difficult to detect through simple graphs or charts. This is what I will be turning to next.