In a recent post I talked about some of the ways in which data visualisations have helped me to see patterns in the information recorded in the Old Bailey Proceedings on things such as crimes, verdicts, punishments and the ages of defendants, patterns that might otherwise have been missed if using traditional methods of representing data such as tables. Here I just want to give a brief update on my analysis of the Proceedings, particularly the recording of defendant occupations and social status in the Proceedings in the eighteenth and nineteenth centuries. Again, visualisations have been extremely useful, especially in identifying errors in the data.
As with the recording of defendant ages, it might well be the case that information on the occupation/social status of those tried at the Old Bailey in the eighteenth and nineteenth centuries could be useful to us on the Digital Panopticon project in tracing offenders across different sets of records. Just as an age or a birth date might allow us to establish whether the “John Smith” tried at the Old Bailey and the “John Smith” transported to Australia was indeed the same person, likewise information on occupation or social status can help us to prove/disprove such name matches across records. But as with ages it depends on how extensively, and in what manner, such information on occupation/social status is recorded in our sources. And to this end, as with information on defendant age, the techniques of data visualisation can be useful.
However, whereas with defendant ages I was able to use the “statistics search” function of the Old Bailey Proceedings Online to generate numbers for analysis, this wasn’t possible in the case of defendant occupation/social status. In the process of digitising the original trial reports, defendant occupation was indeed tagged as a distinct category of information, and thus it can be searched for systematically in the “custom search” page of the Old Bailey Proceedings Online. But this can’t be used to quantitatively analyse the recording of defendant occupations in the Proceedings. In order to do this I needed to look at the website’s underlying data file of defendant information.
This is a large file which includes numerous fields of tagged information relating to all the defendants tried at the Old Bailey and reported in the Proceedings. Since much of this information is in the form of text rather than numbers, software such as Excel isn’t very useful in analysing the data. Instead I turned to Tableau Public, a free, web-based tool that is powerful but still easy to use. There are numerous other data visualisation tools available which are ideal for novices. All need to be used with caution, but used carefully they can be invaluable. (I’m going to talk in more detail about the actual process of using tools such as Tableau to undertake crime history in my next post, so watch this space.)
By running our file on Old Bailey defendant information through Tableau I’ve been able to create some fairly simple but nonetheless useful visualisations. For the data on defendant occupation and social status this has revealed two things in particular.
First of all, it has highlighted how little information we actually have on the occupational and social status of Old Bailey defendants from the seventeenth to the twentieth centuries. Across the entire publication history of the Proceedings between 1674 and 1913, occupation or social status is recorded for only 11% of all the defendants put on trial. In the years 1755 to 1834, occupation/social status is recorded for 15% of defendants, but between 1834 and 1906 virtually no defendants’ occupations were recorded. On the whole, therefore, we have occupation information for only a small proportion of defendants, and virtually none for the later part (the years after 1834) of our specific period c. 1787-1875.
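For readers who would rather script this kind of count than build it in Tableau, the following is a minimal sketch of how coverage figures like those above might be calculated. It is not the method used for this post: the field names (`year`, `occupation`) and the handful of toy rows are assumptions made purely for illustration, and the real data file will be laid out differently.

```python
from collections import defaultdict

# Toy rows standing in for the defendant data file. The field names and
# values are illustrative assumptions, not real Old Bailey records.
rows = [
    {"year": 1720, "occupation": ""},
    {"year": 1760, "occupation": "servant"},
    {"year": 1790, "occupation": ""},
    {"year": 1800, "occupation": "taylor"},
    {"year": 1850, "occupation": ""},
    {"year": 1880, "occupation": ""},
]


def coverage_by_period(rows, periods):
    """Return {period label: share of defendants with an occupation recorded}."""
    totals = defaultdict(int)
    recorded = defaultdict(int)
    for row in rows:
        for label, (start, end) in periods.items():
            if start <= row["year"] <= end:
                totals[label] += 1
                # Treat an empty or whitespace-only field as "not recorded".
                if row["occupation"].strip():
                    recorded[label] += 1
    # Skip any period that contains no defendants, to avoid dividing by zero.
    return {
        label: recorded[label] / totals[label]
        for label in periods
        if totals[label]
    }


periods = {
    "1674-1913": (1674, 1913),
    "1755-1834": (1755, 1834),
    "1835-1906": (1835, 1906),
}
shares = coverage_by_period(rows, periods)
for label, share in shares.items():
    print(f"{label}: {share:.0%} of defendants have an occupation recorded")
```

The periods are passed in as a dictionary so that the same function can be re-run for any date range of interest, such as the project's own period.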
The sheer variety of occupations that are recorded in the Proceedings was also made clear by visualising the data. The bubble chart below, for example, gives an indication of this, and of the relative frequency with which different categories are recorded. One of the problems is that the same occupations were recorded in the Proceedings in slightly different ways (“servant” and “servants”, for example) or with variant spellings (such as “taylor” and “tailor”). If we wanted to utilise occupation or social status labels to verify name matches across sets of records, this suggests that we would need to use sophisticated forms of keyword searching.
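To give a rough sense of what handling these variants might involve, here is a small sketch that folds near-identical labels into a single form using fuzzy string matching from Python's standard library. The list of canonical labels and the similarity cutoff of 0.8 are assumptions chosen for the example. A real linkage exercise would need a much fuller vocabulary and careful checking, since fuzzy matching can also merge occupations that are genuinely different.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical list of "canonical" occupation labels (an assumption for
# this sketch, not a vocabulary taken from the Proceedings).
CANONICAL = ["servant", "tailor", "labourer"]


def normalise(label, canonical=CANONICAL, cutoff=0.8):
    """Map a raw occupation label onto its closest canonical form.

    Labels with no sufficiently close match are returned cleaned
    (lower-cased, trimmed) but otherwise unchanged.
    """
    cleaned = label.strip().lower()
    matches = get_close_matches(cleaned, canonical, n=1, cutoff=cutoff)
    return matches[0] if matches else cleaned


raw_labels = ["servant", "Servants", "taylor", "tailor", "wife"]
counts = Counter(normalise(label) for label in raw_labels)
print(counts)
```

Here “Servants” and “taylor” are folded into “servant” and “tailor”, while “wife” has no close match in the list and is left as it is.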
But visualisations have been especially useful in highlighting some of the errors in the recording of occupations within the Old Bailey Proceedings data. One of the things that I wanted to find out was how occupation labels varied according to the gender of the defendants tried at the Old Bailey. In order to do this I used Tableau to create the following bubble chart of the most common forms of recorded occupations/social status for male and female defendants in the years when we have significant amounts of information on this. One of the things that really struck me in this bubble chart was the number of men whose occupation label is recorded in our Proceedings dataset as “wife”. This clearly seemed to be an error in the data, but I wanted to know what the source of the problem was, so I went back to the original data file and filtered it for male defendants with the occupation/social status label of “wife”. I then looked at the trial reports in the Old Bailey Proceedings for these cases.
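The filtering step described here is easy to reproduce in a few lines of code. The sketch below assumes the defendant data is available as a CSV file with columns named `trial_id`, `gender` and `occupation`. Those column names and the trial identifiers are placeholders invented for the example, not the layout of the actual data file or real trial references.

```python
import csv
import io

# Toy CSV standing in for the defendant data file. Column names and
# trial identifiers are placeholders, not real Old Bailey references.
toy_csv = """trial_id,gender,occupation
t0001,male,wife
t0001,female,wife
t0002,male,servant
t0003,female,servant
t0004,male,Wife
"""


def suspicious_rows(lines, gender="male", occupation="wife"):
    """Return rows whose gender and occupation label look inconsistent."""
    reader = csv.DictReader(lines)
    return [
        row
        for row in reader
        # Compare case-insensitively so that "Wife" and "wife" both match.
        if row["gender"].strip().lower() == gender
        and row["occupation"].strip().lower() == occupation
    ]


flagged = suspicious_rows(io.StringIO(toy_csv))
flagged_ids = [row["trial_id"] for row in flagged]
print(flagged_ids)
```

The resulting list of trial identifiers is then the set of reports to read by hand, which is the step that actually explains where the error comes from.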
It turns out that many of these cases were due to errors in the digitisation process which resulted from the unusual nature of the trial reports themselves. The cases were actually ones (such as this example below) in which a female defendant had been named in the trial report as the wife of her husband, and thus the automated tagging process used to digitise the Proceedings had recorded both the husband and the wife as defendants and assigned them both the label of “wife”. This practice in the Proceedings of naming the female defendant as the wife of her husband largely disappeared in the nineteenth century, and therefore most of these errors in the data file come from the eighteenth century. By identifying these kinds of anomalies, visualisations allow us to find errors in the data. Such errors can then be rectified, leaving us with a much “cleaner” dataset and thereby increasing the chances of successful record linkage.
Historians of crime (particularly the history of crime in Britain) have been quick to exploit the plethora of digitised criminal justice (and associated) records that are now available online. We all make use of resources such as the Old Bailey Proceedings Online, Eighteenth-Century Collections Online and digitised newspapers. But whilst we have been quick to take advantage of the benefits offered by these digitised records, such as keyword searching to find needles in haystacks, we have been less ready to understand the full effects of the digitisation process for how we study our sources and the information that we extract from them. By using data visualisations we can better understand the implications of digitisation, including the ways in which the actual process of turning a paper record into a digital format might result in errors (relatively rare, it should be said, in the case of the Old Bailey Proceedings Online) in the information we compile.