Record Linkage Workshop Report, Part 1

In the first half of this workshop on record linkage we had three fantastic papers from guest speakers who were invited to talk about their own experiences of conducting record linkage in historical research. Each speaker offered a different perspective on the subject, allowing us to think about a wide range of issues relating to record linkage and generating ideas which will be extremely useful to us on the Digital Panopticon.

Jeremy Boulton — ‘Place, Mobility and Class Barriers: The Perils and Possibilities of Nominal Linkage in the Metropolis’

Jeremy Boulton of the University of Newcastle got the event off to a fantastic start with a fascinating and thought-provoking window into his self-confessed ‘gruesome fascination’ with nominal record linkage. Reflecting on his experiences as part of the Pauper Lives in Georgian London and Manchester project, Jeremy spoke about the broader methodological (rather than strictly technical) issues associated with record linkage, highlighting both the benefits and the inherent dangers of linking individuals across multiple historical records.

On the one hand, when carried out successfully, nominal record linkage can be an effective means by which to check the accuracy of our historical records. Whilst perfect accuracy is beyond attainment in historical record linkage (as E. A. Wrigley said many years ago, and as still holds true today), the creation and collation of successful links nevertheless allows us to identify the (otherwise imperceptible) lies and concealments of the people being recorded.

On the other hand, of course, the difficulties associated with nominal record linkage make the successful creation of links (and thus the exposure of the ‘fiction in the archives’) a problematic task. Transcription errors (by both the original scribes and present-day transcribers) will defeat even the most sophisticated linkage methodologies, and confirming information can’t always be obtained.

In the latter part of his paper, Jeremy presented an absorbing case-study of the nominal record linkage of Godfrey Sykes, widely documented in sources such as pollbooks, newspapers, the London electoral database and charity subscriber registers — an apparently respectable Georgian businessman who, it turns out from further digging into the historical sources, fathered four bastards with a woman named Ann Farmer.

Gill Newton — ‘Urban Record Linkage before 1754’

Next, Gill Newton of the University of Cambridge shifted the focus onto the nuts and bolts of record linkage — a paper rich in technical detail which provided the audience with a valuable toolkit for undertaking record linkage, even for the particularly challenging context of creating re-constituted families from eighteenth-century London.

Starting with an informative background on the contents of an eighteenth-century parish register and what is meant by a re-constituted family, Gill then noted some of the key challenges which face any researcher looking to undertake urban record linkage. These include a high level of population turnover; rapid growth from migration; blurred parish and administrative boundaries; and a high risk of mistaken identities. There are, however, advantages to linking urban records, such as more detailed registers; a more diverse name base; the ability to sample viably; and the further information generated by civic administration.

Gill then treated us to a fascinating discussion of name distribution in eighteenth-century parish registers. Forenames were heavily bunched around the most common names (John, Mary, Elizabeth etc.). By contrast, whilst some surnames constituted a large proportion of the whole (such as Smith), the distribution of surnames had a much longer ‘tail’ compared to forenames. Moreover, there were stark differences in the patterns of name distribution between rural England and London.
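For readers who want to try this on their own transcriptions, one rough way to put a number on how 'bunched' a set of names is would be to measure the share of all entries taken up by the few most common names. The sketch below is only an illustration: the function name `top_share` and the tiny name lists are invented for the example and are not taken from Gill's data or method.

```python
from collections import Counter

def top_share(names, k):
    """Return the fraction of all entries accounted for by the k most common names."""
    counts = Counter(names)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(k)) / total

# Invented toy lists (not real register data): forenames cluster on a few
# names, while surnames spread out into a long tail of rare names.
toy_forenames = ["John"] * 5 + ["Mary"] * 4 + ["Ann"]
toy_surnames = ["Smith"] * 2 + ["Ward", "Sykes", "Farmer", "Newton",
                                "Boulton", "Madden", "Shore", "Cox"]

forename_share = top_share(toy_forenames, 2)
surname_share = top_share(toy_surnames, 2)
print(f"top 2 forenames: {forename_share:.0%}, top 2 surnames: {surname_share:.0%}")
```

A high share for a small `k` means many people carry the same name, which is exactly the situation in which a name alone is weak evidence for a link.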

Finally, Gill highlighted some of the most important tools for undertaking nominal record linkage, including phonetic matching and surname dictionary examples, as well as the principles of algorithmic record linkage. She offered some extremely useful tips on how to maximize the quality of the linkages created, emphasising that successful matching requires careful attention and a rigorous methodology — in other words, the cautionary mantra with record linkage should be: ‘garbage in, garbage out’.
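To give a flavour of what phonetic matching involves, here is a minimal sketch of Soundex, one well-known phonetic coding scheme. This is an assumption for illustration only: the report does not say which phonetic scheme or surname dictionary Gill used, and real linkage work on early modern spellings would need far more care than this.

```python
def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]                 # the first letter is kept as-is
    prev = codes.get(letters[0], "")    # but its code still blocks a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate repeated codes
        code = codes.get(ch, "")        # vowels (and Y) have no code
        if code and code != prev:
            result += code
        prev = code                     # a vowel resets prev, allowing a repeat
    return (result + "000")[:4]         # pad or truncate to letter + 3 digits

# Variant spellings of the same surname collapse to one code.
for spelling in ("Smith", "Smyth", "Robert", "Rupert"):
    print(spelling, soundex(spelling))
```

Two spellings with the same code are only candidates for a link; as the paper stressed, the match still has to be confirmed against other information in the records.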

Ciara Breathnach — ‘Irish Records Linkage 1864–1913: Big, Macro and Micro Data’

In the final paper of this first session, Ciara Breathnach from the University of Limerick talked about the approach and some of the findings from the Irish Record Linkage 1864–1913 project, on which she is the principal investigator. Funded by the Irish Research Council, and developed in partnership with the Digital Repository of Ireland, University of Limerick and Insight at NUI Galway, the project aims to provide a comprehensive map of infant and maternal mortality for Dublin from 1864 to 1913. The project will reconstruct family units and create longitudinal histories by linking records of Birth, Marriage and Death, which together include millions of name instances.

Starting with an overview of the Irish Record Linkage project, Ciara then discussed some of the forces which served to shape the recording of census and civil data in nineteenth-century Ireland, before moving on to discuss some of the differing definitions of ‘Big Data’, a term about which there is seemingly little agreement.

Ciara also provided useful information on the ontologies utilised by the Irish Record Linkage project, describing the ways in which the data has been analysed and linked, noting the necessity (in the case of such extensive numbers of available records) to sample in order to make such a project feasible.

Finally, through a case-study of Achill in Dublin, Ciara presented a glimpse of the significant findings already generated by the Irish Record Linkage project. By mapping infant deaths in the parish in the 1890s, Ciara revealed the nature of the relationship between child mortality and the geography of local health care (in the form of doctors and nurses) in late nineteenth-century Ireland. As Ciara concluded, it is through these kinds of detailed micro-level studies, produced by record linkage at the macro level, that we can gain a better understanding of the past.

We are very grateful to all three speakers for providing us with so much food for thought, and so many ideas to follow up!

Event: Record Linkage Workshop, Sheffield, 4 November 2014

We’re delighted to be able to announce our second project workshop.

It’s another afternoon workshop, this time in Sheffield, and the subject is Record Linkage (part of the Epistemologies research theme). We’re particularly interested in the challenges and rewards of applying automated (and semi-automated) nominal record linkage to very large-scale historical datasets, with all their variability, fuzziness and uncertainties; our work on the project starts from these questions:

How can we improve current record-linkage processes to maximise both the number of individuals linked across different datasets and the amount of information obtained about each individual? What is the minimum amount of contextual information needed in order to conduct successful large-scale record linkage of data pertaining to specific individuals?

In addition to presentations about our work from project team members, we have three guest speakers who will bring extensive experience of historical record linkage projects: Jeremy Boulton (Newcastle), Gill Newton (Cambridge) and Ciara Breathnach (Limerick).

We think this will add up to a stimulating programme and discussion that will be of interest to many historians who need to link information about large numbers of individuals, using data that is continually growing in diversity and scale.

Download: Workshop Programme/Flyer (pdf).

Workshop Information

When: 2-5.30pm, Tuesday 4 November 2014
Where: Humanities Research Institute, Gell Street, Sheffield

Attendance is free but numbers may be limited so you will need to register in advance: email Sharon Howard (sharon.howard@sheffield.ac.uk).

Event: DP @ British Crime Historians Symposium, Liverpool, September 2014

We’re delighted that our quest to take over the entire known universe of the history of crime continues with a panel session at this year’s British Crime Historians Symposium:

The Digital Panopticon: New perspectives on criminal justice records and the practice of transportation

  • Robert Shoemaker, ‘Identifying the criminal: The state and record keeping in the eighteenth and nineteenth centuries’
  • Richard Ward, ‘Seeing things differently: Visualising data on crime and punishment’
  • Lucy Williams, ‘Bound for Botany Bay? Assessing the differences between Old Bailey penal sentences and their implementation’

Event: Digital Humanities Congress 2014, Sheffield

Date: 4-6 September 2014
Location: The Edge, University of Sheffield, UK
Website: http://www.shef.ac.uk/hri/dhc/dhc2014

The Digital Humanities Congress is a conference held in Sheffield every two years. Its purpose is to promote the sharing of knowledge, ideas and techniques within the digital humanities.

Members of the Digital Panopticon project team will be discussing work on the project so far and related themes at this conference, in particular at two AHRC Digital Transformations roundtable sessions (visualising data; scaling up the arts and humanities). See the programme for more details.

Registration is now open and there are early bird discounts until 16 July. There are also discounted rates for postgraduate students.

Visualising Life-Grids and Narrating the Lives of Convicts

One of the great opportunities presented by the Digital Panopticon project (and one of the most exciting in my opinion) is in uncovering more about the processes of crime and punishment by placing thousands of offenders, and their offences, back within the context of their own lives.

Tracing offenders through the records has been a preoccupation of several groups of historians and criminologists (for example Barry Godfrey, Heather Shore, Pam Cox, David Cox, Helen Johnston, Zoe Alker, Joanne Turner, and Stephen Farrall) in the last decade. On account of the laborious nature of record linkage, those studies which have focussed on tracing groups of offenders through civil as well as criminal datasets have been able to examine only a few hundred offenders at a time. Those pioneering this methodology have taken the collected information and sorted it into ‘lifegrids’ which chart life events and changes for each individual. Lifegrids might typically include details of birth, marriage and death, family evolution, employment and residential addresses, and offending and punishment history. Of course, the depth and breadth of documents and information available on different groups of offenders, or on individual offenders, dictates how much material can be recorded in each lifegrid.

Other than life-grid format, there are a number of ways that this information can be presented and communicated. Even the simplest visualisations are able to show the role that offending had in any one person’s life. This might be through indicating what proportion of an individual’s life was spent in custody, or how many offences were recorded against them at what stage of their life. It is possible to chart how someone’s offending accelerated and decelerated. From an institutional perspective it is possible to indicate how an individual’s weight and health changed over time, or how their behaviour and privileges impacted upon their experience of punishment. The myriad of ways in which this fascinating and complex data can be presented has some exciting potential for how others see, interrogate, and engage with this fantastically rich data.

To begin to explore these possibilities, we have been working with an example offender: Patrick Madden (one of a number of offenders included in Johnston, Godfrey and Cox’s ESRC funded research on ‘The costs of imprisonment’).

[Image: P Madden]

Born and raised in Sheffield, Patrick began offending around the age of sixteen. Although often motivated by property, Patrick’s offences were primarily violent in nature. Madden had 15 offences recorded against him over an almost thirty-year period. Each of these was committed either in Sheffield or in other nearby northern towns such as Wakefield and Doncaster. It was in these locations that he was incarcerated, except on one occasion when he served seven years of penal servitude in London and the south of England. It does not appear that Patrick ever married or had children, nor that he managed to establish a life for himself that did not involve repeat offending for long before dying at the age of 52.

Patrick Madden’s lifegrid, of course, contains much more information than this brief overview might suggest. Patrick’s civil and penal records allow us to know about many elements of his life, right down to his familial relationships and sexual preferences. However, even if we take the most ‘bare bones’ approach to Patrick’s life narrative, it is possible to start creating some interesting visualisations based on his experiences and offending history.

[Charts (DataHero): Patrick Madden’s years of imprisonment in life course; type of offending over life course; weight over period of imprisonment; penal class over time of imprisonment]

Yet the size and scale of the research being undertaken by the Digital Panopticon means that we are faced not just with presenting Patrick Madden’s life, but with presenting the lives of all of the ‘Patricks’ that went through the Old Bailey between the late 18th and early 20th centuries. This poses two distinct challenges in presenting the mass of information traditionally held in lifegrids.

The first is that the range of records being linked together for each offender is unprecedented. Some records are well known to our researchers and relatively straightforward to visualise, such as criminal registers that allow us to examine date, place and type of offence. Others, such as the changing picture of family life that might evolve from three successive census entries, or the seemingly random personal or professional information that can be carried in a newspaper report, are far more difficult to quantify and visualise. This first problem will become clearer and hopefully less significant as more records are collected and linked. It should be fairly straightforward to identify the information which can be presented easily, and to adapt that which cannot.

The second challenge is that of potentially presenting tens of thousands of individual life and offending histories to other researchers and the public. What we need to work on is finding a way of presenting a range of different information about our offenders, both individually and in aggregate, so that it is possible for users to access information about an individual they are interested in, but also to see how such an individual compares and contrasts with others in the study – something which enables researchers to identify how typical an individual’s experience was.

Barry Godfrey offered some initial ideas of how we might best achieve this when we met in Oxford. By creating ‘strand’ visualisations which present a mass of offenders by a few ‘key values’ – for example the year of their first recorded offence, nature of offence, or length of offending career – and then allowing users to further restrict what strands are shown to them by other values – for example sex and location – it would be possible to access information about a single individual, whilst getting a sense of how they match up to their contemporaries.

[Image: Barry Godfrey’s ‘strand’ visualisation]

We hope that this will prove an excellent starting point as we work to develop future visualisations and methods of presentation which will allow the Digital Panopticon team, fellow researchers, and members of the public to explore, understand, and get the most from the fantastic wealth of data at our fingertips.

Visualising Data Workshop Report: part 2

The second half of the workshop was devoted to work in progress and plans for the Digital Panopticon – I’ll say less about these than those in part 1 because longer versions should be appearing (or have already appeared) here on the blog!

Barry Godfrey briefly introduced the project and the challenges of visualisation of our data.

  • We’re looking at systematic changes in punishment over a long period of time (late 18th to early 20th century); but we’re also looking at individuals over their lifetimes and at many thousands of individuals.
  • It’s not just about temporality: we’re also deeply concerned with spatiality – not simply the long distance movement of transportation but movement within Britain.
  • Another theme of the project is ethical – the responsibilities of revealing so much information about people: how much does this extend to visualisation too?
  • Finally, there are many potential audiences for DP data visualisation – in addition to researchers and academics, students, teachers, genealogists and other non-traditional users of criminal data. How to cater for so many different people and their needs?

Jamie McLaughlin demonstrated some of our early explorations in record linkage and data visualisation, including a number of Sankey diagrams to show connections between two datasets (Old Bailey Proceedings and British Convict Transportation Register). In particular, he’s been comparing the outcomes for defendants sentenced to transportation and those who were sentenced to death which was subsequently commuted to transportation. Another topic of interest is the people sentenced to be transported who don’t subsequently turn up in the transportation records: what happened to them? Can we find them again elsewhere?
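As a rough illustration of what sits behind a Sankey diagram of this kind: each band in the diagram is just a count of records moving from a category in one dataset to a category in another. The sketch below counts those flows. The rows and field names (`sentence`, `outcome`) are invented for the example and are not project data or Jamie's actual code.

```python
from collections import Counter

# Invented example rows, NOT project data; the field names are assumptions.
linked_defendants = [
    {"sentence": "transportation", "outcome": "found in transportation register"},
    {"sentence": "transportation", "outcome": "found in transportation register"},
    {"sentence": "transportation", "outcome": "not found"},
    {"sentence": "death, commuted to transportation", "outcome": "found in transportation register"},
    {"sentence": "death, commuted to transportation", "outcome": "not found"},
]

def sankey_flows(records):
    """Count how many records pass from each sentence to each outcome."""
    return Counter((r["sentence"], r["outcome"]) for r in records)

flows = sankey_flows(linked_defendants)
for (source, target), weight in sorted(flows.items()):
    print(f"{source} -> {target}: {weight}")
```

The resulting (source, target, weight) triples are the kind of input that most Sankey tools expect, and the 'not found' flows are the people whose fate still needs tracing elsewhere.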

Richard Ward focused on visualising (again, extensively using Tableau Public) a single dataset, the Proceedings, covering much of the ground on questions of age in his recent blog post here. (I learned along the way that the proper demographic term for the tendency to round ages is age heaping.) He also introduced the topic of occupations/status labels – which are problematic in the Proceedings for a number of reasons – and hopefully this will be covered in his next blog post. [slides]
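For anyone curious how age heaping can be measured, one standard demographic measure is Whipple's index, which looks at reported ages from 23 to 62 and asks how many end in 0 or 5. A value of 100 means no preference for rounded ages, and 500 means every age ends in 0 or 5. The sketch below is an illustration only: the report does not say that this index was used on the Proceedings, and the age lists are invented.

```python
def whipple_index(ages):
    """Whipple's index of age heaping, computed on reported ages 23 to 62."""
    in_range = [a for a in ages if 23 <= a <= 62]
    if not in_range:
        raise ValueError("no ages between 23 and 62")
    rounded = sum(1 for a in in_range if a % 5 == 0)   # ages ending in 0 or 5
    # One age in five ends in 0 or 5 by chance, hence the factor of 5.
    return 100 * 5 * rounded / len(in_range)

# Invented toy data: one of each age (no heaping) versus only rounded ages.
evenly_spread = list(range(23, 63))
fully_heaped = [25, 30, 30, 40, 40, 40, 50, 60]

print(whipple_index(evenly_spread))
print(whipple_index(fully_heaped))
```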

Barry and Lucy Williams rounded off the session by looking at the challenges involved in visualising life grids. Barry’s previous research on 600 prisoners used a wealth of different sources including licences, medical sources, and other prison records, as well as civil data, and tried to build up as complete a picture as possible of each prisoner’s whole life: this was summarised in life grids. We looked at interesting options for visualising the life of a single prisoner – but how to multiply up to thousands of them? [blog post]

The following discussion introduced a number of suggestions and possible ideas and resources to follow up. Certain themes, however, resurfaced throughout the day as key issues:

The importance of seeing data visualisations as part of a process with changing needs and purposes over the course of the project, and for different people. Part of the challenge is that we want to cater not just for the specific research agendas of the project team members but also for a range of other researchers.

The twin challenges of scaling up and the very long period of time we’re covering; but also the sheer variety of different types of source and data that we’re dealing with. The Proceedings are a very different kind of record from the (mostly) highly structured tabular data of Founders and Survivors, and from the English imprisonment records we’ll be working with.

It was all in all a great day! We were bowled over by the wealth of ideas from our three external speakers and the additional input of everyone who attended for the day, not least Andrew Prescott: thanks to everyone who came for making it such an enjoyable and stimulating event. And I’d add a final thank you to Deb Oxley for organising the event and being a splendid host.

Visualising Data Workshop Report: part 1

The first half of the workshop consisted of speakers we invited to introduce the ways in which they have used visualisation in research, and look at how these could be useful to the Digital Panopticon and researchers attending the event. I’ve included as many links to relevant resources as I could find. (See also the Storify of the event.)

Professor Min Chen of the Oxford e-Research Centre got the day off to a great start. He treated us to a dizzying array of examples of different kinds of visualisations, emphasising the importance of who visualisations are being created for. He surveyed the long history of data visualisation and outlined four levels of visualisation:

  1. disseminative (‘this is’) – presentational aids for dissemination
  2. operational (‘what?’) – enable intuitive and speedy observation of captured data
  3. analytical (‘why?’) – investigative, can be used to examine complex relationships
  4. inventive (‘how?’) – aid improving existing models, methods etc

He also got us to think about ‘modes’ of visualisation, the different perspectives/needs of analysts, presenters and viewers. Question asked: ‘what would be a visual language for the Digital Panopticon?’ – taking into account the different kinds of data we’re working with.

These were just some of the examples!

  • Poem Viewer from the Imagery Lenses for Visualizing Text Corpora project (Oxford and Utah collaboration) – designed to support close reading by visualising the sounds of poetry.
  • Temporal Visualization of Boundary-based Geo-information Using Radial Projection – visualising movement of 200 glaciers over 10 years (recorded in satellite images). This was highly challenging: line graphs were too messy, maps not very helpful; a solution was found in radial visualizations.
  • Visualizing facial dynamics – humans are very good at expression recognition, but computers are terrible; project investigating methods to do this
  • Use of glyphs (simple stylised icons) rather than text labels in complex workflow diagrams, and to enable display of multiple measurements simultaneously.
  • Idea of parallel coordinates for visualising multi-dimensional data. (Lots of interest in this!)
  • How to visualise time without animation? – summarising into a single picture can help to see patterns.

Next, William Allen of the Oxford Migration Observatory talked about ‘Doing the Best with Data: critical realism and visualisation’. The Observatory’s goals are to communicate social science research beyond academia; migration is complex and doing this accessibly is challenging, so they make extensive use of visual techniques.

Visualisations are appealing, as they appear to offer comprehensive and independent windows, but actually achieving this means approaching visualisation as an iterative and critical process. Allen uses a critical realism approach as a lens for evaluation and for critically testing given categories. Rather than ‘what works?’ it’s better to ask ‘what about this visualisation works, for whom in which contexts, for what purposes?’

The media monitoring project was set up to monitor and analyse systematically what the press actually say about migration, over a period of time. Analysis of how the press portrays migrant groups uses corpus linguistic methods (43 million words for 2010-12!). Allen showed us a number of visualisations using the tool Tableau Public (which some members of the DP team have also been using).

Allen spoke of the ‘frontiers of visualisation’

  • political: how data/research are used by range of actors, decisions made through research
  • technical: the software and built-in assumptions/settings
  • virtual: interactivity, challenges of opening analysis up to public stakeholders

Questions and problems arising from the Observatory’s work: how do we visualise large datasets and patterns in them? Every decision comes with assumptions about what works. He also emphasised the danger that visualisation software can be a black box – eg, misleading on scale.

Additional resource: The Observatory website has a terrific page of data and resources with ‘ready-made charts and maps on migration in the UK as well as a description of key data sources and their limitations’, and a create your own chart facility. Go and play!

Our third speaker, Arthur Downing (Oxford), gave a presentation on Network Analysis and Visualisation for historians.

A network is a particular set of connections between agents: network analysis is analysis of the patterns of these connections (‘nodes’ and ‘links’). It differs from standard social science methodology (which tends to chop up objects by categories like race and gender and then looks at averages), in that network analysis starts with connections between objects/actors and then looks at their attributes. This is important because there can be different patterns of connections within superficially similar scenarios.

Some fascinating case studies he introduced:

Downing’s own work on 19th-century Friendly Societies – a network analysis of proposers and seconders showed that the top 20% of recruiters were responsible for 80% of members. But using ‘eigenvector centrality’ (which takes into account both the degree of a node and the degree of the nodes connected to it) also showed that some people were important even though they weren’t large recruiters.
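To make the idea of eigenvector centrality a little more concrete, here is a minimal sketch that computes it by repeated averaging over neighbours (power iteration). Everything in it is an assumption for illustration: the node names and the tiny network are invented, and this is not Downing's data or method. Each round adds a node's own score to its neighbours' scores, a common trick that keeps the calculation stable on tree-like networks without changing the final ranking.

```python
def build_graph(edges):
    """Turn a list of undirected (a, b) ties into a dict of neighbour sets."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def eigenvector_centrality(graph, iterations=200):
    """Approximate eigenvector centrality, scaled so the top node scores 1."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        # A node's new score is its own score plus those of its neighbours.
        new = {node: scores[node] + sum(scores[nb] for nb in graph[node])
               for node in graph}
        top = max(new.values())
        scores = {node: value / top for node, value in new.items()}
    return scores

# Invented network: two big recruiters, each with four recruits, plus a
# 'broker' who has only two ties but is tied to both big recruiters.
edges = [("hub1", f"member1_{i}") for i in range(4)]
edges += [("hub2", f"member2_{i}") for i in range(4)]
edges += [("hub1", "broker"), ("hub2", "broker")]

scores = eigenvector_centrality(build_graph(edges))
for node in ("hub1", "broker", "member1_0"):
    print(node, round(scores[node], 3))
```

In this toy network the broker has only two ties, yet scores about twice as high as an ordinary member, because both of the broker's ties are to well-connected people.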

Network analysis for maps can show more complex patterns than standard maps:

  • Spread of Freemasons in the US – on a conventional map this just looks like a ‘frontier’ movement, but when mapped as a network, a  different picture emerges with more complex directions of flows
  • Social networks between Australian lodges – most migration is short distance and internal, though migration from England and Wales is very important

Pitfalls and problems:

  • identifying the boundaries of networks can be difficult
  • sampling is hard to justify as any missing ties can skew interpretation
  • longitudinal analysis is difficult – network analysis by definition is a snapshot in time, but we may want to know how long a tie persists. One answer is to break the network down into phases and look at different periods

Conceptually this is very different to standard statistics: ‘analysis of an endogenous system where endogeneity is what is interesting’, but potentially a great method for social history since it’s all about exploring complexity.

In subsequent discussion, concerns were raised about ideological assumptions going into visualisations and how to communicate them to the user – but with a reminder that this is a problem with traditional charts and tables too, with no simple answer.

We were deeply grateful to all three speakers for providing us with so much food for thought, and so many ideas to follow up!

[Part 2 of the report to follow shortly…]

Event: Visualising Data Workshop, Oxford, April 2014

We are delighted to be able to announce our first project workshop on Visualising Data, part of our Epistemologies research theme. We anticipate that the workshop will be of interest to many people (not just from large projects!) interested in the potential benefits and pitfalls of visualising large historical datasets.

Along the way, we’ll be reflecting on one of our key research questions:

What can visualisation techniques tell us about the overall shape/distinctive patterns in the data, and what does this reveal about the various processes by which the data were created, and their constraints/limitations?

We’re in the process of exploring data visualisation techniques that will enable us to analyse the datasets both individually and collectively, and members of the project team will talk and invite discussion about both the academic and technical challenges this presents. But we also have three excellent external speakers to provide perspectives from a range of fields and projects: Rob Procter (Warwick), Min Chen (Oxford) and William Allen (Oxford).

It’s an afternoon workshop which we hope will enable as many UK-based people as possible to make a one-day trip of it.

Download the Visualising Data Flyer for full programme details.

Workshop Information

When: 2pm-6pm, Monday 14 April 2014
Where: Wharton Room, All Souls College, High St, Oxford, UK.
Twitter: #dpdataviz

How to attend: Email Sharon Howard (sharon.howard@sheffield.ac.uk) to register. Places are very limited, so contact asap!

Event: Representing Penal Histories: Displaying and Narrating the Criminal Past (Nottingham, Jan 2014)

Our Criminal Past AHRC Network – third network event

Date: 31 January 2014, 10am-4.30pm

Venue: Galleries of Justice, Nottingham

Free event but registration is required: more details here.

Prof. Barry Godfrey will be talking about ‘Conceiving the Digital Panopticon’ and other DP team members will almost certainly be lurking. It promises to be an enjoyable and thought-provoking day.