Extracting Keywords from Crowdsourced Collections (2024-25)

Extracting Keywords from Crowdsourced Collections was a University of Oxford project funded by the Digital Scholarship at Oxford (DiSc) programme. It explored how natural language processing (NLP) and other digital and AI methods can help identify themes, emotions, and ideas within the Their Finest Hour Online Archive.

Overview

Because the archive was created by thousands of contributors, its records vary widely in tone and detail. Some entries are long and reflective; others are short or factual. This richness makes the collection unique, but it also means that traditional search tools can miss the deeper themes running through it.

The project ran from October 2024 to June 2025 and included a series of online workshops with collaborators across Oxford and beyond. These sessions brought together historians, archivists, digital humanists, and technical specialists to share methods, review outputs, and refine approaches. The team also developed prototype workflows and interactive tools, and evaluated how different tools and models performed when applied to crowdsourced historical text.

Methods

The project tested a range of NLP, digital, and AI-based approaches - including keyword extraction, named entity recognition (NER), emotion analysis, and topic modelling - to see how these might help the data 'speak for itself'. The team also considered how automated methods might reduce some of the unconscious bias that can appear when humans manually assign tags or keywords to collections. In doing so, the team aimed to make it easier for researchers and the public to explore materials in ways that reflect current fields of interest such as the history of emotions, memory studies, and everyday life during the Second World War.
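
To give a sense of what keyword extraction can look like in practice, here is a minimal sketch using TF-IDF weighting and only the Python standard library. It is an illustration, not the project's actual workflow: the sample records, the stop-word list, and the function names (tokenize, tfidf_keywords) are all hypothetical, and the source does not say which tools or models the team used.

```python
import math
import re
from collections import Counter

# Hypothetical sample records; these are NOT taken from the archive.
RECORDS = [
    "My grandfather kept his ration book in a tin box under the bed.",
    "She remembered the air raid siren and the walk to the shelter.",
    "The ration queue was long, but neighbours shared what they had.",
]

# A deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {
    "the", "a", "an", "and", "but", "in", "to", "his", "her", "my",
    "she", "he", "was", "what", "they", "had", "under",
}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def tfidf_keywords(records, top_n=3):
    """Return the top_n highest-scoring TF-IDF terms for each record."""
    docs = [tokenize(r) for r in records]
    n_docs = len(docs)

    # Document frequency: the number of records each term appears in.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    results = []
    for tokens in docs:
        tf = Counter(tokens)
        # Smoothed inverse document frequency: terms shared by many
        # records score lower than terms distinctive to one record.
        scores = {
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        # Sort by descending score; break ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_n])
    return results


if __name__ == "__main__":
    for record, keywords in zip(RECORDS, tfidf_keywords(RECORDS)):
        print(keywords, "<-", record)
```

In this sketch, a word that appears in several records (such as "ration" in the sample data) is weighted below words that are distinctive to a single record. On a real collection with thousands of entries, a fuller stop-word list and some handling of spelling variants would likely be needed before the rankings become meaningful.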

Project Team

The project was led by Prof Stuart Lee (Principal Investigator), with Catherine Conisbee and Dr Matthew Kidd as Research Associates.
