Extracting Keywords

Overview

Because the archive was created by thousands of contributors, its records vary in tone and detail. Some entries are long and reflective; others are short or factual. This richness makes the collection unique, but it also means that traditional search tools can miss the deeper themes running through it.

The project ran from October 2024 to June 2025 and included a series of online workshops with collaborators across Oxford and beyond. These sessions brought together historians, archivists, digital humanists, and technical specialists to share methods, review outputs, and refine approaches. The team is also developing prototype workflows and interactive tools, and evaluating how different tools and models perform when applied to crowdsourced historical text.

Methods

The project tested a range of natural language processing (NLP), digital, and AI-based approaches, including keyword extraction, named entity recognition (NER), emotion analysis, and topic modelling, to see how these might help the data 'speak for itself'. One aim was to make it easier for researchers and the public to explore materials in ways that reflect current fields of interest, such as the history of emotions, memory studies, and everyday life during the Second World War.
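To give a sense of what keyword extraction can look like in practice, here is a minimal sketch using TF-IDF scoring, which favours terms that are frequent in one record but rare across the collection. It is an illustration only and is not the project's actual workflow, which this page does not describe. The sample records, the stopword list, and the function names `tokenize` and `top_keywords` are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical records, invented for illustration; not taken from the archive.
RECORDS = [
    "My grandfather kept his ration book and spoke about rationing queues in Oxford.",
    "A short note: evacuated to Wales in 1940 with my sister.",
    "She remembered the air raid shelter, the sirens, and the fear during each air raid.",
]

# A deliberately tiny stopword list; a real workflow would need a fuller one.
STOPWORDS = {"a", "about", "and", "during", "each", "his", "in",
             "my", "she", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_keywords(records, k=3):
    """Return the k highest-scoring TF-IDF terms for each record."""
    docs = [tokenize(r) for r in records]
    n = len(docs)
    # Document frequency: the number of records that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        if not doc:
            # A record with no usable tokens yields no keywords.
            results.append([])
            continue
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * (math.log(n / df[term]) + 1)
            for term, count in tf.items()
        }
        # Sort by descending score, then alphabetically so ties are stable.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:k])
    return results


if __name__ == "__main__":
    for record, keywords in zip(RECORDS, top_keywords(RECORDS, k=2)):
        print(keywords, "<-", record)
```

Short, factual entries tend to produce many tied scores under this approach, which is one reason crowdsourced text of uneven length and detail can be hard to index with a single method.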

Project Team

The project was led by Prof Stuart Lee (Principal Investigator), with Catherine Conisbee and Dr Matthew Kidd as Research Associates.

Extracting Keywords from Crowdsourced Collections (2024-25)
