Towards efficient and accessible geoparsing of UK local media: A benchmark dataset and LLM-based approach

  • University of Surrey

Research output: Contribution to journal (article, peer-reviewed)

Abstract

Location mentions in local news are crucial for examining issues like spatial inequalities, news deserts, and the impact of media ownership on news diversity. However, while geoparsing—extracting and resolving location mentions—has advanced through statistical and deep learning methods, its use in local media studies remains limited and fragmented due to technical challenges and a lack of practical frameworks. To address these challenges, we identify key considerations for successful geoparsing and review spatially oriented local media studies, finding over-reliance on limited geospatial vocabularies, limited toponym disambiguation, and inadequate validation of methods. These findings underscore the need for adaptable and robust solutions, and recent advancements in fine-tuned Large Language Models (LLMs) for geoparsing offer a promising direction by simplifying technical implementation and excelling at understanding contextual nuances. However, their application to UK local media—marked by fine-grained geographies and colloquial place names—remains underexplored due to the absence of benchmark datasets. This gap hinders researchers’ ability to evaluate and refine geoparsing methods for this domain. To address this, we introduce the Local Media UK Geoparsing (LMUK-Geo) dataset, a hand-annotated corpus of UK local news articles designed to support the development and evaluation of geoparsing pipelines. We also propose an LLM-driven approach for toponym disambiguation that replaces fine-tuning with accessible prompt engineering. Using LMUK-Geo, we benchmark our approach against a fine-tuned method. Both perform well on the novel dataset: the fine-tuned model excels in minimising coordinate-error distances, while the prompt-based method offers a scalable alternative for district-level classification, particularly when relying on predictions agreed upon by multiple models. 
Our contributions establish a foundation for geoparsing local media, advancing methodological frameworks and practical tools to enable systematic and comparative research.
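
The two evaluation ideas named in the abstract, coordinate-error distance and keeping only district predictions that several models agree on, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `min_votes` threshold, and the use of the haversine formula with a mean Earth radius of 6371 km are all assumptions made for illustration.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def agreed_district(predictions, min_votes=2):
    """Return the most common district label if at least `min_votes`
    models predicted it; otherwise return None (no agreement)."""
    if not predictions:
        return None
    label, votes = Counter(predictions).most_common(1)[0]
    return label if votes >= min_votes else None


# Hypothetical example: error distance between a predicted and a gold coordinate
error_km = haversine_km(51.2362, -0.5704, 51.3168, -0.5600)

# Hypothetical example: three models vote on the district of one toponym
district = agreed_district(["Guildford", "Guildford", "Woking"])
```

Under these assumptions, a toponym with no agreed label returns `None` and would simply be dropped from the district-level analysis, which trades coverage for precision.
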
Original language: English
Journal: Computational Humanities Research
Volume: 1
Early online date: 9 Oct 2025
Publication status: Published - 9 Oct 2025

Keywords

  • Geoparsing
  • Large language models
  • Location extraction
  • Prompt engineering
  • Spatial analysis
  • Toponym disambiguation
  • UK local media
