Feeding a Gazetteer: Leveraging word embeddings for toponym mining

Sinai Rusinek, Nitzan Gado

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

We propose a workflow for retrieving place names from a corpus of Hebrew historical newspapers. We show that starting from an initial curated set of unambiguous toponyms and applying vector similarity is a productive method for populating a gazetteer with previously unknown variant names of known places, as well as with names of places not yet included in the gazetteer. We examine several parameters for improving accuracy and suggest a workflow that combines computation with human expertise and is valuable to spatial history as well as to other domains.
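The core step described in the abstract ranks corpus words by their vector similarity to a curated seed set of unambiguous toponyms. The minimal sketch below illustrates that idea under stated assumptions. It is not the authors' implementation. The function names, the `top_k` and `min_similarity` parameters, the max-over-seeds scoring rule, and the toy three-dimensional vectors are all hypothetical. A real run would use embeddings trained on the newspaper corpus. The returned candidates are meant for human review, in keeping with the workflow's combination of computation and expertise.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_seed_toponyms(
    embeddings: Dict[str, Vector],
    seeds: Set[str],
    top_k: int = 5,               # hypothetical parameter: how many candidates to return
    min_similarity: float = 0.6,  # hypothetical parameter: discard weaker matches
) -> List[Tuple[str, float]]:
    """Rank non-seed words by their best cosine similarity to any seed toponym.

    Hypothetical scoring rule: each word gets the maximum similarity over all seeds.
    Returns (word, score) pairs, highest score first, for a human to accept or reject.
    """
    seed_vectors = [embeddings[s] for s in seeds if s in embeddings]
    scored: List[Tuple[str, float]] = []
    for word, vec in embeddings.items():
        if word in seeds:
            continue  # seeds are already in the gazetteer
        best = max((cosine(vec, sv) for sv in seed_vectors), default=0.0)
        if best >= min_similarity:
            scored.append((word, best))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy, made-up vectors for illustration only (not real embeddings).
toy_embeddings: Dict[str, Vector] = {
    "Yafo": [0.9, 0.1, 0.0],
    "Jaffa": [0.88, 0.12, 0.01],
    "Haifa": [0.8, 0.3, 0.0],
    "newspaper": [0.0, 0.2, 0.95],
    "editor": [0.05, 0.1, 0.9],
}
candidates = expand_seed_toponyms(toy_embeddings, seeds={"Yafo"})
for word, score in candidates:
    print(f"{word}\t{score:.3f}")
```

With these toy vectors, "Jaffa" (a variant name of the seed) and "Haifa" (a different place) rank above the threshold, while the non-place words fall below it. Scoring each word by its best match to any single seed, rather than to a centroid of all seeds, is one design choice among several. It lets a variant name surface when it resembles just one known place.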

Original language: English
Title of host publication: Proceedings of the 5th ACM SIGSPATIAL International Workshop on Geospatial Humanities, GeoHumanities 2021
Editors: Ludovic Moncla, Carmen Brando, Katherine McDonough
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450391023
State: Published, 2 Nov 2021
Event: 5th ACM SIGSPATIAL International Workshop on Geospatial Humanities, GeoHumanities 2021, Beijing, China
Duration: 2 Nov 2021 → …

Publication series

Name: Proceedings of the 5th ACM SIGSPATIAL International Workshop on Geospatial Humanities, GeoHumanities 2021

Conference

Conference: 5th ACM SIGSPATIAL International Workshop on Geospatial Humanities, GeoHumanities 2021
Country/Territory: China
City: Beijing
Period: 2/11/21 → …

Bibliographical note

Publisher Copyright:
© 2021 Owner/Author.

Keywords

  • Gazetteers
  • Historical Newspapers
  • Natural Language Processing
  • Toponym extraction
  • Word Embeddings

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Information Systems
