Scientific Data Extraction from Oceanographic Papers

Bartal Eyofinsson Veyhe, Tomer Sagi, Katja Hose

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Scientific data collected in the oceanographic domain is invaluable to researchers performing meta-analyses and examining changes over time in oceanic environments. However, many of the data samples and subsequent analyses published by researchers are never uploaded to a repository, leaving the scientific paper as the only available source. Automated extraction of scientific data is, therefore, a valuable tool for such researchers. Specifically, much of the most valuable data in scientific papers is structured as tables, making these a prime target for information extraction research. Using the data requires an additional step in which the concepts mentioned in the tables, such as names of measures, units, and biological species, are identified within a domain ontology. Unfortunately, state-of-the-art table extraction leaves much to be desired and has not been attempted on a large scale on oceanographic papers. Furthermore, while entity linking in the context of a full paragraph of text has been heavily researched, research is still lacking on the harder task of linking single concepts. In this work, we present an annotated benchmark dataset of data tables from oceanographic papers. We further present the results of an evaluation of the extraction of these tables and the linking of the contained entities to domain and general-purpose knowledge bases using the current state of the art. We highlight the challenges and quantify the performance of current tools for table extraction and table-concept linking.

Original language: English
Title of host publication: ACM Web Conference 2023 - Companion of the World Wide Web Conference, WWW 2023
Publisher: Association for Computing Machinery, Inc
Number of pages: 5
ISBN (Electronic): 9781450394161
State: Published - 30 Apr 2023
Externally published: Yes
Event: 2023 World Wide Web Conference, WWW 2023 - Austin, United States
Duration: 30 Apr 2023 - 4 May 2023

Publication series

Name: Companion Proceedings of the ACM Web Conference 2023


Conference: 2023 World Wide Web Conference, WWW 2023
Country/Territory: United States

Bibliographical note

Publisher Copyright:
© 2023 ACM.


Keywords

  • Entity Linking
  • Scientific data
  • Table extraction

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software

