Abstract
Identifying similar narrative sections across longer documents would help surface key events within a corpus, enrich understanding of those events, provide a mechanism for organizing corpora according to their event content, and allow for bottom-up testing of theories of narrative. This paper proposes an automated method for narrative alignment across large textual corpora using techniques from natural language processing and similarity-based image segmentation. The method segments each document into a series of events, constructs sequences of abstracted representations of those events, compares pairs of sequences to generate image matrices, segments the images, identifies similar segments to discover commonly occurring narrative units, and, finally, returns the source sentences to make the clusters of narrative similarity readable. Preliminary tests of elements of this method were conducted on a small heterogeneous corpus (< 100 documents) and a moderate heterogeneous corpus (10k documents). Further implementation as described in this position paper is necessary to scale to the full 251k-document corpus from which the moderate corpus was drawn.
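The pairwise comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the event representation, the similarity measure, or the segmentation algorithm. The sketch assumes each event is abstracted to a set of tokens, uses Jaccard similarity to fill the matrix, and substitutes a simple search for diagonal runs of high similarity in place of image segmentation. The names `jaccard`, `similarity_matrix`, `diagonal_runs`, and the toy documents are all hypothetical.

```python
import numpy as np


def jaccard(a, b):
    """Jaccard similarity between two sets of event features (assumed representation)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_matrix(seq_a, seq_b):
    """Compare every event in seq_a with every event in seq_b.

    The result can be viewed as a grayscale image: cell (i, j) holds the
    similarity between event i of the first document and event j of the second.
    """
    matrix = np.zeros((len(seq_a), len(seq_b)))
    for i, event_a in enumerate(seq_a):
        for j, event_b in enumerate(seq_b):
            matrix[i, j] = jaccard(event_a, event_b)
    return matrix


def diagonal_runs(matrix, threshold=0.5, min_len=2):
    """Return (start_i, start_j, length) for each diagonal run of similar events.

    This is a stand-in for image segmentation: a diagonal streak means that
    consecutive events in one document match consecutive events in the other.
    """
    hits = matrix >= threshold
    rows, cols = hits.shape
    runs = []
    for i in range(rows):
        for j in range(cols):
            if not hits[i, j]:
                continue
            # Start a run only where the upper-left neighbour is not a hit,
            # so each run is reported once from its first cell.
            if i > 0 and j > 0 and hits[i - 1, j - 1]:
                continue
            length = 0
            while (i + length < rows and j + length < cols
                   and hits[i + length, j + length]):
                length += 1
            if length >= min_len:
                runs.append((i, j, length))
    return runs


# Toy event sequences (invented for illustration only).
doc_a = [{"arrive", "city"}, {"attack", "village"},
         {"flee", "family"}, {"reach", "border"}]
doc_b = [{"hold", "meeting"}, {"attack", "village", "night"},
         {"flee", "family"}, {"sign", "treaty"}]

matrix = similarity_matrix(doc_a, doc_b)
runs = diagonal_runs(matrix)
print(runs)
```

In this toy case the second and third events of each document line up, so a single diagonal run is reported. Mapping each run back to the sentences that produced its events would correspond to the final step in the abstract, where source sentences are returned to make the aligned units readable.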
Original language | English |
---|---|
Title of host publication | Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015 |
Editors | Feng Luo, Kemafor Ogan, Mohammed J. Zaki, Laura Haas, Beng Chin Ooi, Vipin Kumar, Sudarsan Rachuri, Saumyadipta Pyne, Howard Ho, Xiaohua Hu, Shipeng Yu, Morris Hui-I Hsiao, Jian Li |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 1673-1677 |
Number of pages | 5 |
ISBN (Electronic) | 9781479999255 |
State | Published - 22 Dec 2015 |
Externally published | Yes |
Event | 3rd IEEE International Conference on Big Data, IEEE Big Data 2015 - Santa Clara, United States; Duration: 29 Oct 2015 → 1 Nov 2015 |
Publication series
Name | Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015 |
---|
Conference
Conference | 3rd IEEE International Conference on Big Data, IEEE Big Data 2015 |
---|---|
Country/Territory | United States |
City | Santa Clara |
Period | 29/10/15 → 1/11/15 |
Bibliographical note
Publisher Copyright: © 2015 IEEE.
Keywords
- Computational models of narrative
- big data
- computational linguistics
- text mining
ASJC Scopus subject areas
- Computer Networks and Communications
- Computer Science Applications
- Information Systems
- Software