Using a novel clumpiness measure to unite data with metadata: Finding common sequence patterns in immune receptor germline V genes

Gregory W. Schwartz, Ali Shokoufandeh, Santiago Ontañón, Uri Hershberg

Research output: Contribution to journal › Article › peer-review


When finding relationships in biological systems, we often describe hierarchies based on one facet of the data. However, when this hierarchy is used to elucidate relationships between metadata, the distribution of metadata labels within the hierarchy may exhibit different levels of aggregation: uniform, random, or clumped. To date, there is no measure of the level of aggregation, or "clumpiness", between labels distributed among the leaves of a hierarchical container. We propose a clumpiness measure to aid in quantifying relationships between metadata. We validated our measure with random trees and found that, compared with the closest alternative measures, it is more resistant to changes in tree size, label size, and the number of label types. We used our clumpiness measure to quantify the relationships between light and heavy chains in human and mouse B cell and T cell receptor V genes based on their motifs. We found that the B cell heavy chains were the most aggregated and the T cell chains the least aggregated, and that, of all the B cell chains, the IGL chain clumped most with the T cell chains.
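The abstract does not give the formula for the clumpiness measure, so the sketch below does not reproduce it. Instead, it illustrates the underlying question (are leaves carrying two labels closer together in a tree than chance would predict?) with a simpler, generic, swapped-in technique: a permutation test on leaf-to-leaf path length. The nested-tuple tree encoding, the toy labels, and the names `leaf_paths`, `tree_distance`, `mean_label_distance`, and `aggregation_ratio` are hypothetical and chosen only for illustration.

```python
import random
from itertools import combinations, product
from statistics import mean

# Hypothetical proxy for label aggregation, NOT the paper's clumpiness measure.
# A tree is a nested tuple; every leaf is a string holding that leaf's label.


def leaf_paths(tree, path=()):
    """Yield (label, path) for every leaf; path lists child indices from the root."""
    if isinstance(tree, str):
        yield tree, path
    else:
        for i, child in enumerate(tree):
            yield from leaf_paths(child, path + (i,))


def tree_distance(p, q):
    """Number of edges between two leaves, routed through their lowest common ancestor."""
    shared = 0  # depth of the lowest common ancestor = length of the shared path prefix
    for a, b in zip(p, q):
        if a != b:
            break
        shared += 1
    return len(p) + len(q) - 2 * shared


def mean_label_distance(paths, labels, a, b):
    """Mean distance between leaves labelled a and leaves labelled b.

    For a == b, distinct pairs of a-leaves are used, so at least two are required.
    """
    xs = [p for p, lab in zip(paths, labels) if lab == a]
    ys = [p for p, lab in zip(paths, labels) if lab == b]
    pairs = combinations(xs, 2) if a == b else product(xs, ys)
    return mean(tree_distance(p, q) for p, q in pairs)


def aggregation_ratio(tree, a, b, n_perm=1000, seed=0):
    """Observed mean distance divided by its mean under random label shuffles.

    Shuffling keeps every label's count fixed; only the label-to-leaf assignment
    changes. Below 1: closer than chance (clumped). Near 1: random. Above 1: apart.
    """
    leaves = list(leaf_paths(tree))
    labels = [lab for lab, _ in leaves]
    paths = [p for _, p in leaves]
    observed = mean_label_distance(paths, labels, a, b)
    rng = random.Random(seed)  # seeded so repeated runs agree
    null = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        null.append(mean_label_distance(paths, shuffled, a, b))
    return observed / mean(null)


# Toy tree: two clades, each holding a single made-up label.
toy = ((("A", "A"), ("A", "A")), (("B", "B"), ("B", "B")))
print(aggregation_ratio(toy, "A", "B"))  # above 1: A and B sit in separate clades
print(aggregation_ratio(toy, "A", "A"))  # below 1: A clumps with itself
```

Keeping label counts fixed during shuffling is one way to compare labels of different sizes on the same scale. This sketch does not establish whether the proxy shares the robustness to tree size, label size, and number of label types that the abstract reports for the clumpiness measure.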

Original language: English
Pages (from-to): 24-29
Number of pages: 6
Journal: Pattern Recognition Letters
State: Published - 15 Apr 2016
Externally published: Yes

Bibliographical note

Funding Information:
Gregory W. Schwartz is funded by the U.S. Department of Education Graduate Assistance in Areas of National Need (GAANN) program, CFDA Number: 84.200. Research reported in this publication was supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Award Number P01AI106697 and the National Science Foundation Information & Intelligent Systems under Award Number 1551338. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the National Science Foundation.

Publisher Copyright:
© 2016 Elsevier B.V. All rights reserved.


Keywords

  • Adaptive immunity
  • Aggregation
  • Hierarchical clustering
  • Immune receptor repertoire
  • Multiscale analysis
  • Tree analysis

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

