Increased interest in the immune system's involvement in pathophysiological phenomena, coupled with decreased DNA sequencing costs, has led to an explosion of antibody and T cell receptor sequencing data, collectively termed “adaptive immune receptor repertoire sequencing” (AIRR-seq or Rep-Seq). The AIRR Community has been actively working to standardize protocols, metadata, formats, APIs, and other guidelines to promote open and reproducible studies of the immune repertoire. In this paper, we describe the work of the AIRR Community's Data Representation Working Group to develop standardized data representations for storing and sharing annotated antibody and T cell receptor data. Our file format emphasizes ease of use, accessibility, scalability to large data sets, and a commitment to open and transparent science. It is a tab-delimited format with a specific schema. Several popular repertoire analysis tools and data repositories already use this AIRR-seq data format. We hope that others will follow suit in the interest of promoting interoperable standards.
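As a rough illustration of what a tab-delimited format with a fixed schema means in practice, the sketch below parses a small annotated table using only the Python standard library. The column names (`sequence_id`, `v_call`, `j_call`, `junction_aa`, `productive`), the `T`/`F` boolean encoding, and the two records are assumptions chosen for illustration. They are not taken from the text above, and the helper `read_records` is hypothetical.

```python
import csv
import io

# Hypothetical two-record table. Column names and values are illustrative
# assumptions, not data from the paper.
EXAMPLE_TSV = (
    "sequence_id\tv_call\tj_call\tjunction_aa\tproductive\n"
    "seq1\tIGHV1-69*01\tIGHJ4*02\tCARDYW\tT\n"
    "seq2\tIGHV3-23*01\tIGHJ6*02\tCAKGGW\tF\n"
)


def read_records(handle):
    """Parse tab-delimited rows into dicts keyed by the header line."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Read from an in-memory string so the example is self-contained.
records = read_records(io.StringIO(EXAMPLE_TSV))

# Keep the identifiers of records flagged as productive (assumed "T"/"F").
productive_ids = [r["sequence_id"] for r in records if r["productive"] == "T"]
print(productive_ids)
```

Because every row shares the same header-defined columns, generic tools such as spreadsheet software or command-line utilities can read the same file without a dedicated parser.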
Bibliographical note
Funding Information:
We would like to thank Heng Li, Tom White, and Jian Ye for useful discussions and Marie-Paule Lefranc for a careful reading of the manuscript. The work of JV, SM, SB, and SK was supported by the National Institutes of Health under award number R01AI104739 to SK. SC was supported in part by an NIAID-funded R01 (AI097403). UL is supported in part by a grant from the Chan Zuckerberg Initiative (2018-182652). FM is supported by NIH grant R01 GM113246. BC is supported by the Canada Foundation for Innovation Cyberinfrastructure program. The work of AR and UH was supported by the National Institutes of Health under award number P01 AI106697.
Copyright © 2018 Vander Heiden, Marquez, Marthandan, Bukhari, Busse, Corrie, Hershberg, Kleinstein, Matsen, Ralph, Rosenfeld, Schramm, The AIRR Community, Christley, and Laserson.
- B cell
- T cell
ASJC Scopus subject areas
- Immunology and Allergy