Background: In an earlier study, we hypothesized that genomic segments with different sequence organization patterns (OPs) might display functional specificity despite their similar GC content. Here we tested this hypothesis by dividing the human genome into 100 kb segments, classifying these segments into five compositional groups according to GC content, and then characterizing each segment within the five groups by oligonucleotide counting (k-mer analysis; also referred to as compositional spectrum analysis, or CSA) to examine the distribution of sequence OPs in the segments. We performed the CSA on the entire DNA, i.e., on both its coding and non-coding parts, the latter being much more abundant in the genome than the former.

Results: We identified 38 OP-type clusters of segments that differ in their compositional spectrum (CS) organization. Many of the segments that shared the same OP type were enriched with genes related to the same biological processes (developmental, signaling, etc.), components of biochemical complexes, or organelles. Thirteen OP-type clusters showed significant enrichment in genes connected to specific gene-ontology terms. Some of these clusters seemed to reflect certain events during periods of horizontal gene transfer and genome expansion, and subsequent evolution of genomic regions requiring coordinated regulation.

Conclusions: There may be a tendency for genes that are involved in the same biological process, complex, or organelle to use the same OP, even at a distance of ~100 kb from the genes. Although the intergenic DNA is non-coding, the general pattern of sequence organization (e.g., as reflected in over-represented oligonucleotide "words") may be important and was protected, to some extent, in the course of evolution.
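The segmentation, GC grouping, and oligonucleotide-counting steps named in the Background can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not state the GC thresholds or the word length, so the `GC_BOUNDS` values, the default `k=4`, and all function names below are hypothetical choices made only for illustration, and plain exact-match k-mer counts stand in for the compositional spectrum.

```python
from collections import Counter

# Hypothetical upper bounds (GC fraction) separating five compositional groups.
# Assumption: the abstract does not give the thresholds actually used.
GC_BOUNDS = (0.37, 0.41, 0.46, 0.53)


def split_segments(sequence, size=100_000):
    """Yield consecutive non-overlapping segments of `size` bases.

    A trailing piece shorter than `size` is dropped.
    """
    for start in range(0, len(sequence) - size + 1, size):
        yield sequence[start:start + size]


def gc_fraction(segment):
    """Return the G+C fraction among unambiguous bases (A, C, G, T)."""
    seg = segment.upper()
    gc = seg.count("G") + seg.count("C")
    at = seg.count("A") + seg.count("T")
    total = gc + at
    return gc / total if total else 0.0


def gc_group(segment, bounds=GC_BOUNDS):
    """Return a group index from 0 (GC-poorest) to len(bounds) (GC-richest)."""
    gc = gc_fraction(segment)
    return sum(gc >= b for b in bounds)


def kmer_spectrum(segment, k=4):
    """Count overlapping k-mers, skipping words with ambiguous bases (e.g. N).

    Assumption: k=4 is an arbitrary illustrative word length.
    """
    seg = segment.upper()
    counts = Counter()
    for i in range(len(seg) - k + 1):
        word = seg[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts
```

Under these assumptions, each 100 kb segment would be assigned a group with `gc_group` and a count vector with `kmer_spectrum`; segments within the same group could then be compared or clustered by their count vectors, which is the step where differences in organization pattern at similar GC content would become visible.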
Bibliographical note
Funding Information:
The authors want to thank the anonymous reviewer for helpful comments and suggestions, Zeev Frenkel for useful discussions and suggestions, and Alexander Frenkel for help in improving the scripts. This work was supported by the Israeli Ministry of Absorption (SF and VK). SF was also supported by a fellowship for excellence from the Converging Technologies Program of The Council for Higher Education.
- Compositional spectra analysis
- Horizontal gene transfer
- Sequence organization pattern
- Whole genome duplication