Mixture models of geometric distributions in genomic analysis of inter-nucleotide distances
Abstract
The mapping defined by inter-nucleotide distances (InD) provides a reversible numerical representation of the primary structure of DNA. If nucleotides were placed independently along the genome, a finite mixture model of four geometric distributions could be fitted to the InD, where the four marginal distributions would be the expected distributions of the four nucleotide types. We analyze a finite mixture model of geometric distributions (f_2), with marginals not explicitly tied to the nucleotide types, as an approximation to the InD. We use BIC in the composite likelihood framework to choose the number of mixture components and the EM algorithm to estimate the model parameters. Based on divergence profiles, an experimental study was carried out on the complete genomes of 45 species to evaluate f_2. Although the proposed model is not suited to the InD, our analysis shows that the divergence profiles involving the empirical distribution of the InD are also exhibited by the profiles involving f_2. This suggests that statistical regularities of the InD can be described by the model f_2. Some characteristics of the DNA sequences captured by the model f_2 are illustrated. In particular, clusterings of subgroups of eukaryotes (primates, mammals, animals and plants) are detected.
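As an illustration of the ingredients named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It computes inter-nucleotide distances and fits a finite mixture of geometric distributions by EM. Several choices here are assumptions: the sequence is treated as non-circular (the first occurrence of each nucleotide yields no distance), the geometric distribution is supported on k = 1, 2, ..., the function names are hypothetical, and the `bic` helper is the ordinary BIC with 2m - 1 free parameters rather than the composite-likelihood version used in the paper.

```python
import math
import random


def inter_nucleotide_distances(seq):
    """Distance from each symbol back to the previous occurrence of the same
    symbol (assumption: non-circular sequence, so first occurrences are skipped)."""
    last_seen = {}
    distances = []
    for i, symbol in enumerate(seq):
        if symbol in last_seen:
            distances.append(i - last_seen[symbol])
        last_seen[symbol] = i
    return distances


def log_likelihood(data, weights, probs):
    """Log-likelihood of a geometric mixture with pmf p * (1 - p)**(k - 1), k >= 1."""
    return sum(
        math.log(sum(w * p * (1.0 - p) ** (k - 1) for w, p in zip(weights, probs)))
        for k in data
    )


def em_geometric_mixture(data, m, n_iter=200, seed=0):
    """Fit an m-component geometric mixture by EM; returns (weights, probs)."""
    rng = random.Random(seed)
    n = len(data)
    weights = [1.0 / m] * m
    probs = sorted(rng.uniform(0.05, 0.95) for _ in range(m))
    for _ in range(n_iter):
        resp_sum = [0.0] * m    # sum of responsibilities per component
        resp_k_sum = [0.0] * m  # responsibility-weighted sum of distances
        # E-step: posterior probability of each component for each distance.
        for k in data:
            comp = [w * p * (1.0 - p) ** (k - 1) for w, p in zip(weights, probs)]
            total = sum(comp)
            for j in range(m):
                r = comp[j] / total
                resp_sum[j] += r
                resp_k_sum[j] += r * k
        # M-step: closed-form updates (weighted version of p = n / sum(k)).
        weights = [s / n for s in resp_sum]
        probs = [
            min(max(s / sk, 1e-12), 1.0 - 1e-12) if sk > 0 else p
            for s, sk, p in zip(resp_sum, resp_k_sum, probs)
        ]
    return weights, probs


def bic(data, weights, probs):
    """Ordinary BIC; an m-component mixture has 2m - 1 free parameters."""
    n_params = 2 * len(weights) - 1
    return -2.0 * log_likelihood(data, weights, probs) + n_params * math.log(len(data))
```

Under these assumptions, one would compute `inter_nucleotide_distances` on a sequence, call `em_geometric_mixture` for several values of `m`, and keep the `m` with the smallest `bic`. For long distances a log-domain E-step would be more robust numerically than the direct powers used here.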
How to Cite
Freitas, A. V., Afreixo, V., & Cruz, S. E. (2013). Mixture models of geometric distributions in genomic analysis of inter-nucleotide distances. Statistics, Optimization & Information Computing, 1(1), 8-28. https://doi.org/10.19139/soic.v1i1.6