Abstract
Unusual patterns in nucleic acid or protein sequences are often suspected of having biological relevance. Repeating patterns of nucleotides are one such type and are typically searched for in large genome sequences. In this exercise, our interest is in repeating patterns that are conserved in a set of homologous DNA sequences, not only in their counts (occurrences) but also in their spacing (separating distances). We refer to such patterns as consistent repeating patterns. It is desirable to know the probability of multiple occurrences of a pattern in a sequence and whether the spacing between occurrences of the pattern exhibits any statistically significant property. The information derived through statistical analysis may help in planning experiments or may raise new questions that need attention in order to better understand the molecular mechanisms. A case study with four hundred 16S rDNA sequences yielded the nine most consistent repeating patterns. The statistical significance of the counts of these patterns was studied using the Poisson approximation. The spacing analysis of the patterns was carried out using the uniform probability distribution. The analysis revealed that most of the patterns show significant clustering, with one pattern occurring three times and evenly dispersed in a sequence. The significance of the occurrence and spacing of the repeating patterns raised a few questions that require explanation, perhaps through experimentation.
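The two tests named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the independent-base background model, the helper names (`find_positions`, `expected_count`, `poisson_tail`, `prob_min_gap_at_most`) and the choice of the minimum gap as the clustering statistic are all assumptions made here for illustration.

```python
import math


def find_positions(seq, pattern):
    """Return 0-based start positions of every (possibly overlapping) match."""
    positions = []
    start = seq.find(pattern)
    while start != -1:
        positions.append(start)
        start = seq.find(pattern, start + 1)
    return positions


def expected_count(pattern, seq_len, base_probs):
    """Expected number of occurrences, assuming independent bases (assumption)."""
    p = 1.0
    for base in pattern:
        p *= base_probs[base]
    return (seq_len - len(pattern) + 1) * p


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam): chance of seeing k or more occurrences.

    A plain Poisson approximation is reasonable for rare patterns that do not
    overlap themselves; self-overlapping patterns need a compound Poisson model.
    """
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def prob_min_gap_at_most(n, d):
    """P(smallest gap between n uniform points on [0, 1] is <= d).

    Uses the classical result P(min gap > d) = (1 - (n - 1) * d) ** n for
    0 <= d <= 1 / (n - 1). A small value suggests the occurrences are more
    tightly clustered than uniform placement would predict.
    """
    if n < 2:
        raise ValueError("need at least two occurrences")
    if d >= 1.0 / (n - 1):
        return 1.0
    return 1.0 - (1.0 - (n - 1) * d) ** n
```

For a sequence `seq` and a pattern `pat`, one would compare `len(find_positions(seq, pat))` with `expected_count(...)` through `poisson_tail`, and pass the smallest observed gap, divided by the sequence length, to `prob_min_gap_at_most`.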
Original language | English |
---|---|
Pages (from-to) | 789-795 |
Number of pages | 7 |
Journal | Current Science |
Volume | 91 |
Issue number | 6 |
Status | Published - 25 Sep 2006 |
Published externally | Yes |