Several factors make stem-loops an attractive sequence signal for a structural RNA gene-finder. Structural RNAs are virtually obligated to form stem-loops on their way to forming stable structures. Also, stem-loops can be identified along a sequence of length N in O(N) time. We postulate that stem-loops found in structural RNA genes tend to be longer than those found in their genomic counterparts (coding sequences and noncoding DNA). We also postulate that stem-loops occur at a higher frequency in structural RNA regions.
Methods: To examine these possibilities, rRNAs were selected as a test bed. An algorithm was developed to identify stem-loops along a genomic sequence that resemble those found in rRNA secondary structures. This algorithm scanned the genomes in our training set to establish the average metric values observed in rRNA genes. These values were subsequently used to try to identify rRNA genes in genomes outside the training set.
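To make the idea of scanning for stem-loops and summarizing them as metrics concrete, here is a minimal Python sketch. It is not the algorithm described above: the function names `find_stem_loops` and `stem_loop_metrics`, the thresholds `min_stem`, `min_loop`, and `max_loop`, the restriction to perfect (gap-free, mismatch-free) stems, and the inclusion of G-U wobble pairs are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Watson-Crick pairs plus the G-U wobble pair (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stem_loops(seq: str, min_stem: int = 4, min_loop: int = 3,
                    max_loop: int = 8) -> List[Tuple[int, int, int]]:
    """Return (start, end, stem_length) for each perfect stem-loop found.

    start and end are 0-based inclusive indices of the outermost base pair.
    All thresholds are hypothetical defaults, not values from the study.
    """
    seq = seq.upper().replace("T", "U")  # treat DNA input as RNA
    n = len(seq)
    hits = []
    for i in range(n):                        # i: innermost base of the 5' arm
        for loop in range(min_loop, max_loop + 1):
            j = i + loop + 1                  # j: innermost base of the 3' arm
            if j >= n:
                break
            # Extend the stem outward for as long as the bases pair.
            left, right, stem = i, j, 0
            while left >= 0 and right < n and (seq[left], seq[right]) in PAIRS:
                stem += 1
                left -= 1
                right += 1
            if stem >= min_stem:
                hits.append((left + 1, right - 1, stem))
    return hits


def stem_loop_metrics(seq: str, **kwargs) -> Dict[str, float]:
    """Summarize a region by stem-loop count, mean stem length, and density."""
    hits = find_stem_loops(seq, **kwargs)
    count = len(hits)
    mean_stem = sum(h[2] for h in hits) / count if count else 0.0
    density = count / len(seq) if seq else 0.0   # stem-loops per nucleotide
    return {"count": count, "mean_stem": mean_stem, "density": density}
```

In this sketch, `stem_loop_metrics` could be applied to annotated rRNA genes in a training genome to obtain reference values, and then to sliding windows of another genome to compare each window against those values. The metrics actually used in the study may be defined differently.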
Results: The values for the stem-loop metrics we tested are sensitive to G+C content. Two of the metrics reported here are able to identify rRNA genes when there is a marked difference in G+C content between rRNAs and their genomic counterparts. Another metric has demonstrated an ability to roughly target rRNA genes when there is a negligible difference in G+C content levels.
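Because the results hinge on the G+C difference between rRNA genes and the rest of the genome, a short sketch of how that difference could be computed may help. The helper names `gc_content` and `gc_difference` are hypothetical, and ignoring ambiguous bases such as N is an assumption of this sketch rather than a detail taken from the study.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T, U)."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGTU")
    if total == 0:
        return 0.0  # no unambiguous bases to measure
    return (seq.count("G") + seq.count("C")) / total


def gc_difference(region: str, background: str) -> float:
    """G+C content of a region minus that of a background sequence."""
    return gc_content(region) - gc_content(background)
```

A value of `gc_difference` near zero would correspond to the case of a negligible G+C difference, and a large absolute value to the case of a marked difference.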
Conclusions: Our results are encouraging and demonstrate that stem-loops have the potential to act as sequence signals for discovering rRNA genes. They also suggest that further study of stem-loops is warranted, both to improve the performance of our algorithm and to examine its application to a wider population of structural RNA genes.