2009 | OriginalPaper | Chapter
A Screening Method for Z-Value Assessment Based on the Normalized Edit Distance
Authors: Guillermo Peris, Andrés Marzal
Published in: Distributed Computing, Artificial Intelligence, Bioinformatics, Soft Computing, and Ambient Assisted Living
Publisher: Springer Berlin Heidelberg
Pairwise global alignment scores are used to detect related sequences in genomes and proteins. These scores are biased by the length and composition of the compared sequences, so the Z-value is used to estimate their statistical significance. The Z-value is computed with a Monte Carlo algorithm that requires a large number of pairwise alignments between random permutations of the compared sequences.
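The Monte Carlo estimate described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the scoring scheme (match, mismatch and linear gap values), the choice to shuffle only the second sequence, the number of shuffles, and the function names `global_alignment_score` and `z_value` are all assumptions made for the example. A real protein comparison would typically use a substitution matrix and many more shuffles.

```python
import random
import statistics


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def z_value(a, b, n_shuffles=100, seed=0):
    """Estimate the Z-value of the alignment score of a and b.

    Z = (observed score - mean shuffled score) / std of shuffled scores,
    where the shuffled scores come from aligning a against random
    permutations of b (an assumption: only b is shuffled here).
    """
    rng = random.Random(seed)
    observed = global_alignment_score(a, b)
    letters = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        shuffled_scores.append(global_alignment_score(a, "".join(letters)))
    mean = statistics.mean(shuffled_scores)
    std = statistics.pstdev(shuffled_scores)
    if std == 0:
        return 0.0  # degenerate case: every shuffle scored the same
    return (observed - mean) / std
```

Each call to `z_value` performs `n_shuffles + 1` full alignments, which is the cost that motivates screening pairs before computing it.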
A different alignment score, the normalized edit distance, is independent of the sequence lengths, and computing it usually takes 2 or 3 standard alignment calculations. In this paper we study the relationship between the normalized edit distance and the Z-value, and propose a method to screen out pairs of unrelated sequences, so that the Z-value needs to be computed for only a small percentage of sequence pairs. We apply this method to the comparison of proteins from Saccharomyces cerevisiae, Escherichia coli, Methanococcus jannaschii and Haemophilus influenzae, showing that the Z-value has to be computed for less than 1% of all protein pairs.
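To illustrate why a normalized edit distance can cost only a few standard alignment passes, the sketch below takes the usual definition (the minimum, over all edit paths, of total edit weight divided by path length) and computes it with a Dinkelbach-style fractional programming iteration. Everything specific here is an assumption for illustration: unit costs for insertion, deletion and substitution, zero cost for a match, and the names `_best_path` and `normalized_edit_distance`. It is not presented as the procedure used in the chapter.

```python
from fractions import Fraction


def _best_path(x, y, lam):
    """One standard dynamic-programming pass.

    Minimises sum(cost - lam) over the edit operations of a path and
    returns (W, L): the total weight and the length of a minimising path.
    Each cell holds (parametric cost, W, L).
    """
    prev = [(Fraction(0), 0, 0)]
    for _ in range(len(y)):
        c, w, l = prev[-1]
        prev.append((c + 1 - lam, w + 1, l + 1))          # insertions only
    for i in range(1, len(x) + 1):
        c, w, l = prev[0]
        curr = [(c + 1 - lam, w + 1, l + 1)]              # deletions only
        for j in range(1, len(y) + 1):
            sub = 0 if x[i - 1] == y[j - 1] else 1
            d, u, s = prev[j - 1], prev[j], curr[j - 1]
            candidates = [
                (d[0] + sub - lam, d[1] + sub, d[2] + 1),  # match / substitution
                (u[0] + 1 - lam, u[1] + 1, u[2] + 1),      # deletion
                (s[0] + 1 - lam, s[1] + 1, s[2] + 1),      # insertion
            ]
            curr.append(min(candidates, key=lambda t: t[0]))
        prev = curr
    _, w, l = prev[-1]
    return w, l


def normalized_edit_distance(x, y):
    """Return (distance, number of DP passes) under unit costs (assumed)."""
    if not x and not y:
        return Fraction(0), 0
    lam = Fraction(0)
    passes = 0
    while True:
        w, l = _best_path(x, y, lam)
        passes += 1
        new_lam = Fraction(w, l)
        if new_lam == lam:      # the ratio stopped improving: optimum reached
            return lam, passes
        lam = new_lam
```

For example, `normalized_edit_distance("ab", "ba")[0]` is `Fraction(2, 3)`: deleting `a`, matching `b` and inserting `a` gives weight 2 over a path of length 3, which beats two substitutions (weight 2 over length 2). Exact rational arithmetic is used so that the stopping test is an equality rather than a floating-point tolerance.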