2008 | OriginalPaper | Chapter
A Novel Method of Prokaryotic Promoter Regions Prediction with Feature Selection: Quadratic Discriminant Analysis Approach
Authors: Yaohua Du, Taihu Wu
Published in: 7th Asian-Pacific Conference on Medical and Biological Engineering
Publisher: Springer Berlin Heidelberg
Promoter identification is an essential task in the study of transcription regulation, but the prediction accuracy of current methods still falls well short of what is expected. An effective and reliable prediction method for prokaryotic promoter regions would be very helpful. We have developed a quadratic discriminant analysis (QDA) method based on feature selection to predict prokaryotic promoter regions, which are classified according to their locations in the genome. To make use of more characteristic information, we incorporate content features, signal features and structure features of the promoters into the candidate feature set and construct appropriate statistical models to calculate them. For the main conserved signal features in particular, a composite motif model is adopted, whose optimal parameters are found by an iterative search algorithm, OPSIA. Using the squared Mahalanobis distance as a measure, the discriminating features are selected from the candidate features through a stepwise procedure and combined into a multidimensional vector. This vector of combined features is then used by QDA to predict potential promoter regions. The algorithm has been trained and tested on E. coli and B. subtilis promoter datasets by the jackknife method. For E. coli σ70 promoters located in non-coding regions, the average prediction accuracy is 85.7%; for those located in coding regions and for several other kinds of prokaryotic promoters, the prediction accuracies are also about 80%. The results indicate that our method is a universal algorithm that outperforms most existing approaches on several performance measures. Furthermore, the framework of the method is extensible and can accept new features to improve the prediction results efficiently. The OPSIA algorithm is also a useful tool for exploring composite motifs in newly uncovered promoter sequences.
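The two statistical steps named in the abstract, stepwise feature selection by squared Mahalanobis distance followed by QDA on the selected features, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the data are synthetic Gaussian features, not promoter sequences; the stopping rule (a hypothetical `min_gain` threshold on the increase in distance) stands in for whatever criterion the paper uses; and accuracy is measured on freshly drawn samples instead of by the jackknife.

```python
import numpy as np

def mahalanobis_sq(a, b):
    """Squared Mahalanobis distance between class means (pooled covariance)."""
    n1, n2 = len(a), len(b)
    diff = a.mean(axis=0) - b.mean(axis=0)
    s1 = np.atleast_2d(np.cov(a, rowvar=False))
    s2 = np.atleast_2d(np.cov(b, rowvar=False))
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    return float(diff @ np.linalg.solve(pooled, diff))

def forward_select(pos, neg, min_gain=0.3):
    """Greedy forward selection; min_gain is a hypothetical stopping rule."""
    selected, best = [], 0.0
    remaining = list(range(pos.shape[1]))
    while remaining:
        scores = {j: mahalanobis_sq(pos[:, selected + [j]],
                                    neg[:, selected + [j]])
                  for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best < min_gain:
            break  # no candidate adds enough separation
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return selected

def fit_gaussian(x):
    """Per-class mean and covariance, as QDA requires."""
    return x.mean(axis=0), np.atleast_2d(np.cov(x, rowvar=False))

def qda_score(x, mu, cov, prior):
    """Quadratic discriminant score: log prior plus Gaussian log density
    up to an additive constant shared by both classes."""
    diff = x - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * logdet - 0.5 * diff @ np.linalg.solve(cov, diff) + np.log(prior)

def qda_predict(x, params_pos, params_neg, prior_pos=0.5):
    """Return True for rows assigned to the positive class."""
    return np.array([
        qda_score(row, *params_pos, prior_pos) > qda_score(row, *params_neg, 1 - prior_pos)
        for row in x
    ])

def sample(rng, n):
    """Synthetic data: features 0 and 2 are informative, 1 and 3 are noise."""
    neg = rng.normal(0.0, 1.0, size=(n, 4))
    pos = rng.normal(0.0, 1.0, size=(n, 4))
    pos[:, 0] += 2.5                      # mean shift only
    pos[:, 2] = pos[:, 2] * 1.5 + 1.5     # mean shift and larger variance
    return pos, neg

rng = np.random.default_rng(0)
train_pos, train_neg = sample(rng, 200)
test_pos, test_neg = sample(rng, 200)

selected = forward_select(train_pos, train_neg)
params_pos = fit_gaussian(train_pos[:, selected])
params_neg = fit_gaussian(train_neg[:, selected])

hits = qda_predict(test_pos[:, selected], params_pos, params_neg).sum()
hits += (~qda_predict(test_neg[:, selected], params_pos, params_neg)).sum()
accuracy = hits / (len(test_pos) + len(test_neg))
print("selected features:", selected)
print("held-out accuracy:", round(float(accuracy), 3))
```

Because QDA fits a separate covariance matrix for each class, it can exploit a feature such as feature 2 above, where the classes differ in spread as well as in mean; a linear discriminant with one shared covariance would use only the mean shift.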