Information Systems Frontiers

Published: 1 September 2009

Proportional fault-tolerant data mining with applications to bioinformatics

Authors: Guanling Lee, Sheng-Lung Peng, Yuh-Tzu Lin

Published in: Information Systems Frontiers | Issue 4/2009

Abstract

The mining of frequent patterns in databases has been studied for several years, but few reports have discussed fault-tolerant (FT) pattern mining. FT data mining is more suitable for extracting interesting information from real-world data that may be polluted by noise. In particular, the growing volume of biological databases calls for such a data mining technique to mine important data, e.g., motifs. In this paper, we propose the concept of proportional FT mining of frequent patterns, in which the number of tolerable faults in a pattern is proportional to the length of the pattern. Two algorithms are designed to solve this problem. The first algorithm, FT-BottomUp, applies an FT-Apriori heuristic and finds all FT patterns with any number of faults. The second algorithm, FT-LevelWise, divides all FT patterns into several groups according to the number of tolerable faults and mines the content patterns of each group in turn. When our algorithm is applied to real data, two reported epitopes of spike proteins of SARS-CoV are found in the resulting itemset, and proportional FT data mining performs better than fixed FT data mining for this application.
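
To make the idea of a proportional fault budget concrete, the following is a minimal Python sketch. It is not the FT-BottomUp or FT-LevelWise algorithm from the paper. It assumes a simplified definition in which a transaction FT-contains a pattern if it misses at most floor(fault_ratio × pattern length) of the pattern's items. The names `fault_ratio`, `tolerable_faults`, `ft_contains`, and `ft_support` are hypothetical and chosen only for illustration, as is the toy data.

```python
from math import floor


def tolerable_faults(pattern_length: int, fault_ratio: float) -> int:
    """Fault budget for a pattern (assumed rule: floor(ratio * length))."""
    return floor(fault_ratio * pattern_length)


def ft_contains(transaction: set, pattern: set, fault_ratio: float) -> bool:
    """True if the transaction misses no more items than the budget allows."""
    missing = len(pattern - transaction)
    return missing <= tolerable_faults(len(pattern), fault_ratio)


def ft_support(transactions: list, pattern: set, fault_ratio: float) -> int:
    """Count the transactions that FT-contain the pattern."""
    return sum(ft_contains(t, pattern, fault_ratio) for t in transactions)


# Toy data, invented for this illustration only.
transactions = [
    {"a", "b", "c", "d"},
    {"a", "b", "d"},
    {"a", "c", "d"},
    {"b", "e"},
]
pattern = {"a", "b", "c", "d"}

exact = ft_support(transactions, pattern, fault_ratio=0.0)      # no faults allowed
tolerant = ft_support(transactions, pattern, fault_ratio=0.25)  # 1 fault per 4 items

print(exact, tolerant)  # prints: 1 3
```

With a ratio of 0.25, the four-item pattern tolerates one missing item, whereas a two-item pattern tolerates none. This differs from a fixed budget, which would grant short and long patterns the same number of faults.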

Title: Proportional fault-tolerant data mining with applications to bioinformatics
Authors: Guanling Lee, Sheng-Lung Peng, Yuh-Tzu Lin
Publication date: 1 September 2009
Publisher: Springer US
Published in: Information Systems Frontiers / Issue 4/2009
Print ISSN: 1387-3326
Electronic ISSN: 1572-9419
DOI: https://doi.org/10.1007/s10796-009-9158-z
