Published in: Information Systems Frontiers 4/2009

1 September 2009

Proportional fault-tolerant data mining with applications to bioinformatics

Authors: Guanling Lee, Sheng-Lung Peng, Yuh-Tzu Lin


Abstract

The mining of frequent patterns in databases has been studied for several years, but few reports have addressed fault-tolerant (FT) pattern mining. FT data mining is better suited to extracting interesting information from real-world data that may be polluted by noise. In particular, the rapid growth of today's biological databases calls for such a technique to mine important data, e.g., motifs. In this paper, we propose the concept of proportional FT mining of frequent patterns, in which the number of faults a pattern may tolerate is proportional to the pattern's length. We design two algorithms for this problem. The first, FT-BottomUp, applies an FT-Apriori heuristic and finds all FT patterns with any number of faults. The second, FT-LevelWise, divides all FT patterns into several groups according to the number of tolerable faults and mines the patterns of each group in turn. Applied to real data, our algorithms find two previously reported epitopes of the SARS-CoV spike protein in the resulting itemset, and for this application proportional FT data mining performs better than fixed FT data mining.
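The abstract does not spell out the exact matching model, so the Python sketch below rests on assumptions that are ours, not the authors': patterns are contiguous strings (such as peptide motifs), a pattern FT-occurs in a sequence if some equal-length window differs from it in at most k positions, and a pattern of length m tolerates k = ⌊δ·m⌋ faults for a hypothetical ratio δ (the `ratio` parameter). All function names are hypothetical. The sketch illustrates the grouping idea described for FT-LevelWise (lengths that share a fault budget form one group, and the groups are mined in turn); it is not the authors' FT-BottomUp or FT-LevelWise implementation.

```python
from fractions import Fraction
from math import floor

# Assumptions (not taken from the paper): patterns are contiguous strings, an
# occurrence is an equal-length window compared position by position, and a
# pattern of length m may tolerate floor(ratio * m) mismatches.


def max_faults(length: int, ratio: Fraction) -> int:
    """Hypothetical proportional fault budget for a pattern of `length`."""
    return floor(ratio * length)


def ft_occurs(pattern: str, sequence: str, faults: int) -> bool:
    """True if some window of `sequence` differs from `pattern` in <= `faults` positions."""
    m = len(pattern)
    for i in range(len(sequence) - m + 1):
        if sum(a != b for a, b in zip(pattern, sequence[i:i + m])) <= faults:
            return True
    return False


def ft_support(pattern: str, sequences: list[str], faults: int) -> int:
    """Number of sequences containing at least one tolerant occurrence."""
    return sum(ft_occurs(pattern, s, faults) for s in sequences)


def proportional_support(pattern: str, sequences: list[str], ratio: Fraction) -> int:
    """Support when the budget scales with the pattern's own length."""
    return ft_support(pattern, sequences, max_faults(len(pattern), ratio))


def group_lengths_by_faults(min_len: int, max_len: int, ratio: Fraction) -> dict[int, list[int]]:
    """Partition pattern lengths into groups that share one fault budget."""
    groups: dict[int, list[int]] = {}
    for length in range(min_len, max_len + 1):
        groups.setdefault(max_faults(length, ratio), []).append(length)
    return groups


def mine_fixed_budget(sequences: list[str], alphabet: str, faults: int,
                      max_len: int, min_support: int) -> dict[str, int]:
    """Grow patterns one symbol at a time under a single fixed budget.

    With the budget fixed, a pattern's support never exceeds its prefix's support,
    so only frequent patterns need to be extended.
    """
    found: dict[str, int] = {}
    level = [""]
    for _ in range(max_len):
        scored = [(p + c, ft_support(p + c, sequences, faults))
                  for p in level for c in alphabet]
        level = [p for p, sup in scored if sup >= min_support]
        found.update((p, sup) for p, sup in scored if sup >= min_support)
        if not level:
            break
    return found


def mine_proportional(sequences: list[str], alphabet: str, ratio: Fraction,
                      min_len: int, max_len: int, min_support: int) -> dict[str, int]:
    """Mine each fault group in turn; keep patterns whose length is in that group."""
    result: dict[str, int] = {}
    for faults, lengths in group_lengths_by_faults(min_len, max_len, ratio).items():
        mined = mine_fixed_budget(sequences, alphabet, faults, max(lengths), min_support)
        result.update((p, s) for p, s in mined.items() if len(p) in lengths)
    return result


if __name__ == "__main__":
    seqs = ["ACGTACGT", "ACGTTCGT", "ACGAACGT"]  # toy data, not from the paper
    ratio = Fraction(1, 4)
    print(group_lengths_by_faults(3, 9, ratio))  # {0: [3], 1: [4, 5, 6, 7], 2: [8, 9]}
    print(proportional_support("ACGTACGT", seqs, ratio))  # 3 (budget 2)
    print(ft_support("ACGTACGT", seqs, 0))  # 1 (exact matching only)
    print("ACGTACGT" in mine_proportional(seqs, "ACGT", ratio, 4, 8, 3))  # True
```

Within one group the budget is fixed, so every tolerant occurrence of a longer pattern contains a tolerant occurrence of its prefix, which is what makes level-wise pruning safe inside a group. Across groups that guarantee does not hold directly: a longer pattern receives a larger budget and can be frequent even when its prefix is not.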

Metadata
Title
Proportional fault-tolerant data mining with applications to bioinformatics
Authors
Guanling Lee
Sheng-Lung Peng
Yuh-Tzu Lin
Publication date
1 September 2009
Publisher
Springer US
Published in
Information Systems Frontiers / Issue 4/2009
Print ISSN: 1387-3326
Electronic ISSN: 1572-9419
DOI
https://doi.org/10.1007/s10796-009-9158-z