Skip to main content
Erschienen in: Journal of Intelligent Information Systems 3/2010

01.12.2010

Speed up gradual rule mining from stream data! A B-Tree and OWA-based approach

verfasst von: Jordi Nin, Anne Laurent, Pascal Poncelet

Erschienen in: Journal of Intelligent Information Systems | Ausgabe 3/2010

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Gradual rules allow users to be provided with rules describing the ordering correlations among attributes. Such a rule is for instance given by the higher the salary and the lower the number of cars, the higher the number of tourist travels. Previously intensively used in fuzzy command systems, these rules were manually provided to the system. More recently, they have received attention from the data mining community and methods have been defined to automatically extract and maintain gradual rules from numerical databases. However, no method has been shown to be able to handle data streams, as no method is scalable enough to manage the high rate which stream data arrive at. In this paper, we thus propose an original approach to mine data streams for gradual rules. Our method is based on B-Trees and OWA (Ordered Weighted Aggregation) operator in order to speed up the process. B-Trees are used to store already-known gradual rules in order to maintain the knowledge over time, while OWA operators provide a fast way to discard non relevant data.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
If a tuple has an attribute lower or higher than the minimum or maximum value predefined, such value can be normalized as 0 (for the minimum) or 1 (for the maximum).
 
Literatur
Zurück zum Zitat Aggarwal, C. (2007). Data streams: Models and algorithms. New York: Springer.MATH Aggarwal, C. (2007). Data streams: Models and algorithms. New York: Springer.MATH
Zurück zum Zitat Alon, N., Matias, Y., & Szegedy, M. (1996). The space complexity of approximating the frequency moments. In Proc. of the 28th annual ACM symposium on theory of computing (STOC’96) (pp. 20–29). Alon, N., Matias, Y., & Szegedy, M. (1996). The space complexity of approximating the frequency moments. In Proc. of the 28th annual ACM symposium on theory of computing (STOC’96) (pp. 20–29).
Zurück zum Zitat Berzal, F., Cubero, J., Sanchez, D., Vila, M., & Serrano, J. (2007). An alternative approach to discover gradual dependencies. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (IJUFKS), 15(5), 559–570.MATHCrossRefMathSciNet Berzal, F., Cubero, J., Sanchez, D., Vila, M., & Serrano, J. (2007). An alternative approach to discover gradual dependencies. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (IJUFKS), 15(5), 559–570.MATHCrossRefMathSciNet
Zurück zum Zitat Bloom, B. H. (1970). Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7), 422–426.MATHCrossRef Bloom, B. H. (1970). Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7), 422–426.MATHCrossRef
Zurück zum Zitat Calders, T., Dexters, N., & Goethals, B. (2007). Mining frequent itemsets in a stream. In Proc. of the IEEE int. conference on data mining (ICDM’07) (pp. 83–92). Calders, T., Dexters, N., & Goethals, B. (2007). Mining frequent itemsets in a stream. In Proc. of the IEEE int. conference on data mining (ICDM’07) (pp. 83–92).
Zurück zum Zitat Calders, T., Dexters, N., & Goethals, B. (2008). Mining frequent items in a stream using flexible windows. Journal of Intelligent Data Analysis, 12(3), 293–304. Calders, T., Dexters, N., & Goethals, B. (2008). Mining frequent items in a stream using flexible windows. Journal of Intelligent Data Analysis, 12(3), 293–304.
Zurück zum Zitat Charikar, M., Chen, K., & Farach-Colton, M. (2004). Finding frequent items in data streams. Theoretical Computer Sciences, 312(1), 3–15.MATHCrossRefMathSciNet Charikar, M., Chen, K., & Farach-Colton, M. (2004). Finding frequent items in data streams. Theoretical Computer Sciences, 312(1), 3–15.MATHCrossRefMathSciNet
Zurück zum Zitat Chi, Y., Wang, H., Yu, P., & Muntz, R. (2004). Moment: Maintaining closed frequent itemsets over a stream sliding window. In Proc. of the IEEE int. conference on data mining (ICDM’04). Chi, Y., Wang, H., Yu, P., & Muntz, R. (2004). Moment: Maintaining closed frequent itemsets over a stream sliding window. In Proc. of the IEEE int. conference on data mining (ICDM’04).
Zurück zum Zitat Coenen, F., Goulbourne, G., & Leng, P. (2004). Tree structures for mining association rules. Data Mining and Knowledge Discovery, 8(1), 25–51.CrossRefMathSciNet Coenen, F., Goulbourne, G., & Leng, P. (2004). Tree structures for mining association rules. Data Mining and Knowledge Discovery, 8(1), 25–51.CrossRefMathSciNet
Zurück zum Zitat Giannella, G., Han, J., Pei, J., Yan, X., & Yu, P. (2003). Mining frequent patterns in data streams at multiple time granularities. In Next generation data mining. Cambridge: MIT. Giannella, G., Han, J., Pei, J., Yan, X., & Yu, P. (2003). Mining frequent patterns in data streams at multiple time granularities. In Next generation data mining. Cambridge: MIT.
Zurück zum Zitat Greco, S., Matarazzo, B., Pappalardo, N., & Slowinski, R. (2006). Measuring expected effects of interventions based on decision rules. Journal of Experimental and Theoretical Artificial Intelligence, 17(1–2). Greco, S., Matarazzo, B., Pappalardo, N., & Slowinski, R. (2006). Measuring expected effects of interventions based on decision rules. Journal of Experimental and Theoretical Artificial Intelligence, 17(1–2).
Zurück zum Zitat Greenwa, M., & Khanna, S. (2001). Space-efficient online computation of quantile summaries. In Proc. of the ACM int. conference on management of data (SIGMOD’01) (pp. 56–66). Greenwa, M., & Khanna, S. (2001). Space-efficient online computation of quantile summaries. In Proc. of the ACM int. conference on management of data (SIGMOD’01) (pp. 56–66).
Zurück zum Zitat Han, J., Chen, Y., Dong, G., Pei, J., Wah, B., Wang, J., et al. (2005). Stream cube: An architecture for multi-dimensional analysis of data streams. Distributed and Parallel Databases, 18(2), 173–197.CrossRef Han, J., Chen, Y., Dong, G., Pei, J., Wah, B., Wang, J., et al. (2005). Stream cube: An architecture for multi-dimensional analysis of data streams. Distributed and Parallel Databases, 18(2), 173–197.CrossRef
Zurück zum Zitat Han, J., Pei, J., Mortazavi-asl, B., Chen, Q., Dayal, U., & Hsu, M. (2000). Freespan: Frequent pattern-projected sequential pattern mining. In Proc. of ACM int. conference on knowledge discovery and data mining (KDD’00). Han, J., Pei, J., Mortazavi-asl, B., Chen, Q., Dayal, U., & Hsu, M. (2000). Freespan: Frequent pattern-projected sequential pattern mining. In Proc. of ACM int. conference on knowledge discovery and data mining (KDD’00).
Zurück zum Zitat Han, J., Pei, J., Yin, Y., & Mao, R. (2004). Mining frequent patterns without candidate generation. Data Mining and Knowledge Discovery, 8, 53–87.CrossRefMathSciNet Han, J., Pei, J., Yin, Y., & Mao, R. (2004). Mining frequent patterns without candidate generation. Data Mining and Knowledge Discovery, 8, 53–87.CrossRefMathSciNet
Zurück zum Zitat Hüllermeier, E. (2002). Association rules for expressing gradual dependences. In Proc. of the 6th Eu. conf. on principles of data mining and knowledge discovery (PKDD’02) (pp. 200–211). Hüllermeier, E. (2002). Association rules for expressing gradual dependences. In Proc. of the 6th Eu. conf. on principles of data mining and knowledge discovery (PKDD’02) (pp. 200–211).
Zurück zum Zitat Jorio, L. D., Laurent, A., & Teisseire, M. (2008). Fast extraction of gradual association rules: A heuristic based method. In IEEE/ACM int. conference on soft computing as transdisciplinary science and technology (CSTST). Jorio, L. D., Laurent, A., & Teisseire, M. (2008). Fast extraction of gradual association rules: A heuristic based method. In IEEE/ACM int. conference on soft computing as transdisciplinary science and technology (CSTST).
Zurück zum Zitat Jorio, L. D., Laurent, A., & Teisseire, M. (2009). Mining for gradualness over time using sequential patterns. In Intelligent decision technologies (IDT’09). Jorio, L. D., Laurent, A., & Teisseire, M. (2009). Mining for gradualness over time using sequential patterns. In Intelligent decision technologies (IDT’09).
Zurück zum Zitat Li, H.-F., Lee, S., & Shan, M.-K. (2004). An efficient algorithm for mining frequent itemsets over the entire history of data streams. In Proc. of 1st int. workshop on knowledge discovery in data streams. Li, H.-F., Lee, S., & Shan, M.-K. (2004). An efficient algorithm for mining frequent itemsets over the entire history of data streams. In Proc. of 1st int. workshop on knowledge discovery in data streams.
Zurück zum Zitat Manku, G., & Motwani, R. (2002). Approximate frequency counts over data streams. In Proc.of 28th int. conf. very large databases. Manku, G., & Motwani, R. (2002). Approximate frequency counts over data streams. In Proc.of 28th int. conf. very large databases.
Zurück zum Zitat Masseglia, F., Poncelet, P., Teisseire, M., & Marascu, A. (2008). Web usage mining: extracting unexpected periods from web logs. Data Mining and Knowledge Discover, 16(1), 39–65.CrossRefMathSciNet Masseglia, F., Poncelet, P., Teisseire, M., & Marascu, A. (2008). Web usage mining: extracting unexpected periods from web logs. Data Mining and Knowledge Discover, 16(1), 39–65.CrossRefMathSciNet
Zurück zum Zitat Miller, R. J., & Yang, Y. (1997). Association rules over interval data. In SIGMOD ’97: Proceedings of the 1997 ACM SIGMOD international conference on management of data (pp. 452–461). Miller, R. J., & Yang, Y. (1997). Association rules over interval data. In SIGMOD ’97: Proceedings of the 1997 ACM SIGMOD international conference on management of data (pp. 452–461).
Zurück zum Zitat Nag, B., Deshpande, P. M., & DeWitt, D. J. (1999). Using a knowledge cache for interactive discovery of association rules. In KDD ’99: Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 244–253). Nag, B., Deshpande, P. M., & DeWitt, D. J. (1999). Using a knowledge cache for interactive discovery of association rules. In KDD ’99: Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 244–253).
Zurück zum Zitat Raissi, C., Poncelet, P., & Teisseire, M. (2006). Speed: Mining maximal sequential patterns over data streams. In Proc. of the 3rd IEEE int. conference on intelligent systems (IS2006). Raissi, C., Poncelet, P., & Teisseire, M. (2006). Speed: Mining maximal sequential patterns over data streams. In Proc. of the 3rd IEEE int. conference on intelligent systems (IS2006).
Zurück zum Zitat Raissi, C., Poncelet, P., & Teisseire, M. (2007). Towards a new approach for mining maximal frequent itemsets over data stream. Journal of Intelligent Information Systems (JIIS), 28(1), 23–36.CrossRef Raissi, C., Poncelet, P., & Teisseire, M. (2007). Towards a new approach for mining maximal frequent itemsets over data stream. Journal of Intelligent Information Systems (JIIS), 28(1), 23–36.CrossRef
Zurück zum Zitat Ras, Z., & Wieczorkowska, A. (2000). Action rules: how to increase profit of a company. In Proceedings of the principles of data mining and knowledge discovery (PKDD 00) (pp. 587–592). Lyon, France. Ras, Z., & Wieczorkowska, A. (2000). Action rules: how to increase profit of a company. In Proceedings of the principles of data mining and knowledge discovery (PKDD 00) (pp. 587–592). Lyon, France.
Zurück zum Zitat Skowron, A., & Synak, P. (2006). Planning based on reasoning about information changes. In Rough sets and current trends in computing (pp. 165–173). Skowron, A., & Synak, P. (2006). Planning based on reasoning about information changes. In Rough sets and current trends in computing (pp. 165–173).
Zurück zum Zitat Stollnitz, E. J., Derose, T. D., & Salesin, D. H. (1996). Wavelets for computer graphics: Theory and applications. San Mateo: Morgan Kaufmann. Stollnitz, E. J., Derose, T. D., & Salesin, D. H. (1996). Wavelets for computer graphics: Theory and applications. San Mateo: Morgan Kaufmann.
Zurück zum Zitat Teng, W.-G., Chen, M.-S., & Yu, P. (2003). A regression-based temporal patterns mining schema for data streams. In Proc. of 29th int. conf. very large databases (VLDB’03) (pp. 93–104). Teng, W.-G., Chen, M.-S., & Yu, P. (2003). A regression-based temporal patterns mining schema for data streams. In Proc. of 29th int. conf. very large databases (VLDB’03) (pp. 93–104).
Zurück zum Zitat Torra, V. (1997). The weighted owa operator. International Journal of Intelligent Systems, 12, 153–166.MATHCrossRef Torra, V. (1997). The weighted owa operator. International Journal of Intelligent Systems, 12, 153–166.MATHCrossRef
Zurück zum Zitat Torra, V. (2004). Owa operators in data modeling and re-identification. IEEE Transaction on Fuzzy Systems, 12(5), 652–660.CrossRef Torra, V. (2004). Owa operators in data modeling and re-identification. IEEE Transaction on Fuzzy Systems, 12(5), 652–660.CrossRef
Zurück zum Zitat Torra, V., & Narukawa, Y. (2007). Modeling decisions: Information fusion and aggregation operators. New York: Springer. Torra, V., & Narukawa, Y. (2007). Modeling decisions: Information fusion and aggregation operators. New York: Springer.
Zurück zum Zitat Torra, V., & Nin, J. (2008). Record linkage for database integration using fuzzy integrals. International Journal of Intelligent Systems, 23(6), 715–734.MATHCrossRef Torra, V., & Nin, J. (2008). Record linkage for database integration using fuzzy integrals. International Journal of Intelligent Systems, 23(6), 715–734.MATHCrossRef
Zurück zum Zitat Tsay, L.-S., & Ras, Z. (2005). Action rules discovery system dear, method and experiments. Journal of Experimental and Theoretical Artificial Intelligence, 17(1–2), 119–128.MATHCrossRef Tsay, L.-S., & Ras, Z. (2005). Action rules discovery system dear, method and experiments. Journal of Experimental and Theoretical Artificial Intelligence, 17(1–2), 119–128.MATHCrossRef
Zurück zum Zitat Tzacheva, A., & Ras, Z. (2005). Action rules mining. International Journal of Intelligent Systems, 20(7), 719–736.MATHCrossRef Tzacheva, A., & Ras, Z. (2005). Action rules mining. International Journal of Intelligent Systems, 20(7), 719–736.MATHCrossRef
Zurück zum Zitat Verma, K., Vyas, O., & Vyas, R. (2005). Temporal approach to association rule mining using t-tree and p-tree. In Machine learning and data mining in pattern recognition. Lecture notes in computer sciences (Vol. 3587, pp. 651–659). Springer. Verma, K., Vyas, O., & Vyas, R. (2005). Temporal approach to association rule mining using t-tree and p-tree. In Machine learning and data mining in pattern recognition. Lecture notes in computer sciences (Vol. 3587, pp. 651–659). Springer.
Zurück zum Zitat Yager, R. R. (1988). On ordered weighted averaging aggregation operators in multi-criteria decision making. IEEE Transaction on Systems, Man and Cybernetics, 18, 183–190.MATHCrossRefMathSciNet Yager, R. R. (1988). On ordered weighted averaging aggregation operators in multi-criteria decision making. IEEE Transaction on Systems, Man and Cybernetics, 18, 183–190.MATHCrossRefMathSciNet
Metadaten
Titel
Speed up gradual rule mining from stream data! A B-Tree and OWA-based approach
verfasst von
Jordi Nin
Anne Laurent
Pascal Poncelet
Publikationsdatum
01.12.2010
Verlag
Springer US
Erschienen in
Journal of Intelligent Information Systems / Ausgabe 3/2010
Print ISSN: 0925-9902
Elektronische ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-009-0112-9

Weitere Artikel der Ausgabe 3/2010

Journal of Intelligent Information Systems 3/2010 Zur Ausgabe

Premium Partner