
2021 | OriginalPaper | Book chapter

Text Classification Using FP-Growth Association Rule and Updating the Term Weight

Written by: Santosh K. Vishwakarma, Akhilesh Kumar Sharma, Sourabh Singh Verma, Meghna Utmal

Published in: Innovations in Information and Communication Technologies (IICT-2020)

Publisher: Springer International Publishing


Abstract

Text classification plays a vital role in many real-life applications. Many techniques have been proposed for it, most prominently the Naïve Bayes classifier and support vector machines. A good text classifier must efficiently classify large sets of unstructured documents with high accuracy. In this paper, we propose an integrated approach to text classification that works in two phases. In the initial preprocessing phase, we select the frequent terms and adjust the term weights using information gain and support vector machines. The second phase applies a Naïve Bayes classifier to the document vectors. The experiments were performed on the open research dataset of the Forum for Information Retrieval Evaluation (FIRE). In association rule mining, correlations between data items are obtained without any external knowledge, whereas in classification, attention is given to a small set of rules with the help of external knowledge. The proposed work uses the FP-growth algorithm with absolute pruning to obtain frequent text sets; a Naïve Bayes classifier is then trained on them to construct the classification model. Our experimental results show an increase in efficiency compared with traditional text classification methods.
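The two-phase idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it replaces FP-growth with plain document-frequency counting under an absolute support threshold (which yields only frequent single terms, not larger term sets), it omits the information gain and support vector machine weighting step, and the toy corpus, function names, and threshold are assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict


def frequent_terms(docs, min_support):
    """Phase 1 (simplified): keep terms found in at least `min_support` documents."""
    support = Counter()
    for tokens in docs:
        support.update(set(tokens))  # count each term once per document
    return {term for term, count in support.items() if count >= min_support}


def train_naive_bayes(docs, labels, vocab):
    """Phase 2: multinomial Naive Bayes with Laplace smoothing over `vocab`."""
    docs_per_class = Counter(labels)
    term_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        term_counts[label].update(t for t in tokens if t in vocab)

    model = {}
    for label, n_docs in docs_per_class.items():
        total = sum(term_counts[label].values())
        log_prior = math.log(n_docs / len(docs))
        log_likelihood = {
            t: math.log((term_counts[label][t] + 1) / (total + len(vocab)))
            for t in vocab
        }
        model[label] = (log_prior, log_likelihood)
    return model


def predict(model, tokens):
    """Return the class with the highest log posterior; unknown terms are ignored."""
    def score(label):
        log_prior, log_likelihood = model[label]
        return log_prior + sum(log_likelihood[t] for t in tokens if t in log_likelihood)
    return max(model, key=score)


# Toy corpus invented for illustration; it is not taken from the FIRE dataset.
train_docs = [
    "team wins cricket match".split(),
    "cricket team scores runs".split(),
    "market shares fall sharply".split(),
    "bank shares rise in market".split(),
]
train_labels = ["sports", "sports", "business", "business"]

vocab = frequent_terms(train_docs, min_support=2)  # absolute support threshold
model = train_naive_bayes(train_docs, train_labels, vocab)
prediction = predict(model, "cricket team news".split())
```

Pruning the vocabulary before training keeps the Naive Bayes model small, which is the motivation for mining frequent terms first. A fuller version would mine multi-term frequent sets with an FP-tree and reweight terms before classification.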


Metadata
Title
Text Classification Using FP-Growth Association Rule and Updating the Term Weight
Written by
Santosh K. Vishwakarma
Akhilesh Kumar Sharma
Sourabh Singh Verma
Meghna Utmal
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-66218-9_47