2019 | OriginalPaper | Book chapter

Categorizing Emails Using Machine Learning with Textual Features

Authors: Haoran Zhang, Jagadish Rangrej, Saad Rais, Michael Hillmer, Frank Rudzicz, Kamil Malikov

Published in: Advances in Artificial Intelligence

Publisher: Springer International Publishing

Abstract

We developed an application that automates the process of assigning emails received in a generic request inbox to one of fourteen predefined topic categories. To build this application, we compared the performance of several classifiers in predicting the topic category, using an email dataset extracted from this inbox, which consisted of 8,841 emails over three years. The algorithms ranged from linear classifiers operating on n-gram features to deep learning techniques such as CNNs and LSTMs. For our objective, we found that the best-performing model was a logistic regression classifier using n-grams with TF-IDF weights, achieving 90.9% accuracy. The traditional models performed better than the deep learning models for this dataset, likely in part due to the small dataset size, and also because this particular classification task may not require the ordered sequence representation of tokens that deep learning models provide. Eventually, a bagged voting model was selected which combines the predictive power of the top eight models, with accuracy of 92.7%, surpassing the performance of any of the individual models.
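
The two ideas the abstract leans on, n-gram features with TF-IDF weights and a voting model that combines several classifiers, can be illustrated with a minimal sketch. It uses only the Python standard library and is not the authors' implementation: the function names (`ngrams`, `tfidf_vectors`, `majority_vote`), the whitespace tokenizer, the unsmoothed `tf * log(N / df)` weighting, and the use of plain hard majority voting are all assumptions made for illustration. The abstract does not state which TF-IDF variant or voting scheme was used.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Return word n-grams (n = 1..n_max) of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(docs, n_max=2):
    """Map each document to a sparse vector {n-gram: tf * idf}.

    tf  = count of the n-gram in the document / total n-grams in the document
    idf = log(number of documents / number of documents containing the n-gram)
    (An unsmoothed variant, chosen here only for simplicity.)
    """
    counts = [Counter(ngrams(doc, n_max)) for doc in docs]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each document counts once per n-gram
    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({
            gram: (k / total) * math.log(n_docs / doc_freq[gram])
            for gram, k in c.items()
        })
    return vectors


def majority_vote(predictions):
    """Hard voting: return the label predicted by the most models.

    Ties go to the label that appears first in `predictions`.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical example emails and model outputs, for illustration only.
emails = [
    "please reset my password",
    "invoice for data request",
    "reset password request",
]
vectors = tfidf_vectors(emails)
final_label = majority_vote(["access", "billing", "access"])
```

In this sketch, an n-gram that occurs in every document receives a weight of zero, which is the intended effect of the inverse document frequency term: features that do not help distinguish categories are down-weighted. The resulting sparse vectors would be the input to a linear classifier such as logistic regression, and `majority_vote` would be applied to the per-email predictions of several trained models.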

Title: Categorizing Emails Using Machine Learning with Textual Features
Authors: Haoran Zhang, Jagadish Rangrej, Saad Rais, Michael Hillmer, Frank Rudzicz, Kamil Malikov
Publisher: Springer International Publishing
Book: Advances in Artificial Intelligence
Print ISBN: 978-3-030-18304-2
Electronic ISBN: 978-3-030-18305-9
Copyright year: 2019
DOI: https://doi.org/10.1007/978-3-030-18305-9_1
