2018 | Original Paper | Book Chapter

J48S: A Sequence Classification Approach to Text Analysis Based on Decision Trees

Authors: Andrea Brunello, Enrico Marzano, Angelo Montanari, Guido Sciavicco

Published in: Information and Software Technologies

Publisher: Springer International Publishing

Abstract

Sequences play a major role in the extraction of information from data. In business intelligence, for example, they can be used to track the evolution of customer behavior over time or to model relevant relationships. In this paper, we focus on the domain of contact centers, where sequential data typically take the form of oral or written interactions and word sequences often play a major role in text classification. In this setting, we investigate the connections between sequential data and text mining techniques. The main contribution of the paper is a new machine learning algorithm, called J48S, that associates semantic knowledge with telephone conversations. The proposed solution is based on the well-known C4.5 decision tree learner, and it can natively combine static data (numeric or categorical) with sequential data (such as texts) for classification purposes. The algorithm, evaluated in a real business setting, is shown to provide classification performance competitive with classical approaches, while generating highly interpretable models and effectively reducing the data preparation effort.
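To make the idea of mixing static and sequential attributes in one tree more concrete, the following is a minimal sketch of how a single tree node could compare a test on a numeric attribute with a test on a word sequence. It is not the J48S implementation: the abstract does not say how patterns are extracted or scored, so the toy data, the function names, and the fixed pattern `["cancel", "contract"]` are all assumptions made for illustration. C4.5 ranks candidate splits by gain ratio; for brevity the sketch uses plain information gain.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(labels, mask):
    """Entropy reduction obtained by splitting `labels` with the boolean list `mask`."""
    left = [y for y, m in zip(labels, mask) if m]
    right = [y for y, m in zip(labels, mask) if not m]
    total = len(labels)
    remainder = (len(left) / total) * entropy(left) + (len(right) / total) * entropy(right)
    return entropy(labels) - remainder


def contains_pattern(sequence, pattern):
    """True if `pattern` occurs in `sequence` in order, gaps allowed."""
    it = iter(sequence)
    # `token in it` advances the iterator, so order is respected.
    return all(token in it for token in pattern)


# Hypothetical toy data: (call duration in seconds, transcript tokens, class label).
calls = [
    (30, "thank you for calling".split(), "ok"),
    (45, "i want to cancel my contract".split(), "churn"),
    (200, "please cancel the contract now".split(), "churn"),
    (180, "thank you goodbye".split(), "ok"),
]
labels = [label for _, _, label in calls]

# Candidate test on the static (numeric) attribute: duration <= 100 seconds.
gain_static = information_gain(labels, [d <= 100 for d, _, _ in calls])

# Candidate test on the sequential attribute: transcript contains "cancel ... contract".
# The pattern is fixed by hand here (an assumption of this sketch).
gain_sequence = information_gain(
    labels,
    [contains_pattern(tokens, ["cancel", "contract"]) for _, tokens, _ in calls],
)

# The node would keep whichever candidate test scores higher.
best = "sequence pattern" if gain_sequence > gain_static else "duration threshold"
print(gain_static, gain_sequence, best)
```

On this toy data the pattern test separates the two classes while the duration threshold does not, so the node would split on the pattern. A test of this kind reads directly as "the conversation contains these words in this order", which is consistent with the interpretability the abstract emphasizes.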

Footnotes
1
A detailed account of these aspects is the subject of a forthcoming work on the whole speech analytics process.
 
Metadata
Title
J48S: A Sequence Classification Approach to Text Analysis Based on Decision Trees
Authors
Andrea Brunello
Enrico Marzano
Angelo Montanari
Guido Sciavicco
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-99972-2_19
