2018 | Original Paper | Book Chapter

Bundles: A Framework to Optimise Topic Analysis in Real-Time Chat Discourse

Written by: Jonathan Dunne, David Malone, Andrew Penrose

Published in: Collaboration and Technology

Publisher: Springer International Publishing

Abstract

Collaborative chat tools and large text corpora are ubiquitous in today’s world of real-time communication. As micro-teams and start-ups adopt such tools, there is a need to understand the meaning (even at a high level) of chat conversations within collaborative teams. In this study, we propose a technique to segment chat conversations that increases the number of words available for text mining by 19% on average. Using an open source dataset, we answer the question of whether having more words available for text mining can produce more useful information for the end user. Our technique can help micro-teams and start-ups with limited resources to model their conversations efficiently and so achieve a higher degree of readability and comprehension.

Metadata
Title
Bundles: A Framework to Optimise Topic Analysis in Real-Time Chat Discourse
Written by
Jonathan Dunne
David Malone
Andrew Penrose
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-99504-5_6