2020 | Original Paper | Book Chapter

How to Detect Novelty in Textual Data Streams? A Comparative Study of Existing Methods

Authors: Clément Christophe, Julien Velcin, Jairo Cugliari, Philippe Suignard, Manel Boumghar

Published in: Advanced Analytics and Learning on Temporal Data

Publisher: Springer International Publishing

Abstract

Because datasets annotated for novelty at the document or word level are not readily available, we present a simulation framework for creating textual datasets in which we control how novelty occurs. We also present a benchmark of existing methods for novelty detection in textual data streams: we define a few tasks to solve and compare several state-of-the-art methods on them. The simulation framework lets us evaluate their performance on a limited set of scenarios and test their sensitivity to some parameters. Finally, we apply the same methods to different kinds of novelty in the New York Times Annotated Dataset.

Metadata
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-39098-3_9