2020 | OriginalPaper | Chapter

How to Detect Novelty in Textual Data Streams? A Comparative Study of Existing Methods

Authors: Clément Christophe, Julien Velcin, Jairo Cugliari, Philippe Suignard, Manel Boumghar

Published in: Advanced Analytics and Learning on Temporal Data

Publisher: Springer International Publishing

Abstract

Since datasets with annotation for novelty at the document and/or word level are not easily available, we present a simulation framework that allows us to create different textual datasets in which we control the way novelty occurs. We also present a benchmark of existing methods for novelty detection in textual data streams. We define a few tasks to solve and compare several state-of-the-art methods. The simulation framework allows us to evaluate their performances according to a set of limited scenarios and test their sensitivity to some parameters. Finally, we experiment with the same methods on different kinds of novelty in the New York Times Annotated Dataset.
Metadata
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-39098-3_9
