
22-02-2019 | Regular Paper

Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity

Authors: Vishnu Unnikrishnan, Christian Beyer, Pawel Matuszyk, Uli Niemann, Rüdiger Pryss, Winfried Schlee, Eirini Ntoutsi, Myra Spiliopoulou

Published in: International Journal of Data Science and Analytics


Abstract

Stream classification algorithms traditionally treat arriving instances as independent. However, in many applications, the arriving examples may depend on the “entity” that generated them, e.g. in product reviews or in the interactions of users with an application server. In this study, we investigate the potential of this dependency by partitioning the original stream of instances/“observations” into entity-centric substreams and by incorporating entity-specific information into the learning model. We propose a k-nearest-neighbour-inspired stream classification approach, in which the label of an arriving observation is predicted by exploiting knowledge on the observations belonging to this entity and to entities similar to it. For the computation of entity similarity, we consider knowledge about the observations and knowledge about the entity, potentially from a domain/feature space different from that in which predictions are made. To distinguish between cases where this knowledge transfer is beneficial for stream classification and cases where the knowledge on the entities does not contribute to classifying the observations, we also propose a heuristic approach based on random sampling of substreams using k Random Entities (kRE). Our learning scenario is not fully supervised: after acquiring labels for the initial m observations of each entity, we assume that no additional labels arrive and attempt to predict the labels of near-future and far-future observations from that initial seed. We report on our findings from three datasets.
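The entity-level prediction idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the distance measure, how an entity is summarised, or how votes are combined, so everything below is an assumption made for illustration: each entity is summarised by the mean feature vector of its first m labelled observations, similarity is Euclidean distance between these summaries, and the predicted label for the entity's future observations is a majority vote over the seed labels of the entity and its k neighbours. The function names (`entity_profile`, `predict_knn`, `predict_kre`) and the toy data are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter


def entity_profile(observations, m):
    """Mean feature vector of an entity's first m labelled observations (assumed summary)."""
    seed = observations[:m]
    dim = len(seed[0][0])
    return [sum(features[i] for features, _ in seed) / len(seed) for i in range(dim)]


def euclidean(a, b):
    """Euclidean distance between two equally long vectors (assumed similarity measure)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def majority_label(entities, names, m):
    """Majority vote over the seed labels (first m observations) of the named entities."""
    votes = Counter(label for name in names for _, label in entities[name][:m])
    return votes.most_common(1)[0][0]


def predict_knn(entities, target, k, m):
    """Predict the label of the target's future observations from its own seed
    and the seeds of its k most similar entities."""
    profiles = {name: entity_profile(obs, m) for name, obs in entities.items()}
    others = sorted(
        (name for name in entities if name != target),
        key=lambda name: euclidean(profiles[target], profiles[name]),
    )
    return majority_label(entities, [target] + others[:k], m)


def predict_kre(entities, target, k, m, rng):
    """Baseline in the spirit of k Random Entities: same vote, but the k
    additional entities are drawn at random instead of by similarity."""
    others = [name for name in entities if name != target]
    return majority_label(entities, [target] + rng.sample(others, k), m)


# Toy stream, already partitioned into entity-centric substreams.
# Each observation is (feature_vector, label); only the first m are assumed labelled.
entities = {
    "a": [([0.0, 0.1], "pos"), ([0.1, 0.0], "pos")],
    "b": [([0.1, 0.1], "pos"), ([0.0, 0.2], "pos")],
    "c": [([5.0, 5.1], "neg"), ([5.2, 4.9], "neg")],
    "d": [([5.1, 5.0], "neg"), ([4.9, 5.2], "neg")],
}

knn_for_a = predict_knn(entities, "a", k=1, m=2)
knn_for_c = predict_knn(entities, "c", k=1, m=2)
kre_for_a = predict_kre(entities, "a", k=1, m=2, rng=random.Random(0))
```

Comparing `predict_knn` against `predict_kre` on held-out future observations gives a rough indication of whether entity similarity contributes anything beyond pooling arbitrary substreams, which is the role the abstract assigns to the kRE heuristic.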

Metadata

Title: Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity
Authors: Vishnu Unnikrishnan, Christian Beyer, Pawel Matuszyk, Uli Niemann, Rüdiger Pryss, Winfried Schlee, Eirini Ntoutsi, Myra Spiliopoulou
Publication date: 22-02-2019
Publisher: Springer International Publishing
Published in: International Journal of Data Science and Analytics
Print ISSN: 2364-415X
Electronic ISSN: 2364-4168
DOI: https://doi.org/10.1007/s41060-019-00177-1
