Published in: Cognitive Computation 3/2019

10 April 2019

PAS3-HSID: a Dynamic Bio-Inspired Approach for Real-Time Hot Spot Identification in Data Streams

Authors: Rebecca Tickle, Isaac Triguero, Grazziela P. Figueredo, Mohammad Mesgarpour, Robert I. John

Abstract

Hot spot identification is a relevant problem in a wide variety of areas such as health care, energy and transportation. A hot spot is defined as a region with a high likelihood of occurrence of a particular event. To identify hot spots, location data for those events is required, which is typically collected by telematics devices. These sensors gather information constantly, generating very large volumes of data. Current state-of-the-art solutions can identify hot spots from big static batches of data by means of variations of clustering or instance selection techniques that pre-process the original input data and return the most relevant locations. However, these approaches do not address changes in hot spots over time. This paper presents a dynamic bio-inspired approach to detect hot spots in big data streams. This computational intelligence method is designed for and applied to the transportation sector as a case study, to identify incidents on the roads caused by heavy goods vehicles. We adapt an immune-based algorithm to account for the temporary nature of hot spots, inspired by the idea of pheromones, and implement it using Apache Spark Streaming. Experimental results on real datasets with up to 4.5 million data points, provided by a telematics company, show that the algorithm is capable of quickly processing large streaming batches of data and of successfully adapting over time to detect hot spots. The outcome of this method is twofold: it reduces data storage requirements and demonstrates resilience to sudden changes in the input data (concept drift).
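
The abstract does not spell out the update rule, so the following is only a minimal sketch of how a pheromone-style mechanism could make hot spots fade over time; it is not the authors' PAS3-HSID algorithm. It assumes that each hot spot carries a pheromone level that evaporates by a fixed fraction per streaming batch, is reinforced when a new incident falls within a given radius, and is discarded once the level drops below a threshold. The names `HotSpot` and `update_hot_spots`, the haversine distance, and all parameter values are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


@dataclass
class HotSpot:
    lat: float
    lon: float
    pheromone: float = 1.0  # hypothetical "strength" of the hot spot


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def update_hot_spots(hot_spots, batch, radius_km=1.0, evaporation=0.5,
                     deposit=1.0, min_pheromone=0.2):
    """Process one batch of (lat, lon) incidents; all parameters are assumptions."""
    # 1. Evaporation: every existing hot spot loses a fraction of its pheromone.
    spots = [HotSpot(hs.lat, hs.lon, hs.pheromone * (1.0 - evaporation))
             for hs in hot_spots]
    # 2. Reinforcement: an incident strengthens the nearest hot spot within
    #    radius_km, or starts a new hot spot if none is close enough.
    for lat, lon in batch:
        nearest = min(spots, default=None,
                      key=lambda hs: haversine_km(hs.lat, hs.lon, lat, lon))
        if nearest is not None and haversine_km(nearest.lat, nearest.lon, lat, lon) <= radius_km:
            nearest.pheromone += deposit
        else:
            spots.append(HotSpot(lat, lon, deposit))
    # 3. Suppression: hot spots whose pheromone has faded are dropped.
    return [hs for hs in spots if hs.pheromone >= min_pheromone]
```

Under these assumptions, a hot spot that stops receiving incidents disappears after a few batches, which is one simple way to bound the amount of stored data and to react to a sudden change in where incidents occur.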

Metadata
Title
PAS3-HSID: a Dynamic Bio-Inspired Approach for Real-Time Hot Spot Identification in Data Streams
Authors
Rebecca Tickle
Isaac Triguero
Grazziela P. Figueredo
Mohammad Mesgarpour
Robert I. John
Publication date
10 April 2019
Publisher
Springer US
Published in
Cognitive Computation / Issue 3/2019
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-019-09638-y
