Skip to main content
Erschienen in:
Buchtitelbild

2017 | OriginalPaper | Buchkapitel

Lightweight Metric Computation for Distributed Massive Data Streams

verfasst von : Emmanuelle Anceaume, Yann Busnel

Erschienen in: Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXIII

Verlag: Springer Berlin Heidelberg

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The real time analysis of massive data streams is of utmost importance in data intensive applications that need to detect as fast as possible and as efficiently as possible (in terms of computation and memory space) any correlation between its inputs or any deviance from some expected nominal behavior. The IoT infrastructure can be used for monitoring any events or changes in structural conditions that can compromise safety and increase risk. It is thus a recurrent and crucial issue to determine whether huge data streams, received at monitored devices, are correlated or not as it may reveal the presence of attacks. We propose a metric, called codeviation, that allows to evaluate the correlation between distributed massive streams. This metric is inspired from classical metric in statistics and probability theory, and as such enables to understand how observed quantities change together, and in which proportion. We then propose to estimate the codeviation in the data stream model. In this model, functions are estimated on a huge sequence of data items, in an online fashion, and with a very small amount of memory with respect to both the size of the input stream and the values domain from which data items are drawn. We then generalize our approach by presenting a new metric, the Sketch-\(\star \) metric, which allows us to define a distance between updatable summaries of large data streams. An important feature of the Sketch-\(\star \) metric is that, given a measure on the entire initial data streams, the Sketch-\(\star \) metric preserves the axioms of the latter measure on the sketch. We finally present results obtained during extensive experiments conducted on both synthetic traces and real data sets allowing us to validate the robustness and accuracy of our metrics.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Literatur
1.
Zurück zum Zitat Lakhina, A., Crovella, M., Diot, C.: Mining anomalies using traffic feature distributions. In: Proceedings of the ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM) (2005) Lakhina, A., Crovella, M., Diot, C.: Mining anomalies using traffic feature distributions. In: Proceedings of the ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM) (2005)
2.
Zurück zum Zitat Qiu, T., Ge, Z., Pei, D., Wang, J., Xu, J.: What happened in my network: mining network events from router syslogs. In: Proceedings of the 10th ACM Conference on Internet Measurement (IMC) (2010) Qiu, T., Ge, Z., Pei, D., Wang, J., Xu, J.: What happened in my network: mining network events from router syslogs. In: Proceedings of the 10th ACM Conference on Internet Measurement (IMC) (2010)
3.
Zurück zum Zitat Yeung, D.S.: Covariance-matrix modeling and detecting various flooding attacks. IEEE Trans. Syst. Man Cybernet. Part A 37(2), 157–169 (2007)CrossRef Yeung, D.S.: Covariance-matrix modeling and detecting various flooding attacks. IEEE Trans. Syst. Man Cybernet. Part A 37(2), 157–169 (2007)CrossRef
4.
Zurück zum Zitat Zhu, Y., Fu, X., Graham, B., Bettati, R., Zhao, W.: On flow correlation attacks and countermeasures in mix networks. In: Martin, D., Serjantov, A. (eds.) PET 2004. LNCS, vol. 3424, pp. 207–225. Springer, Heidelberg (2005). doi:10.1007/11423409_13 CrossRef Zhu, Y., Fu, X., Graham, B., Bettati, R., Zhao, W.: On flow correlation attacks and countermeasures in mix networks. In: Martin, D., Serjantov, A. (eds.) PET 2004. LNCS, vol. 3424, pp. 207–225. Springer, Heidelberg (2005). doi:10.​1007/​11423409_​13 CrossRef
5.
Zurück zum Zitat Ganguly, S., Garafalakis, M., Rastogi, R., Sabnani, K.: Streaming algorithms for robust, real-time detection of ddos attacks. In: Proceedings of the 27th International Conference on Distributed Computing Systems (ICDCS) (2007) Ganguly, S., Garafalakis, M., Rastogi, R., Sabnani, K.: Streaming algorithms for robust, real-time detection of ddos attacks. In: Proceedings of the 27th International Conference on Distributed Computing Systems (ICDCS) (2007)
6.
Zurück zum Zitat Jin, S., Yeung, D.: A covariance analysis model for ddos attack detection. In: 4th IEEE International Conference on Communications (ICC), vol. 4, pp. 1882–1886 (2004) Jin, S., Yeung, D.: A covariance analysis model for ddos attack detection. In: 4th IEEE International Conference on Communications (ICC), vol. 4, pp. 1882–1886 (2004)
7.
Zurück zum Zitat Pinarer, O., Gripay, Y., Servigne, S., Ozgovde, A.: Energy enhancement of multi-application monitoring systems for smart buildings. In: Krogstie, J., Mouratidis, H., Su, J. (eds.) CAiSE 2016. LNBIP, vol. 249, pp. 131–142. Springer, Cham (2016). doi:10.1007/978-3-319-39564-7_14 Pinarer, O., Gripay, Y., Servigne, S., Ozgovde, A.: Energy enhancement of multi-application monitoring systems for smart buildings. In: Krogstie, J., Mouratidis, H., Su, J. (eds.) CAiSE 2016. LNBIP, vol. 249, pp. 131–142. Springer, Cham (2016). doi:10.​1007/​978-3-319-39564-7_​14
8.
Zurück zum Zitat Boubrima, A., Matigot, F., Bechkit, W., Rivano, H., Ruas, A.: Optimal deployment of wireless sensor networks for air pollution monitoring. In: 24th International Conference on Computer Communication and Networks (ICCCN), Las Vegas, USA, August 2015 Boubrima, A., Matigot, F., Bechkit, W., Rivano, H., Ruas, A.: Optimal deployment of wireless sensor networks for air pollution monitoring. In: 24th International Conference on Computer Communication and Networks (ICCCN), Las Vegas, USA, August 2015
9.
Zurück zum Zitat Stankovic, J.A.: Research directions for the internet of things. IEEE Internet Things J. 1(1), 3–9 (2014)CrossRef Stankovic, J.A.: Research directions for the internet of things. IEEE Internet Things J. 1(1), 3–9 (2014)CrossRef
10.
Zurück zum Zitat Anceaume, E., Busnel, Y., Gambs, S.: Uniform and ergodic sampling in unstructured peer-to-peer systems with malicious nodes. In: Lu, C., Masuzawa, T., Mosbah, M. (eds.) OPODIS 2010. LNCS, vol. 6490, pp. 64–78. Springer, Heidelberg (2010). doi:10.1007/978-3-642-17653-1_5 CrossRef Anceaume, E., Busnel, Y., Gambs, S.: Uniform and ergodic sampling in unstructured peer-to-peer systems with malicious nodes. In: Lu, C., Masuzawa, T., Mosbah, M. (eds.) OPODIS 2010. LNCS, vol. 6490, pp. 64–78. Springer, Heidelberg (2010). doi:10.​1007/​978-3-642-17653-1_​5 CrossRef
11.
Zurück zum Zitat Bar-Yossef, Z., Jayram, T.S., Kumar, R., Sivakumar, D., Trevisan, L.: Counting distinct elements in a data stream. In: Rolim, J.D.P., Vadhan, S. (eds.) RANDOM 2002. LNCS, vol. 2483, pp. 1–10. Springer, Heidelberg (2002). doi:10.1007/3-540-45726-7_1 CrossRef Bar-Yossef, Z., Jayram, T.S., Kumar, R., Sivakumar, D., Trevisan, L.: Counting distinct elements in a data stream. In: Rolim, J.D.P., Vadhan, S. (eds.) RANDOM 2002. LNCS, vol. 2483, pp. 1–10. Springer, Heidelberg (2002). doi:10.​1007/​3-540-45726-7_​1 CrossRef
12.
Zurück zum Zitat Flajolet, P., Martin, G.N.: Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci. 31(2), 182–209 (1985)MathSciNetCrossRefMATH Flajolet, P., Martin, G.N.: Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci. 31(2), 182–209 (1985)MathSciNetCrossRefMATH
13.
Zurück zum Zitat Kane, D.M., Nelson, J., Woodruff, D.P.: An optimal algorithm for the distinct element problem. In: Proceedings of the Symposium on Principles of Databases (PODS) (2010) Kane, D.M., Nelson, J., Woodruff, D.P.: An optimal algorithm for the distinct element problem. In: Proceedings of the Symposium on Principles of Databases (PODS) (2010)
14.
Zurück zum Zitat Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating the frequency moments. In: Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing (STOC), pp. 20–29 (1996) Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating the frequency moments. In: Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing (STOC), pp. 20–29 (1996)
15.
16.
Zurück zum Zitat Chakrabarti, A., Cormode, G., McGregor, A.: A near-optimal algorithm for computing the entropy of a stream. In. ACM-SIAM Symposium on Discrete Algorithms, pp. 328–335 (2007) Chakrabarti, A., Cormode, G., McGregor, A.: A near-optimal algorithm for computing the entropy of a stream. In. ACM-SIAM Symposium on Discrete Algorithms, pp. 328–335 (2007)
17.
Zurück zum Zitat Lall, A., Sekar, V., Ogihara, M., Xu, J., Zhang, H.: Data streaming algorithms for estimating entropy of network traffic. In: Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS). ACM (2006) Lall, A., Sekar, V., Ogihara, M., Xu, J., Zhang, H.: Data streaming algorithms for estimating entropy of network traffic. In: Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS). ACM (2006)
18.
Zurück zum Zitat Anceaume, E., Busnel, Y., Gambs, S.: On the power of the adversary to solve the node sampling problem. Trans. Large-Scale Data Knowl. Centered Syst. (TLDKS) 11, 102–126 (2013) Anceaume, E., Busnel, Y., Gambs, S.: On the power of the adversary to solve the node sampling problem. Trans. Large-Scale Data Knowl. Centered Syst. (TLDKS) 11, 102–126 (2013)
19.
Zurück zum Zitat Anceaume, E., Busnel, Y.: An information divergence estimation over data streams. In: Proceedings of the 11th IEEE International Symposium on Network Computing and Applications (NCA) (2012) Anceaume, E., Busnel, Y.: An information divergence estimation over data streams. In: Proceedings of the 11th IEEE International Symposium on Network Computing and Applications (NCA) (2012)
20.
Zurück zum Zitat Chakrabarti, A., Ba, K., Muthukrishnan, S.: Estimating entropy and entropy norm on data streams. In: Durand, B., Thomas, W. (eds.) STACS 2006. LNCS, vol. 3884, pp. 196–205. Springer, Heidelberg (2006). doi:10.1007/11672142_15 CrossRef Chakrabarti, A., Ba, K., Muthukrishnan, S.: Estimating entropy and entropy norm on data streams. In: Durand, B., Thomas, W. (eds.) STACS 2006. LNCS, vol. 3884, pp. 196–205. Springer, Heidelberg (2006). doi:10.​1007/​11672142_​15 CrossRef
21.
Zurück zum Zitat Guha, S., McGregor, A., Venkatasubramanian, S.: Streaming and sublinear approximation of entropy and information distances. In: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 733–742 (2006) Guha, S., McGregor, A., Venkatasubramanian, S.: Streaming and sublinear approximation of entropy and information distances. In: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 733–742 (2006)
22.
Zurück zum Zitat Rivetti, N., Busnel, Y., Querzoni, L.: Load-aware shedding in stream processing systems. In: Proceedings of the 10th ACM International Conference on Distributed Event-Based Systems (DEBS), Ivine, CA, USA, June 2016 Rivetti, N., Busnel, Y., Querzoni, L.: Load-aware shedding in stream processing systems. In: Proceedings of the 10th ACM International Conference on Distributed Event-Based Systems (DEBS), Ivine, CA, USA, June 2016
23.
Zurück zum Zitat Rivetti, N., Anceaume, E., Busnel, Y., Querzoni, L., Sericola, B.: Online scheduling for shuffle grouping in distributed stream processing systems. In: Proceedings of the 17th ACM/IFIP/USENIX 13th International Conference on Middleware (Middleware), Trento, Italie, December 2016 Rivetti, N., Anceaume, E., Busnel, Y., Querzoni, L., Sericola, B.: Online scheduling for shuffle grouping in distributed stream processing systems. In: Proceedings of the 17th ACM/IFIP/USENIX 13th International Conference on Middleware (Middleware), Trento, Italie, December 2016
24.
25.
Zurück zum Zitat Cormode, G., Garofalakis, M.: Sketching probabilistic data streams. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pp. 281–292 (2007) Cormode, G., Garofalakis, M.: Sketching probabilistic data streams. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pp. 281–292 (2007)
26.
Zurück zum Zitat Guha, S., Indyk, P., Mcgregor, A.: Sketching information divergences. Mach. Learn. 72(1–2), 5–19 (2008)CrossRefMATH Guha, S., Indyk, P., Mcgregor, A.: Sketching information divergences. Mach. Learn. 72(1–2), 5–19 (2008)CrossRefMATH
27.
Zurück zum Zitat Cormode, G., Muthukrishnan, S., Yi, K.: Algorithms for distributed functional monitoring. In: Proceedings of the 19th Annual ACM-SIAM Symposium On Discrete Algorithms (SODA) (2008) Cormode, G., Muthukrishnan, S., Yi, K.: Algorithms for distributed functional monitoring. In: Proceedings of the 19th Annual ACM-SIAM Symposium On Discrete Algorithms (SODA) (2008)
28.
Zurück zum Zitat Arackaparambil, C., Brody, J., Chakrabarti, A.: Functional monitoring without monotonicity. In: Proceedings of the 36th ACM International Colloquium on Automata, Languages and Programming (ICALP) (2009) Arackaparambil, C., Brody, J., Chakrabarti, A.: Functional monitoring without monotonicity. In: Proceedings of the 36th ACM International Colloquium on Automata, Languages and Programming (ICALP) (2009)
29.
Zurück zum Zitat Gibbons, P.B., Tirthapura, S.: Estimating simple functions on the union of data streams. In: Proceedings of the Thirteenth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA), pp. 281–291 (2001) Gibbons, P.B., Tirthapura, S.: Estimating simple functions on the union of data streams. In: Proceedings of the Thirteenth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA), pp. 281–291 (2001)
30.
Zurück zum Zitat Haung, Z., Yi, K., Zhang, Q.: Randomized algorithms for tracking distributed count, frequencies and ranks. In: Proceedings of 31st ACM Symposium on Principles of Database Systems (PODS) (2012) Haung, Z., Yi, K., Zhang, Q.: Randomized algorithms for tracking distributed count, frequencies and ranks. In: Proceedings of 31st ACM Symposium on Principles of Database Systems (PODS) (2012)
31.
Zurück zum Zitat Liu, Z., Radunović, B., Vojnovic, M.: Continuous distributed counting for non-monotonic streams. In: Proceedings of 31st ACM Symposium on Principles of Database Systems (PODS) (2012) Liu, Z., Radunović, B., Vojnovic, M.: Continuous distributed counting for non-monotonic streams. In: Proceedings of 31st ACM Symposium on Principles of Database Systems (PODS) (2012)
32.
Zurück zum Zitat Yuan, J., Mills, K.: Monitoring the macroscopic effect of DDoS flooding attacks. IEEE Trans. Dependable Secure Comput. 2(4), 324–335 (2005)CrossRef Yuan, J., Mills, K.: Monitoring the macroscopic effect of DDoS flooding attacks. IEEE Trans. Dependable Secure Comput. 2(4), 324–335 (2005)CrossRef
33.
Zurück zum Zitat Basseville, M., Cardoso, J.F.: On entropies, divergences, and mean values. In: Proceedings of the IEEE International Symposium on Information Theory (1995) Basseville, M., Cardoso, J.F.: On entropies, divergences, and mean values. In: Proceedings of the IEEE International Symposium on Information Theory (1995)
34.
Zurück zum Zitat Ali, S.M., Silvey, S.D.: General class of coefficients of divergence of one distribution from another. J. Roy. Stat. Soc. Ser. B (Methodological) 28(1), 131–142 (1966)MathSciNetMATH Ali, S.M., Silvey, S.D.: General class of coefficients of divergence of one distribution from another. J. Roy. Stat. Soc. Ser. B (Methodological) 28(1), 131–142 (1966)MathSciNetMATH
35.
Zurück zum Zitat Csiszár, I.: Information measures: a critical survey. In: Transactions of the Seventh Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, Dordrecht, D. Riedel, pp. 73–86 (1978) Csiszár, I.: Information measures: a critical survey. In: Transactions of the Seventh Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, Dordrecht, D. Riedel, pp. 73–86 (1978)
36.
Zurück zum Zitat Morimoto, T.: Markov processes and the \(h\)-theorem. J. Phys. Soc. Jpn. 18(3), 328–331 (1963)CrossRefMATH Morimoto, T.: Markov processes and the \(h\)-theorem. J. Phys. Soc. Jpn. 18(3), 328–331 (1963)CrossRefMATH
38.
Zurück zum Zitat Bhattacharyya, A.: On a measure of divergence between two statistical populations defined by their probability distributions. Bull. Calcutta Math. Soc. 35, 99–109 (1943)MathSciNetMATH Bhattacharyya, A.: On a measure of divergence between two statistical populations defined by their probability distributions. Bull. Calcutta Math. Soc. 35, 99–109 (1943)MathSciNetMATH
39.
Zurück zum Zitat Muthukrishnan, S.: Data Streams: Algorithms and Applications. Now Publishers Inc., Hanover (2005)MATH Muthukrishnan, S.: Data Streams: Algorithms and Applications. Now Publishers Inc., Hanover (2005)MATH
40.
Zurück zum Zitat Anceaume, E., Busnel, Y., Rivetti, N.: Estimating the frequency of data items in massive distributed streams. In: Proceedings of the 4th IEEE Symposium on Network Cloud Computing and Applications (NCCA), pp. 59–66 (2015) Anceaume, E., Busnel, Y., Rivetti, N.: Estimating the frequency of data items in massive distributed streams. In: Proceedings of the 4th IEEE Symposium on Network Cloud Computing and Applications (NCCA), pp. 59–66 (2015)
41.
Zurück zum Zitat Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55(1), 58–75 (2005)MathSciNetCrossRefMATH Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55(1), 58–75 (2005)MathSciNetCrossRefMATH
43.
Zurück zum Zitat Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3), 200–217 (1967)MathSciNetCrossRefMATH Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3), 200–217 (1967)MathSciNetCrossRefMATH
44.
Zurück zum Zitat Hellinger, E.: Neue begründung der theorie quadratischer formen von unendlichvielen veränderlichen. J. Reine Angew. Math. 136, 210–271 (1909)MathSciNetMATH Hellinger, E.: Neue begründung der theorie quadratischer formen von unendlichvielen veränderlichen. J. Reine Angew. Math. 136, 210–271 (1909)MathSciNetMATH
45.
Zurück zum Zitat Csiszár, I.: Why least squares and maximum entropy? an axiomatic approach to inference for linear inverse problems. Ann. Stat. 19(4), 2032–2066 (1991)MathSciNetCrossRefMATH Csiszár, I.: Why least squares and maximum entropy? an axiomatic approach to inference for linear inverse problems. Ann. Stat. 19(4), 2032–2066 (1991)MathSciNetCrossRefMATH
46.
Zurück zum Zitat Amari, S.I., Cichocki, A.: Information geometry of divergence functions. Bull. Pol. Acad. Sci. Techn. Sci. 58(1), 183–195 (2010) Amari, S.I., Cichocki, A.: Information geometry of divergence functions. Bull. Pol. Acad. Sci. Techn. Sci. 58(1), 183–195 (2010)
47.
Zurück zum Zitat Amari, S.I.: \(\alpha \)-divergence is unique, belonging to both \(f\)-divergence and bregman divergence classes. IEEE Trans. Inf. Theor. 55(11), 4925–4931 (2009)MathSciNetCrossRef Amari, S.I.: \(\alpha \)-divergence is unique, belonging to both \(f\)-divergence and bregman divergence classes. IEEE Trans. Inf. Theor. 55(11), 4925–4931 (2009)MathSciNetCrossRef
48.
Zurück zum Zitat Renyi, A.: On measures of information and entropy. In: Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability, pp. 547–561 (1960) Renyi, A.: On measures of information and entropy. In: Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability, pp. 547–561 (1960)
Metadaten
Titel
Lightweight Metric Computation for Distributed Massive Data Streams
verfasst von
Emmanuelle Anceaume
Yann Busnel
Copyright-Jahr
2017
Verlag
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-662-55696-2_1