Skip to main content
Erschienen in: The VLDB Journal 2-3/2020

14.08.2019 | Special Issue Paper

Scalable algorithms for signal reconstruction by leveraging similarity joins

verfasst von: Abolfazl Asudeh, Jees Augustine, Azade Nazi, Saravanan Thirumuruganathan, Nan Zhang, Gautam Das, Divesh Srivastava

Erschienen in: The VLDB Journal | Ausgabe 2-3/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Signal reconstruction problem (SRP) is an important optimization problem where the objective is to identify a solution to an underdetermined system of linear equations that is closest to a given prior. It has a substantial number of applications in diverse areas including network traffic engineering, medical image reconstruction, acoustics, astronomy and many more. Most common approaches for SRP do not scale to large problem sizes. In this paper, we propose multiple optimization steps, developing scalable algorithms for the problem. We first propose a dual formulation of the problem and develop the Direct algorithm that is significantly more efficient than the state of the art. Second, we show how adapting database techniques developed for scalable similarity joins provides a significant speedup over Direct, scaling our proposal up to large-scale settings. Third, we describe a number of practical techniques that allow our algorithm to scale to settings of size in the order of a million by a billion. We also adapt our proposal to identify the top-k components of the solved system of linear equations. Finally, we consider the dynamic setting where the inputs to the linear system change and propose efficient algorithms inspired by the database techniques of materialization and reuse. Extensive experiments on real-world and synthetic data confirm the efficiency, effectiveness and scalability of our proposal.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
1
We assume that the problem has at least one solution.
 
2
Note that \(\min \frac{1}{2}X^\mathrm{T}X - {X'}^\mathrm{T}X\) is the same as \(\min ||X-X'||_2\).
 
3
Since, looking at Fig. 1, Eq. 1 has a single optimal point, Eq. 4 has one stationary point which happens to be the saddle point.
 
5
We have found out the knee point of the cumulative flow is around 2%.
 
Literatur
1.
Zurück zum Zitat Beyer, K., Gemulla, R., Haas, P.J., Reinwald, B., Sismanis, Y.: Distinct-value synopses for multiset operations. Commun. ACM 52(10), 87–95 (2009)CrossRef Beyer, K., Gemulla, R., Haas, P.J., Reinwald, B., Sismanis, Y.: Distinct-value synopses for multiset operations. Commun. ACM 52(10), 87–95 (2009)CrossRef
2.
Zurück zum Zitat Bjerhammar, A.: Application of Calculus of Matrices to Method of Least Squares: With Special Reference to Geodetic Calculations. Elander, Göteborg (1951)MATH Bjerhammar, A.: Application of Calculus of Matrices to Method of Least Squares: With Special Reference to Geodetic Calculations. Elander, Göteborg (1951)MATH
3.
Zurück zum Zitat Boehm, M., Dusenberry, M.W., Eriksson, D., Evfimievski, A.V., Manshadi, F.M., Pansare, N., Reinwald, B., Reiss, F.R., Sen, P., Surve, A.C., et al.: SystemML: declarative machine learning on spark. PVLDB 9(13), 1425–1436 (2016) Boehm, M., Dusenberry, M.W., Eriksson, D., Evfimievski, A.V., Manshadi, F.M., Pansare, N., Reinwald, B., Reiss, F.R., Sen, P., Surve, A.C., et al.: SystemML: declarative machine learning on spark. PVLDB 9(13), 1425–1436 (2016)
4.
Zurück zum Zitat Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)CrossRef Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)CrossRef
5.
Zurück zum Zitat Broder, A.Z.: On the resemblance and containment of documents. In: Proceedings on Compression and Complexity of Sequences 1997, pp. 21–29 (1997) Broder, A.Z.: On the resemblance and containment of documents. In: Proceedings on Compression and Complexity of Sequences 1997, pp. 21–29 (1997)
6.
Zurück zum Zitat Candes, E.J., Romberg, J.K., Tao, T.: Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. 59(8), 1207–1223 (2006)MathSciNetCrossRef Candes, E.J., Romberg, J.K., Tao, T.: Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. 59(8), 1207–1223 (2006)MathSciNetCrossRef
7.
Zurück zum Zitat Cao, J., Davis, D., Vander Wiel, S., Yu, B.: Time-varying network tomography: router link data. J. Am. Stat. Assoc. 95(452), 1063–1075 (2000)MathSciNetCrossRef Cao, J., Davis, D., Vander Wiel, S., Yu, B.: Time-varying network tomography: router link data. J. Am. Stat. Assoc. 95(452), 1063–1075 (2000)MathSciNetCrossRef
8.
Zurück zum Zitat Chandrasekaran, B.: Survey of Network Traffic Models, vol. 567. Washington University, St. Louis CSE (2009) Chandrasekaran, B.: Survey of Network Traffic Models, vol. 567. Washington University, St. Louis CSE (2009)
9.
Zurück zum Zitat Chaudhuri, S., Ganti, V., Kaushik, R.: A primitive operator for similarity joins in data cleaning. In: ICDE (2006) Chaudhuri, S., Ganti, V., Kaushik, R.: A primitive operator for similarity joins in data cleaning. In: ICDE (2006)
10.
Zurück zum Zitat Cohen, E., Kaplan, H.: Tighter estimation using bottom k sketches. PVLDB 1(1), 213–224 (2008) Cohen, E., Kaplan, H.: Tighter estimation using bottom k sketches. PVLDB 1(1), 213–224 (2008)
11.
Zurück zum Zitat Cormode, G., Korn, F., Muthukrishnan, S., Srivastava, D.: Finding hierarchical heavy hitters in data streams. In: Proceedings of the 29th International Conference on Very Large Data Bases, vol. 29, pp. 464–475. VLDB Endowment (2003) Cormode, G., Korn, F., Muthukrishnan, S., Srivastava, D.: Finding hierarchical heavy hitters in data streams. In: Proceedings of the 29th International Conference on Very Large Data Bases, vol. 29, pp. 464–475. VLDB Endowment (2003)
12.
Zurück zum Zitat Craig, I.J., Brown, J.C.: Inverse problems in astronomy: a guide to inversion strategies for remotely sensed data. In: Research Supported by SERC. Adam Hilger, Ltd., Bristol and Boston (1986) Craig, I.J., Brown, J.C.: Inverse problems in astronomy: a guide to inversion strategies for remotely sensed data. In: Research Supported by SERC. Adam Hilger, Ltd., Bristol and Boston (1986)
13.
Zurück zum Zitat Dasu, T., Johnson, T., Muthukrishnan, S., Shkapenyuk, V.: Mining database structure; or, how to build a data quality browser. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pp. 240–251. ACM (2002) Dasu, T., Johnson, T., Muthukrishnan, S., Shkapenyuk, V.: Mining database structure; or, how to build a data quality browser. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pp. 240–251. ACM (2002)
14.
Zurück zum Zitat Ding, B., König, A.C.: Fast set intersection in memory. PVLDB 4(4), 255–266 (2011) Ding, B., König, A.C.: Fast set intersection in memory. PVLDB 4(4), 255–266 (2011)
16.
Zurück zum Zitat Erdos, P., Rényi, A.: On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 5(1), 17–60 (1960)MathSciNetMATH Erdos, P., Rényi, A.: On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 5(1), 17–60 (1960)MathSciNetMATH
17.
Zurück zum Zitat Fortz, B., Thorup, M.: Optimizing OSPF/IS-IS weights in a changing world. IEEE J. Sel. Areas Commun. 20(4), 756–767 (2002)CrossRef Fortz, B., Thorup, M.: Optimizing OSPF/IS-IS weights in a changing world. IEEE J. Sel. Areas Commun. 20(4), 756–767 (2002)CrossRef
18.
Zurück zum Zitat Ge, D., Jiang, X., Ye, Y.: A note on the complexity of \(L_p\) minimization. Math. Program. 129(2), 285–299 (2011)MathSciNetCrossRef Ge, D., Jiang, X., Ye, Y.: A note on the complexity of \(L_p\) minimization. Math. Program. 129(2), 285–299 (2011)MathSciNetCrossRef
19.
Zurück zum Zitat Goldschmidt, O.: ISP backbone traffic inference methods to support traffic engineering. In: Internet Statistics and Metrics Analysis (ISMA) Workshop, pp. 1063–1075 (2000) Goldschmidt, O.: ISP backbone traffic inference methods to support traffic engineering. In: Internet Statistics and Metrics Analysis (ISMA) Workshop, pp. 1063–1075 (2000)
21.
Zurück zum Zitat Gordon, J.: Pareto process as a model of self-similar packet traffic. In: Global Telecommunications Conference, 1995 (GLOBECOM’95) vol. 3, pp. 2232–2236 (1995) Gordon, J.: Pareto process as a model of self-similar packet traffic. In: Global Telecommunications Conference, 1995 (GLOBECOM’95) vol. 3, pp. 2232–2236 (1995)
22.
Zurück zum Zitat Grangeat, P., Amans, J.L.: Three-Dimensional Image Reconstruction in Radiology and Nuclear Medicine, vol. 4. Springer, Berlin (2013) Grangeat, P., Amans, J.L.: Three-Dimensional Image Reconstruction in Radiology and Nuclear Medicine, vol. 4. Springer, Berlin (2013)
23.
Zurück zum Zitat Hadjieleftheriou, M., Yu, X., Koudas, N., Srivastava, D.: Hashed samples: selectivity estimators for set similarity selection queries. PVLDB 1(1), 201–212 (2008) Hadjieleftheriou, M., Yu, X., Koudas, N., Srivastava, D.: Hashed samples: selectivity estimators for set similarity selection queries. PVLDB 1(1), 201–212 (2008)
24.
Zurück zum Zitat Hansen, P.C.: Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion. SIAM, Philadelphia (1998)CrossRef Hansen, P.C.: Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion. SIAM, Philadelphia (1998)CrossRef
25.
Zurück zum Zitat Hasani, S., Thirumuruganathan, S., Asudeh, A., Koudas, N., Das, G.: Efficient construction of approximate ad-hoc ML models through materialization and reuse. Proc. VLDB Endow. 11(11), 1468–1481 (2018)CrossRef Hasani, S., Thirumuruganathan, S., Asudeh, A., Koudas, N., Das, G.: Efficient construction of approximate ad-hoc ML models through materialization and reuse. Proc. VLDB Endow. 11(11), 1468–1481 (2018)CrossRef
26.
Zurück zum Zitat Hrinivich, W.T., Hoover, D.A., Surry, K., Edirisinghe, C., D’Souza, D., Fenster, A., Wong, E.: Ultrasound guided high-dose-rate prostate brachytherapy: live needle segmentation and 3d image reconstruction using the sagittal transducer. Brachytherapy 15, S195 (2016)CrossRef Hrinivich, W.T., Hoover, D.A., Surry, K., Edirisinghe, C., D’Souza, D., Fenster, A., Wong, E.: Ultrasound guided high-dose-rate prostate brachytherapy: live needle segmentation and 3d image reconstruction using the sagittal transducer. Brachytherapy 15, S195 (2016)CrossRef
27.
Zurück zum Zitat Kaoudi, Z., Quiané-Ruiz, J.A., Thirumuruganathan, S., Chawla, S., Agrawal, D.: A cost-based optimizer for gradient descent optimization. In: Proceedings of the 2017 ACM International Conference on Management of Data, pp. 977–992. ACM (2017) Kaoudi, Z., Quiané-Ruiz, J.A., Thirumuruganathan, S., Chawla, S., Agrawal, D.: A cost-based optimizer for gradient descent optimization. In: Proceedings of the 2017 ACM International Conference on Management of Data, pp. 977–992. ACM (2017)
28.
Zurück zum Zitat Kim, Y., Nelson, P.: Optimal regularisation for acoustic source reconstruction by inverse methods. J. Sound Vib. 275(3), 463–487 (2004)CrossRef Kim, Y., Nelson, P.: Optimal regularisation for acoustic source reconstruction by inverse methods. J. Sound Vib. 275(3), 463–487 (2004)CrossRef
29.
Zurück zum Zitat Kleinrock, L., Kamoun, F.: Hierarchical routing for large networks performance evaluation and optimization. Comput. Netw. 1(3), 155–174 (1977)MathSciNetMATH Kleinrock, L., Kamoun, F.: Hierarchical routing for large networks performance evaluation and optimization. Comput. Netw. 1(3), 155–174 (1977)MathSciNetMATH
30.
Zurück zum Zitat Kumar, A., Naughton, J., Patel, J.M., Zhu, X.: To join or not to join? Thinking twice about joins before feature selection. In: Proceedings of the 2016 International Conference on Management of Data, pp. 19–34. ACM (2016) Kumar, A., Naughton, J., Patel, J.M., Zhu, X.: To join or not to join? Thinking twice about joins before feature selection. In: Proceedings of the 2016 International Conference on Management of Data, pp. 19–34. ACM (2016)
31.
Zurück zum Zitat Lagrange, J.L.: Mécanique Analytique, vol. 1. Mallet-Bachelier, Paris (1853) Lagrange, J.L.: Mécanique Analytique, vol. 1. Mallet-Bachelier, Paris (1853)
32.
Zurück zum Zitat Leskovec, J., Faloutsos, C.: Sampling from large graphs. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 631–636. ACM (2006) Leskovec, J., Faloutsos, C.: Sampling from large graphs. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 631–636. ACM (2006)
33.
Zurück zum Zitat Leskovec, J., Kleinberg, J., Faloutsos, C.: Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 177–187. ACM (2005) Leskovec, J., Kleinberg, J., Faloutsos, C.: Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 177–187. ACM (2005)
34.
Zurück zum Zitat McMahan, B., Ramage, D.: Federated learning: collaborative machine learning without centralized training data. Technical report, Google (2017) McMahan, B., Ramage, D.: Federated learning: collaborative machine learning without centralized training data. Technical report, Google (2017)
35.
Zurück zum Zitat Medina, A., Taft, N., Salamatian, K., Bhattacharyya, S., Diot, C.: Traffic matrix estimation: existing techniques and new directions. ACM SIGCOMM Comput. Commun. Rev. 32(4), 161–174 (2002)CrossRef Medina, A., Taft, N., Salamatian, K., Bhattacharyya, S., Diot, C.: Traffic matrix estimation: existing techniques and new directions. ACM SIGCOMM Comput. Commun. Rev. 32(4), 161–174 (2002)CrossRef
36.
Zurück zum Zitat Moors, E.: On the reciprocal of the general algebraic matrix (abstract). Bull. Am. Math. Soc. 26, 394–395 (1920) Moors, E.: On the reciprocal of the general algebraic matrix (abstract). Bull. Am. Math. Soc. 26, 394–395 (1920)
37.
Zurück zum Zitat Needell, D., Tropp, J.A.: CoSaMP: iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal. 26(3), 301–321 (2009)MathSciNetCrossRef Needell, D., Tropp, J.A.: CoSaMP: iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal. 26(3), 301–321 (2009)MathSciNetCrossRef
38.
Zurück zum Zitat Nunes, B.A.A., Mendonca, M., Nguyen, X.N., Obraczka, K., Turletti, T.: A survey of software-defined networking: past, present, and future of programmable networks. IEEE Commun. Surv. Tutor. 16(3), 1617–1634 (2014)CrossRef Nunes, B.A.A., Mendonca, M., Nguyen, X.N., Obraczka, K., Turletti, T.: A survey of software-defined networking: past, present, and future of programmable networks. IEEE Commun. Surv. Tutor. 16(3), 1617–1634 (2014)CrossRef
39.
Zurück zum Zitat Penrose, R.: A generalized inverse for matrices. In: Mathematical proceedings of the Cambridge philosophical society, vol. 51, pp. 406–413. Cambridge University Press, Cambridge (1955)CrossRef Penrose, R.: A generalized inverse for matrices. In: Mathematical proceedings of the Cambridge philosophical society, vol. 51, pp. 406–413. Cambridge University Press, Cambridge (1955)CrossRef
40.
Zurück zum Zitat Tebaldi, C., West, M.: Bayesian inference on network traffic using link count data. J. Am. Stat. Assoc. 93(442), 557–573 (1998)MathSciNetCrossRef Tebaldi, C., West, M.: Bayesian inference on network traffic using link count data. J. Am. Stat. Assoc. 93(442), 557–573 (1998)MathSciNetCrossRef
41.
Zurück zum Zitat Trefethen, L.N., Bau III, D.: Numerical linear algebra. Society for Industrial and Applied Mathematics, Philadelphia. Technical report, ISBN 978-0-89871-361-9 (1997) Trefethen, L.N., Bau III, D.: Numerical linear algebra. Society for Industrial and Applied Mathematics, Philadelphia. Technical report, ISBN 978-0-89871-361-9 (1997)
42.
Zurück zum Zitat Tsirogiannis, D., Guha, S., Koudas, N.: Improving the performance of list intersection. PVLDB 2(1), 838–849 (2009) Tsirogiannis, D., Guha, S., Koudas, N.: Improving the performance of list intersection. PVLDB 2(1), 838–849 (2009)
43.
Zurück zum Zitat Tune, P., Roughan, M.: Maximum entropy traffic matrix synthesis. In: ACM SIGMETRICS Performance Evaluation Review vol. 42(2), pp. 43–45 (2014)CrossRef Tune, P., Roughan, M.: Maximum entropy traffic matrix synthesis. In: ACM SIGMETRICS Performance Evaluation Review vol. 42(2), pp. 43–45 (2014)CrossRef
44.
Zurück zum Zitat Vogel, C.R.: Computational Methods for Inverse Problems. SIAM, Philadelphia (2002)CrossRef Vogel, C.R.: Computational Methods for Inverse Problems. SIAM, Philadelphia (2002)CrossRef
45.
Zurück zum Zitat Zhang, C., Kumar, A., Ré, C.: Materialization optimizations for feature selection workloads. ACM Trans. Datab. Syst. (TODS) 41(1), 2 (2016)MathSciNet Zhang, C., Kumar, A., Ré, C.: Materialization optimizations for feature selection workloads. ACM Trans. Datab. Syst. (TODS) 41(1), 2 (2016)MathSciNet
46.
Zurück zum Zitat Zhang, Y., Roughan, M., Duffield, N., Greenberg, A.: Fast accurate computation of large-scale IP traffic matrices from link loads. In: ACM SIGMETRICS Performance Evaluation Review, vol. 31, pp. 206–217. ACM (2003)CrossRef Zhang, Y., Roughan, M., Duffield, N., Greenberg, A.: Fast accurate computation of large-scale IP traffic matrices from link loads. In: ACM SIGMETRICS Performance Evaluation Review, vol. 31, pp. 206–217. ACM (2003)CrossRef
47.
Zurück zum Zitat Zhang, Y., Roughan, M., Lund, C., Donoho, D.: An information-theoretic approach to traffic matrix estimation. In: Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications, pp. 301–312. ACM (2003) Zhang, Y., Roughan, M., Lund, C., Donoho, D.: An information-theoretic approach to traffic matrix estimation. In: Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications, pp. 301–312. ACM (2003)
Metadaten
Titel
Scalable algorithms for signal reconstruction by leveraging similarity joins
verfasst von
Abolfazl Asudeh
Jees Augustine
Azade Nazi
Saravanan Thirumuruganathan
Nan Zhang
Gautam Das
Divesh Srivastava
Publikationsdatum
14.08.2019
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 2-3/2020
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-019-00562-z

Weitere Artikel der Ausgabe 2-3/2020

The VLDB Journal 2-3/2020 Zur Ausgabe

Guest Editorial

VLDB SI 2018 editorial

Premium Partner