2017 | OriginalPaper | Chapter

A Novel String Representation and Kernel Function for the Comparison of I/O Access Patterns

Authors : Raul Torres, Julian Kunkel, Manuel F. Dolz, Thomas Ludwig

Published in: Parallel Computing Technologies

Publisher: Springer International Publishing

Abstract

Parallel I/O access patterns act as fingerprints of a parallel program. To extract meaningful information from these patterns, they must be represented appropriately. Because string objects can be easily compared using kernel methods, this paper proposes a conversion to a weighted string representation, together with a novel string kernel function called the Kast Spectrum Kernel. The similarity matrices obtained by applying this kernel to a set of examples from a real application were analyzed using Kernel Principal Component Analysis (Kernel PCA) and Hierarchical Clustering. The evaluation showed that 2 out of 4 I/O access pattern groups were completely identified, while the other 2 formed a single cluster due to the intrinsic similarity of their members. The proposed strategy is a promising candidate for other similarity problems involving tree-like structured data.
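
The abstract does not define the Kast Spectrum Kernel or the weighted string representation, so the following is only a minimal sketch of the general idea, using the classic k-spectrum kernel of Leslie et al. as a stand-in. The single-letter encoding of I/O operations, the example strings, and all function names are assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def spectrum(s: str, k: int) -> Counter:
    """Count every contiguous substring of length k in s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 2) -> int:
    """Inner product of the k-mer count vectors of a and b."""
    sa, sb = spectrum(a, k), spectrum(b, k)
    return sum(count * sb[kmer] for kmer, count in sa.items())


def normalized_kernel(a: str, b: str, k: int = 2) -> float:
    """Cosine-normalized kernel value; 0.0 if either string has no k-mers."""
    denom = sqrt(spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k))
    return spectrum_kernel(a, b, k) / denom if denom else 0.0


def similarity_matrix(strings, k=2):
    """Pairwise normalized kernel values as a list of rows."""
    return [[normalized_kernel(x, y, k) for y in strings] for x in strings]


# Hypothetical encoding (assumption): o=open, r=read, w=write, c=close.
patterns = ["orrrrc", "orrrc", "owwwwc", "owwwc"]
K = similarity_matrix(patterns)
```

A matrix such as `K` is the kind of similarity matrix that could then be passed to Kernel PCA or to a hierarchical clustering routine. In this toy data the two read-dominated strings score high against each other and zero against the write-dominated ones.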

Metadata
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-62932-2_48
