Skip to main content
Erschienen in: Journal of Scientific Computing 1/2018

08.09.2017

Integer Matrix Approximation and Data Mining

verfasst von: Bo Dong, Matthew M. Lin, Haesun Park

Erschienen in: Journal of Scientific Computing | Ausgabe 1/2018

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Integer datasets frequently appear in many applications in science and engineering. To analyze these datasets, we consider an integer matrix approximation technique that can preserve the original dataset characteristics. Because integers are discrete in nature, to the best of our knowledge, no previously proposed technique developed for real numbers can be successfully applied. In this study, we first conduct a thorough review of current algorithms that can solve integer least squares problems, and then we develop an alternative least square method based on an integer least squares estimation to obtain the integer approximation of the integer matrices. We discuss numerical applications for the approximation of randomly generated integer matrices as well as studies of association rule mining, cluster analysis, and pattern extraction. Our computed results suggest that our proposed method can calculate a more accurate solution for discrete datasets than other existing methods.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data. Data Min Knowl. Discov. 11(1), 5–33 (2005)MathSciNetCrossRef Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data. Data Min Knowl. Discov. 11(1), 5–33 (2005)MathSciNetCrossRef
4.
Zurück zum Zitat Banerjee, A., Krumpelman, C., Ghosh, J., Basu, S., Mooney, R.J.: Model-based overlapping clustering. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, KDD ’05, pp. 532–537. ACM, New York, NY, USA (2005). doi:10.1145/1081870.1081932 Banerjee, A., Krumpelman, C., Ghosh, J., Basu, S., Mooney, R.J.: Model-based overlapping clustering. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, KDD ’05, pp. 532–537. ACM, New York, NY, USA (2005). doi:10.​1145/​1081870.​1081932
5.
10.
Zurück zum Zitat Chowdhury, G.G.: Introduction to Modern Information Retrieval, 2nd edn. Facet Publishing, Abingdon (2004) Chowdhury, G.G.: Introduction to Modern Information Retrieval, 2nd edn. Facet Publishing, Abingdon (2004)
12.
Zurück zum Zitat Cover, T.M., Thomas, J.A.: Elements of Information Theory, 2nd edn. Wiley, Hoboken (2006)MATH Cover, T.M., Thomas, J.A.: Elements of Information Theory, 2nd edn. Wiley, Hoboken (2006)MATH
13.
Zurück zum Zitat Donoho, D., Stodden, V.: When does nonnegative matrix factorization give a correct decomposition into parts? In: Proceedings of 17th Annual Conference on Neural Information Processing Systems. NIPS, Stanford University, Stanford, CA, 2003 (2003) Donoho, D., Stodden, V.: When does nonnegative matrix factorization give a correct decomposition into parts? In: Proceedings of 17th Annual Conference on Neural Information Processing Systems. NIPS, Stanford University, Stanford, CA, 2003 (2003)
14.
Zurück zum Zitat Eisen, M.B., Spellman, P.T., Brown, P.O., Botstein, D.: Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. 95(25), 14863–14868 (1998)CrossRef Eisen, M.B., Spellman, P.T., Brown, P.O., Botstein, D.: Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. 95(25), 14863–14868 (1998)CrossRef
15.
Zurück zum Zitat Elhamifar, E., Vidal, R.: Sparse subspace clustering: algorithm, theory, and applications. IEEE Trans. Pattern Anal. Mach. Intell. 35(11), 2765–2781 (2013)CrossRef Elhamifar, E., Vidal, R.: Sparse subspace clustering: algorithm, theory, and applications. IEEE Trans. Pattern Anal. Mach. Intell. 35(11), 2765–2781 (2013)CrossRef
16.
Zurück zum Zitat Filipovych, R., Resnick, S.M., Davatzikos, C.: Semi-supervised cluster analysis of imaging data. NeuroImage 54(3), 2185–2197 (2011)CrossRef Filipovych, R., Resnick, S.M., Davatzikos, C.: Semi-supervised cluster analysis of imaging data. NeuroImage 54(3), 2185–2197 (2011)CrossRef
18.
Zurück zum Zitat Golub, G.H., Van Loan, C.F.: Matrix Computations. Johns Hopkins Studies in the Mathematical Sciences, 4th edn. Johns Hopkins University Press, Baltimore (2013) Golub, G.H., Van Loan, C.F.: Matrix Computations. Johns Hopkins Studies in the Mathematical Sciences, 4th edn. Johns Hopkins University Press, Baltimore (2013)
19.
Zurück zum Zitat Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)MATH Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)MATH
20.
Zurück zum Zitat Hassibi, A., Boyd, S.: Integer parameter estimation in linear models with applications to gps. In: Proceedings of the 35th IEEE Conference on Decision and Control, 1996, vol. 3, pp. 3245–3251 vol.3 (1996). doi:10.1109/CDC.1996.573639 Hassibi, A., Boyd, S.: Integer parameter estimation in linear models with applications to gps. In: Proceedings of the 35th IEEE Conference on Decision and Control, 1996, vol. 3, pp. 3245–3251 vol.3 (1996). doi:10.​1109/​CDC.​1996.​573639
21.
Zurück zum Zitat Houseman, E.A., Accomando, W.P., Koestler, D.C., Christensen, B.C., Marsit, C.J., Nelson, H.H., Wiencke, J.K., Kelsey, K.T.: Dna methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinform. 13, 86 (2012). doi:10.1186/1471-2105-13-86 Houseman, E.A., Accomando, W.P., Koestler, D.C., Christensen, B.C., Marsit, C.J., Nelson, H.H., Wiencke, J.K., Kelsey, K.T.: Dna methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinform. 13, 86 (2012). doi:10.​1186/​1471-2105-13-86
22.
Zurück zum Zitat Hoyer, P.O.: Non-negative matrix factorization with sparseness constraints. J. Mach. Learn. Res. 5, 1457–1469 (2004)MathSciNetMATH Hoyer, P.O.: Non-negative matrix factorization with sparseness constraints. J. Mach. Learn. Res. 5, 1457–1469 (2004)MathSciNetMATH
23.
Zurück zum Zitat Kabán, A., Bingham, E.: Factorisation and denoising of 0–1 data: a variational approach. Neurocomputing 71(10), 2291–2308 (2008)CrossRef Kabán, A., Bingham, E.: Factorisation and denoising of 0–1 data: a variational approach. Neurocomputing 71(10), 2291–2308 (2008)CrossRef
24.
Zurück zum Zitat Kannan, R., Ishteva, M., Park, H.: Bounded matrix low rank approximation. In: 2012 IEEE 12th International Conference on Data Mining (ICDM), pp. 319–328 (2012). doi:10.1109/ICDM.2012.131 Kannan, R., Ishteva, M., Park, H.: Bounded matrix low rank approximation. In: 2012 IEEE 12th International Conference on Data Mining (ICDM), pp. 319–328 (2012). doi:10.​1109/​ICDM.​2012.​131
25.
Zurück zum Zitat Kawamoto, T., Hotta, K., Mishima, T., Fujiki, J., Tanaka, M., Kurita, T.: Estimation of single tones from chord sounds using non-negative matrix factorization. Neural Netw. World 3, 429–436 (2000) Kawamoto, T., Hotta, K., Mishima, T., Fujiki, J., Tanaka, M., Kurita, T.: Estimation of single tones from chord sounds using non-negative matrix factorization. Neural Netw. World 3, 429–436 (2000)
26.
Zurück zum Zitat Kim, H., Park, H.: Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method. SIAM J. Matrix Anal. Appl. 30(2), 713–730 (2008)MathSciNetCrossRefMATH Kim, H., Park, H.: Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method. SIAM J. Matrix Anal. Appl. 30(2), 713–730 (2008)MathSciNetCrossRefMATH
27.
Zurück zum Zitat Kim, H., Park, H., Drake, B.L.: Extracting unrecognized gene relationships from the biomedical literature via matrix factorizations. BMC Bioinform. 9, 211 (2008). doi:10.1186/1471-2105-9-211 Kim, H., Park, H., Drake, B.L.: Extracting unrecognized gene relationships from the biomedical literature via matrix factorizations. BMC Bioinform. 9, 211 (2008). doi:10.​1186/​1471-2105-9-211
30.
Zurück zum Zitat Koyutürk, M., Grama, A.: Proximus: a framework for analyzing very high dimensional discrete-attributed datasets. In: KDD ’03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 147–156. ACM, New York, NY, USA (2003). doi:10.1145/956750.956770 Koyutürk, M., Grama, A.: Proximus: a framework for analyzing very high dimensional discrete-attributed datasets. In: KDD ’03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 147–156. ACM, New York, NY, USA (2003). doi:10.​1145/​956750.​956770
31.
Zurück zum Zitat Koyuturk, M., Grama, A., Ramakrishnan, N.: Compression, clustering, and pattern discovery in very high-dimensional discrete-attribute data sets. IEEE Trans. Knowl. Data Eng. 17(4), 447–461 (2005). doi:10.1109/TKDE.2005.55 CrossRef Koyuturk, M., Grama, A., Ramakrishnan, N.: Compression, clustering, and pattern discovery in very high-dimensional discrete-attribute data sets. IEEE Trans. Knowl. Data Eng. 17(4), 447–461 (2005). doi:10.​1109/​TKDE.​2005.​55 CrossRef
35.
Zurück zum Zitat Meeds, E., Ghahramani, Z., Neal, R.M., Roweis, S.T.: Modeling dyadic data with binary latent factors. Adv. Neural Inf. Process. Syst. 19, 977 (2007) Meeds, E., Ghahramani, Z., Neal, R.M., Roweis, S.T.: Modeling dyadic data with binary latent factors. Adv. Neural Inf. Process. Syst. 19, 977 (2007)
36.
Zurück zum Zitat Morgan, S.D.: Cluster analysis in electronic manufacturing. Ph.D. dissertation, North Carolina State University, Raleigh, NC 27695 (2001) Morgan, S.D.: Cluster analysis in electronic manufacturing. Ph.D. dissertation, North Carolina State University, Raleigh, NC 27695 (2001)
37.
Zurück zum Zitat Mow, W.H.: Universal lattice decoding: a review and some recent results. In: 2004 IEEE International Conference on Communications, vol. 5, pp. 2842–2846 (2004). doi:10.1109/ICC.2004.1313048 Mow, W.H.: Universal lattice decoding: a review and some recent results. In: 2004 IEEE International Conference on Communications, vol. 5, pp. 2842–2846 (2004). doi:10.​1109/​ICC.​2004.​1313048
38.
Zurück zum Zitat Segal, E., Battle, A., Koller, D.: Decomposing gene expression into cellular processes. In: In Proceedings of the 8th Pacific Symposium on Biocomputing (2003) Segal, E., Battle, A., Koller, D.: Decomposing gene expression into cellular processes. In: In Proceedings of the 8th Pacific Symposium on Biocomputing (2003)
39.
Zurück zum Zitat Slawski, M., Hein, M., Lutsik, P.: Matrix factorization with binary components. In: Advances in Neural Information Processing Systems, pp. 3210–3218 (2013) Slawski, M., Hein, M., Lutsik, P.: Matrix factorization with binary components. In: Advances in Neural Information Processing Systems, pp. 3210–3218 (2013)
40.
Zurück zum Zitat Su, K., Wassell, I.: A new ordering for efficient sphere decoding. In: 2005 IEEE International Conference on Communications, ICC 2005, vol. 3, pp. 1906–1910 (2005). doi:10.1109/ICC.2005.1494671 Su, K., Wassell, I.: A new ordering for efficient sphere decoding. In: 2005 IEEE International Conference on Communications, ICC 2005, vol. 3, pp. 1906–1910 (2005). doi:10.​1109/​ICC.​2005.​1494671
41.
Zurück zum Zitat Tu, S., Chen, R., Xu, L.: Transcription network analysis by a sparse binary factor analysis algorithm. J. Integr. Bioinform. 9(2), 68–79 (2012) Tu, S., Chen, R., Xu, L.: Transcription network analysis by a sparse binary factor analysis algorithm. J. Integr. Bioinform. 9(2), 68–79 (2012)
44.
Zurück zum Zitat Zhang, Z., Li, T., Ding, C., Zhang, X.: Binary matrix factorization with applications. In: Seventh IEEE International Conference on Data Mining, 2007, ICDM 2007, pp. 391–400. IEEE (2007) Zhang, Z., Li, T., Ding, C., Zhang, X.: Binary matrix factorization with applications. In: Seventh IEEE International Conference on Data Mining, 2007, ICDM 2007, pp. 391–400. IEEE (2007)
Metadaten
Titel
Integer Matrix Approximation and Data Mining
verfasst von
Bo Dong
Matthew M. Lin
Haesun Park
Publikationsdatum
08.09.2017
Verlag
Springer US
Erschienen in
Journal of Scientific Computing / Ausgabe 1/2018
Print ISSN: 0885-7474
Elektronische ISSN: 1573-7691
DOI
https://doi.org/10.1007/s10915-017-0531-7

Weitere Artikel der Ausgabe 1/2018

Journal of Scientific Computing 1/2018 Zur Ausgabe