Published in: Data Mining and Knowledge Discovery 1/2017

26-04-2016

Efficient histogram dictionary learning for text/image modeling and classification

Author: Minyoung Kim


Abstract

In dealing with text or image data, it is quite effective to represent them as histograms. Although recent Bayesian topic models such as latent Dirichlet allocation and its variants have proven successful at modeling histograms, they often suffer from the computational overhead of inferring a large number of hidden variables. In this paper we consider a different modeling strategy: forming a dictionary of base histograms whose convex combination yields the histogram of an observed text/image document. The dictionary entries are learned from data, which establishes a direct/indirect association between specific topics/keywords and the base histograms. Given a learned dictionary, the coding of an observed histogram provides succinct and salient information useful for classification. One of our main contributions is a very efficient dictionary learning algorithm based on Nesterov’s smooth optimization technique in conjunction with analytic solution methods for the quadratic minimization sub-problems. Beyond its faster theoretical convergence rate, in actual running time our algorithm is 20–30 times faster than general-purpose optimizers such as interior-point methods. In classification/annotation tasks on several text/image datasets, our approach exhibits comparable or often superior performance to existing Bayesian models, while being significantly faster than their variational inference.
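The coding step described above — expressing a document histogram as a convex combination of learned base histograms — can be sketched as a simplex-constrained least-squares problem. The following is a minimal illustration only, not the paper's Nesterov-based algorithm: it uses SciPy's general-purpose SLSQP solver, and the tiny dictionary and the helper name `code_histogram` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def code_histogram(h, X):
    """Code histogram h (length V) as a convex combination of the M
    dictionary columns of X (V x M): h ~= X @ alpha, alpha on the simplex."""
    V, M = X.shape
    obj = lambda a: 0.5 * np.sum((X @ a - h) ** 2)   # least-squares fit
    grad = lambda a: X.T @ (X @ a - h)
    res = minimize(obj, np.full(M, 1.0 / M), jac=grad, method="SLSQP",
                   bounds=[(0.0, 1.0)] * M,
                   constraints={"type": "eq", "fun": lambda a: a.sum() - 1.0})
    return res.x

# Toy dictionary of two base histograms over a 3-bin vocabulary
X = np.array([[0.7, 0.1],
              [0.2, 0.3],
              [0.1, 0.6]])
h = 0.4 * X[:, 0] + 0.6 * X[:, 1]   # a document that is a true mixture
alpha = code_histogram(h, X)        # recovers weights close to (0.4, 0.6)
```

The recovered `alpha` is the succinct code used downstream for classification; the paper's contribution is solving this (and the dictionary update) far faster than such a general-purpose solver.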


Footnotes
1
Typically, each visual codeword corresponds to a particular cluster in the feature space after clustering all features from data.
 
2
Hence, hereafter we often abuse the term document to refer to both a text-based document and an image represented by visual codewords.
 
3
The distance between two histogram vectors \(\mathbf{h}\) and \(\mathbf{h}'\) is defined as: \(d_{\chi ^2}(\mathbf{h},\mathbf{h}') = \sum _j \frac{(h_j-h'_j)^2}{h_j+h'_j}\) or \(d_{L_2}(\mathbf{h},\mathbf{h}')=\sum _j (h_j-h'_j)^2\).
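The two distances in this footnote are straightforward to compute directly. A minimal sketch of the formulas as written (assuming, for the \(\chi^2\) case, that no bin is zero in both histograms, so the denominator is positive; the function names are illustrative):

```python
import numpy as np

def chi2_dist(h, hp):
    # d_chi2(h, h') = sum_j (h_j - h'_j)^2 / (h_j + h'_j)
    h, hp = np.asarray(h, float), np.asarray(hp, float)
    return np.sum((h - hp) ** 2 / (h + hp))

def l2_dist(h, hp):
    # d_L2(h, h') = sum_j (h_j - h'_j)^2
    h, hp = np.asarray(h, float), np.asarray(hp, float)
    return np.sum((h - hp) ** 2)

h  = np.array([0.50, 0.50])
hp = np.array([0.25, 0.75])
```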
 
4
Here we typically assume \(M \ll V\).
 
5
We will see soon how this can be explicitly formulated.
 
6
This aims to find approximate factorization of data matrix \(\mathbf{H} = [\mathbf{h}_1,\dots ,\mathbf{h}_n] \approx \mathbf{X} \cdot \mathbf{A}\) (hence, of rank M) where \(\mathbf{A} = [{\varvec{\alpha }}_1, \dots , {\varvec{\alpha }}_n]\).
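The shapes in this factorization are easy to check numerically: with \(\mathbf{H}\) of size \(V \times n\), \(\mathbf{X}\) of size \(V \times M\), and \(\mathbf{A}\) of size \(M \times n\), and with the columns of both \(\mathbf{X}\) and \(\mathbf{A}\) lying on the simplex, every column of \(\mathbf{X}\mathbf{A}\) is again a valid histogram of rank at most \(M\). A small sketch with hypothetical dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
V, M, n = 100, 5, 20   # vocabulary size, dictionary size (M << V), #documents

# Random base histograms X and simplex codes A (columns sum to 1)
X = rng.random((V, M)); X /= X.sum(axis=0)
A = rng.random((M, n)); A /= A.sum(axis=0)

H_approx = X @ A       # rank-M approximation of the data matrix H;
                       # each column is non-negative and sums to 1
```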
 
7
\(f(y) \le f(x) + \nabla f(x)^{\top }(y-x) + \frac{L}{2} \Vert y-x\Vert _2^2\). This follows easily from (8).
 
8
The stopping criterion in the algorithm is when the relative change in the iterates or the objective values is below some threshold (e.g., \(10^{-4}\)).
 
9
We use fmincon() in Matlab, which implements this algorithm.
 
10
Specifically, we apply the CVX Matlab package (CVX Research Inc 2012; Grant and Boyd 2008).
 
12
More extensive results on running times are demonstrated in the next sections.
 
13
We use the C++ implementation publicly available from http://www.cs.cmu.edu/~chongw/slda/.
 
15
The SLDA model of Wang et al. (2009) can either ignore or exploit the annotation terms in learning. In this paper we test only the former model, not only because the annotation information yields little additional improvement as reported in Wang et al. (2009), but also because code for the latter model is unavailable.
 
16
Refer to the supplemental material for the performance of fairly standard existing approaches including Gaussian mixtures and (sparse) non-negative matrix factorization.
 
18
In the supplemental material, we also show the performance of fairly standard existing approaches including Gaussian mixtures and (sparse) non-negative matrix factorization.
 
Literature
Aharon M, Elad M, Bruckstein AM (2005) K-SVD and its non-negative variant for dictionary design. In: Proceedings of the SPIE conference on wavelets, pp 327–339
Asuncion A, Newman D (2007) UCI machine learning repository
Bach F, Mairal J, Ponce J (2012) Task-driven dictionary learning. IEEE Trans Pattern Anal Mach Intell 34(4):791–804
Barla A, Odone F, Verri A (2003) Histogram intersection kernel for image classification. In: International conference on image processing
Bayón L, Grau JM, Suárez PM (2002) A new formulation of the equivalent thermal in optimization of hydrothermal systems. Math Probl Eng 8(3):181–196
Beck A, Teboulle M (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imaging Sci 2(1):183–202
Blei D, Jordan M (2003) Modeling annotated data. In: ACM SIGIR conference
Blei D, McAuliffe J (2007) Supervised topic models. In: Neural information processing systems
Blei D, Ng A, Jordan M (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
Bosch A, Zisserman A, Munoz X (2006) Scene classification via pLSA. In: European conference on computer vision
Coates A, Lee H, Ng AY (2011) An analysis of single-layer networks in unsupervised feature learning. In: International conference on artificial intelligence and statistics (AISTATS)
Coleman TF, Li Y (1996) A reflective Newton method for minimizing a quadratic function subject to bounds on some of the variables. SIAM J Optim 6(4):1040–1058
Fei-Fei L, Perona P (2005) A Bayesian hierarchical model for learning natural scene categories. In: IEEE international conference on computer vision and pattern recognition
Gill PE, Murray W, Wright MH (1981) Practical optimization. Academic Press, London
Grant M, Boyd S (2008) Graph implementations for nonsmooth convex programs. In: Recent advances in learning and control. Springer, London, pp 95–110
Ho ND, Dooren PV (2008) Non-negative matrix factorization with fixed row and column sums. Linear Algebra Appl 429(5–6):1020–1025
Hofmann T (1999) Probabilistic latent semantic analysis. In: Proceedings of uncertainty in artificial intelligence
Hoyer PO (2004) Non-negative matrix factorization with sparseness constraints. J Mach Learn Res 5:1457–1469
Kiros R, Szepesvári C (2012) Deep representations and codes for image auto-annotation. In: Advances in neural information processing systems (NIPS)
Kreutz-Delgado K, Murray JF, Rao BD, Engan K, Lee TW, Sejnowski TJ (2003) Dictionary learning algorithms for sparse representation. Neural Comput 15(2):349–396
Li LJ, Fei-Fei L (2007) What, where and who? Classifying events by scene and object recognition. In: IEEE international conference on computer vision
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vision 60(2):91–110
Mairal J, Bach F, Ponce J, Sapiro G (2009) Online dictionary learning for sparse coding. In: International conference on machine learning
Osborne MR, Presnell B, Turlach B (2000) A new approach to variable selection in least squares problems. IMA J Numer Anal 20(3):389–403
Pele O, Werman M (2010) The quadratic-chi histogram distance family. In: European conference on computer vision
Perkins S, Theiler J (2003) Online feature selection using grafting. In: International conference on machine learning (ICML)
Platt J (1999) Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Smola AJ, Bartlett P, Schölkopf B, Schuurmans D (eds) Advances in large margin classifiers. MIT Press, Cambridge
Polyak BT (1987) Introduction to optimization. Optimization Software Inc., New York
Rosset S (2004) Tracking curved regularized optimization solution paths. In: Advances in neural information processing systems. MIT Press
Rubinstein R, Bruckstein A, Elad M (2010) Dictionaries for sparse representation modeling. Proc IEEE 98(6):1045–1057
Rubner Y, Tomasi C, Guibas L (2000) The earth mover’s distance as a metric for image retrieval. Int J Comput Vision 40(2):99–121
Russell B, Torralba A, Murphy K, Freeman WT (2008) LabelMe: a database and web-based tool for image annotation. Int J Comput Vision 77(1–3):157–173
Sindhwani V, Ghoting A (2012) Large-scale distributed non-negative sparse coding and sparse dictionary learning. In: International conference on knowledge discovery and data mining
Thurau C, Kersting K, Bauckhage C (2009) Convex non-negative matrix factorization in the wild. In: International conference on data mining
Tosic I, Frossard P (2011) Dictionary learning. IEEE Signal Process Mag 28(2):27–38
Wang C, Blei DM, Fei-Fei L (2009) Simultaneous image classification and annotation. In: IEEE international conference on computer vision and pattern recognition
Wang Y, Jia Y, Hu C, Turk M (2004) Fisher non-negative matrix factorization for learning local features. In: Asian conference on computer vision
Yang AY, Zhou Z, Ganesh A, Sastry SS, Ma Y (2013) Fast \(l_1\)-minimization algorithms for robust face recognition. IEEE Trans Image Process 22(8):3234–3246
Metadata
Title: Efficient histogram dictionary learning for text/image modeling and classification
Author: Minyoung Kim
Publication date: 26-04-2016
Publisher: Springer US
Published in: Data Mining and Knowledge Discovery / Issue 1/2017
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI: https://doi.org/10.1007/s10618-016-0461-2
