Published in: Journal of Intelligent Information Systems 1/2009

1 August 2009

A local semi-supervised Sammon algorithm for textual data visualization

Authors: Manuel Martín-Merino, Ángela Blanco

Abstract

Sammon’s mapping is a powerful non-linear technique that allows us to visualize relationships among high-dimensional objects. It has been applied to a broad range of practical problems, in particular to the visualization of semantic relations among terms in textual databases. The word maps generated by the Sammon mapping suffer from low discriminant power because of the well-known “curse of dimensionality” and the unsupervised nature of the algorithm. Fortunately, textual databases frequently provide a manually created classification for a subset of documents, which may help to overcome this problem. In this paper we first introduce a modification of the Sammon mapping (SSammon) that enhances the local topology, reducing the sensitivity to the “curse of dimensionality”. Next, a semi-supervised version is proposed that takes advantage of the a priori categorization of a subset of documents to improve the discriminant power of the generated word maps. The new algorithm has been applied to the challenging problem of word map generation. The experimental results suggest that the new model significantly improves on well-known unsupervised alternatives.
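
For orientation, the quantity that the classic (unsupervised) Sammon mapping minimizes can be sketched as follows. This is only the standard stress function: the squared differences between original and map distances, each weighted by the inverse of the original distance and normalized by the sum of original distances. It is not the SSammon or semi-supervised variant proposed in the paper, whose details the abstract does not give. The function name `sammon_stress` and the toy points are illustrative assumptions.

```python
import math
from itertools import combinations


def sammon_stress(high_dim, low_dim):
    """Classic Sammon stress between original points and their low-dimensional map.

    Illustrative helper (name and signature are assumptions, not from the paper).
    """
    if len(high_dim) != len(low_dim):
        raise ValueError("both configurations must contain the same number of points")
    total_delta = 0.0      # sum of original pairwise distances (normalizer)
    weighted_error = 0.0   # sum of (delta - d)^2 / delta over all pairs
    for i, j in combinations(range(len(high_dim)), 2):
        delta = math.dist(high_dim[i], high_dim[j])  # distance in the original space
        d = math.dist(low_dim[i], low_dim[j])        # distance on the map
        if delta == 0.0:
            continue  # coincident originals are skipped to avoid division by zero
        total_delta += delta
        weighted_error += (delta - d) ** 2 / delta
    return weighted_error / total_delta if total_delta > 0 else 0.0


# Toy data: three points that already lie in a plane of the 3-D space.
points_3d = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
faithful_map = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # preserves every distance
collapsed_map = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]  # loses every distance

print(sammon_stress(points_3d, faithful_map))
print(sammon_stress(points_3d, collapsed_map))
```

Because each error term is divided by the original distance, small distances carry more weight than large ones, so the stress emphasizes local structure. A map that preserves all pairwise distances has zero stress.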

Metadata
Title: A local semi-supervised Sammon algorithm for textual data visualization
Authors: Manuel Martín-Merino, Ángela Blanco
Publication date: 1 August 2009
Publisher: Springer US
Published in: Journal of Intelligent Information Systems, Issue 1/2009
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI: https://doi.org/10.1007/s10844-008-0056-5
