A local semi-supervised Sammon algorithm for textual data visualization

Authors: Manuel Martín-Merino, Ángela Blanco

Published in: Journal of Intelligent Information Systems | Issue 1/2009

Published: 1 August 2009

Abstract

Sammon’s mapping is a powerful non-linear technique that allows us to visualize relationships among high-dimensional objects. It has been applied to a broad range of practical problems, and particularly to the visualization of the semantic relations among terms in textual databases. The word maps generated by the Sammon mapping suffer from low discriminant power, owing to the well-known “curse of dimensionality” and to the unsupervised nature of the algorithm. Fortunately, textual databases frequently provide a manually created classification for a subset of documents, which may help to overcome this problem. In this paper we first introduce a modification of the Sammon mapping (SSammon) that enhances the local topology, reducing the sensitivity to the “curse of dimensionality”. Next, a semi-supervised version is proposed that takes advantage of the a priori categorization of a subset of documents to improve the discriminant power of the generated word maps. The new algorithm has been applied to the challenging problem of word map generation. The experimental results suggest that the new model significantly improves on well-known unsupervised alternatives.
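
For orientation, the following is a minimal sketch of the classical, unsupervised Sammon mapping (Sammon, 1969) that the abstract takes as its starting point. It is not the SSammon variant or the semi-supervised algorithm of the paper, whose details are not given in the abstract. The function names, the plain gradient-descent update, the step size and the iteration count are all illustrative assumptions, and the input points are assumed to be pairwise distinct so that no input distance is zero.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length coordinate lists."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def sammon_stress(d_high, d_low):
    """Classical Sammon stress: squared distance errors, each divided by the
    input distance, summed over pairs i < j and normalised by the total
    input distance. Assumes every off-diagonal d_high entry is positive."""
    n = len(d_high)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    scale = sum(d_high[i][j] for i, j in pairs)
    error = sum((d_high[i][j] - d_low[i][j]) ** 2 / d_high[i][j] for i, j in pairs)
    return error / scale


def sammon_map(points, dim=2, iterations=300, step=0.1, seed=0):
    """Plain gradient descent on the Sammon stress (illustrative, not tuned)."""
    rng = random.Random(seed)
    n = len(points)
    d_high = pairwise_distances(points)
    scale = sum(d_high[i][j] for i in range(n) for j in range(i + 1, n))
    # Random initial layout in the low-dimensional map space.
    layout = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(iterations):
        d_low = pairwise_distances(layout)
        gradient = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dl = max(d_low[i][j], 1e-12)  # guard against division by zero
                weight = (d_high[i][j] - dl) / (d_high[i][j] * dl)
                for k in range(dim):
                    diff = layout[i][k] - layout[j][k]
                    gradient[i][k] += -2.0 / scale * weight * diff
        # Move every map point one step against its gradient.
        layout = [
            [layout[i][k] - step * gradient[i][k] for k in range(dim)]
            for i in range(n)
        ]
    return layout
```

Because each squared error is divided by the corresponding input distance, small input distances weigh more heavily in the stress. In a word-map setting the input would be a matrix of term dissimilarities rather than coordinates; `sammon_stress` already works directly on distance matrices, while `sammon_map` as written here takes coordinates purely for brevity.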

Title: A local semi-supervised Sammon algorithm for textual data visualization
Authors: Manuel Martín-Merino, Ángela Blanco
Publication date: 1 August 2009
Publisher: Springer US
Published in: Journal of Intelligent Information Systems / Issue 1/2009
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI: https://doi.org/10.1007/s10844-008-0056-5
