2018 | OriginalPaper | Book chapter

An Overview of Information Discovery Using Latent Semantic Indexing

Abstract

In recent years there has been a dramatic increase in the size of important information collections. At the same time, there has been growing interest in extracting as much useful information as possible from such collections. These trends place significant demands on modern information retrieval systems. In particular, there is a great need for tools that can support discovery of new and useful information. The technique of latent semantic indexing (LSI) has a number of attributes that make it particularly well adapted to information discovery applications. This paper provides an overview of LSI-based techniques that have been successfully employed in facilitating discovery in practical applications. The techniques range from user aids to state-of-the-art discovery methods.

Footnotes
1
A review of major online sources (Science Direct, Springer Link, Google Scholar, and the digital libraries of the IEEE and ACM) indicates that publications regarding biomedical applications of LSI were negligible prior to 2000, grew linearly between then and 2007, surged in 2008, and have been growing at a rate of 10–15% per year since then.
 
2
These figures are from commercial and government applications worked on by the author between 2005 and 2013. These applications primarily involved conceptual retrieval, text clustering, and/or document categorization tasks. Most had at least some focus on new information discovery.
 
3
Subsequent to the timeframe covered by the news articles used in the experiment, the Salafist Group for Preaching and Combat (GSPC) changed its name to Al Qaida in the Islamic Maghreb (AQIM).
 
4
For example, Person A – Person B – Organization C – Telephone Number D – Person E – fraud.
 
Metadata
Title
An Overview of Information Discovery Using Latent Semantic Indexing
Author
Roger Bradford
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-61911-8_14