2015 | OriginalPaper | Book chapter

Document Analysis and Retrieval Tasks in Scientific Digital Libraries

Authors: Sujatha Das Gollapalli, Cornelia Caragea, Xiaoli Li, C. Lee Giles

Published in: Information Retrieval

Publisher: Springer International Publishing

Abstract

Machine Learning (ML) algorithms have opened up new possibilities for the acquisition and processing of documents in Information Retrieval (IR) systems. Indeed, it is now possible to automate several labor-intensive tasks related to documents such as categorization and entity extraction. Consequently, the application of machine learning techniques for various large-scale IR tasks has gathered significant research interest in both the ML and IR communities. This tutorial provides a reference summary of our research in applying machine learning techniques to diverse tasks in Digital Libraries (DL). Digital library portals are specialized IR systems that work on collections of documents related to particular domains. We focus on open-access, scientific digital libraries such as CiteSeer\(^x\), which involve several crawling, ranking, content analysis, and metadata extraction tasks. We elaborate on the challenges involved in these tasks and highlight how machine learning methods can successfully address these challenges.


Metadata
Title
Document Analysis and Retrieval Tasks in Scientific Digital Libraries
Authors
Sujatha Das Gollapalli
Cornelia Caragea
Xiaoli Li
C. Lee Giles
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-25485-2_1
