ABSTRACT
Automatic recommendation of citations for a manuscript is highly valuable for scholarly activities since it can substantially improve the efficiency and quality of literature search. Prior techniques placed a considerable burden on users, who were required to provide a representative bibliography or to mark passages where citations are needed. In this paper we present a system that considerably reduces this burden: a user simply inputs a query manuscript (without a bibliography) and our system automatically finds locations where citations are needed. We show that naïve approaches do not work well due to massive noise in the document corpus. We produce a successful approach by carefully examining the relevance between segments in a query manuscript and the representative segments extracted from a document corpus. An extensive empirical evaluation using the CiteSeerX data set shows that our approach is effective.
Index Terms
- Citation recommendation without author supervision