Published in: International Journal on Digital Libraries, Issue 2-3/2018

19 May 2017

Bag of works retrieval: TF*IDF weighting of works co-cited with a seed

Author: Howard D. White


Abstract

Although not presently possible in any system, the style of retrieval described here combines familiar components—co-citation linkages of documents and TF*IDF weighting of terms—in a way that could be implemented in future databases. Rather than entering keywords, the user enters a string identifying a work—a seed—to retrieve the strings identifying other works that are co-cited with it. Each of the latter is part of a “bag of works,” and it presumably has both a co-citation count with the seed and an overall citation count in the database. These two counts can be plugged into a standard formula for TF*IDF weighting such that all the co-cited items can be ranked for relevance to the seed, given that the entire retrieval is relevant to it by evidence from multiple co-citing authors. The result is analogous to, but different from, traditional “bag of words” retrieval, which it supplements. Some properties of the ranking are illustrated by works co-cited with three seeds: an article on search behavior, an information retrieval textbook, and an article on centrality in networks. While these are case studies, their properties apply to bag of works retrievals in general and have implications for users (e.g., humanities scholars, domain analysts) that go beyond any one example.
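The weighting step described in the abstract can be illustrated with a minimal sketch. It assumes one common log-scaled TF*IDF variant, (1 + log10 tf) × log10(N / df), where tf is a work's co-citation count with the seed, df is its overall citation count in the database, and N is the number of documents in the database. The abstract does not state which formula the article uses, so this variant, the function name, and the sample counts below are all illustrative assumptions rather than the author's implementation.

```python
import math


def rank_bag_of_works(cocitation_counts, citation_counts, collection_size):
    """Rank works co-cited with a seed by a TF*IDF-style weight.

    Assumed variant (not confirmed by the source):
        weight = (1 + log10(tf)) * log10(N / df)
    tf = co-citation count of the work with the seed
    df = overall citation count of the work in the database
    N  = number of citing documents in the database
    """
    ranked = []
    for work, cocited in cocitation_counts.items():
        tf_weight = 1 + math.log10(cocited)
        idf_weight = math.log10(collection_size / citation_counts[work])
        ranked.append((work, tf_weight * idf_weight))
    # Highest weight first: strongly co-cited, otherwise rarely cited works lead.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Hypothetical counts, invented purely for illustration.
cocited_with_seed = {"Work A": 50, "Work B": 50, "Work C": 5}
cited_overall = {"Work A": 100, "Work B": 10_000, "Work C": 10}

ranking = rank_bag_of_works(cocited_with_seed, cited_overall, collection_size=1_000_000)
for work, weight in ranking:
    print(f"{work}: {weight:.2f}")
```

With these made-up counts, Work A and Work B are co-cited with the seed equally often, but Work B is cited far more widely across the database, so its IDF factor is smaller and it falls below both Work A and the less frequently co-cited Work C. This is the same behaviour that TF*IDF shows for words: high-frequency items that occur everywhere are discounted relative to items specific to the seed.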


Metadata
Title
Bag of works retrieval: TF*IDF weighting of works co-cited with a seed
Author
Howard D. White
Publication date
19 May 2017
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Digital Libraries / Issue 2-3/2018
Print ISSN: 1432-5012
Electronic ISSN: 1432-1300
DOI
https://doi.org/10.1007/s00799-017-0217-7
