Published in: The Journal of Supercomputing 10/2020

10 January 2017

Relevance maximization for high-recall retrieval problem: finding all needles in a haystack

Authors: Justin JongSu Song, Wookey Lee

Abstract

The high-recall retrieval problem, which aims to find the full set of relevant documents in a huge result set through effective mining techniques, is particularly important for patent information retrieval, legal document retrieval, medical document retrieval, market information retrieval, and literature review. Existing high-recall retrieval methods, however, remain far from satisfactory at retrieving all relevant documents, because they must not only meet high recall and precision thresholds but also minimize the number of documents reviewed. To address this gap, we generalize the problem into a novel high-recall retrieval model, which can be described as finding all needles in a giant haystack. To efficiently compute candidate groups consisting of k relevant documents, we propose dynamic diverse retrieval algorithms specialized for patent searching, which enable effective dynamic interactive retrieval. Across various types of datasets, the dynamic ranking method shows considerable improvements in time and cost over conventional static ranking approaches.
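
The contrast between static and dynamic ranking can be made concrete with a small sketch. This is not the authors' algorithm: the toy corpus, the term-overlap score, the query-expansion rule, and all function names below are assumptions chosen only for illustration. The sketch counts how many documents a reviewer must read before every relevant one has been found, first with a ranking fixed up front and then with a ranking that is recomputed after each relevance judgment.

```python
from typing import Dict, List, Set

# Hypothetical toy corpus: document id -> set of terms (illustrative only).
CORPUS: Dict[str, Set[str]] = {
    "d1": {"battery", "charger"},
    "d2": {"battery", "anode"},
    "d3": {"charger", "cable"},
    "d4": {"cable", "usb"},
    "d5": {"anode", "graphite"},
    "d6": {"graphite", "lithium"},
}
QUERY: Set[str] = {"battery"}
RELEVANT: Set[str] = {"d2", "d5", "d6"}  # assumed ground-truth judgments


def static_order(corpus: Dict[str, Set[str]], query: Set[str]) -> List[str]:
    """Rank once by term overlap with the query; ties broken by document id."""
    return sorted(corpus, key=lambda d: (-len(corpus[d] & query), d))


def dynamic_order(
    corpus: Dict[str, Set[str]], query: Set[str], relevant: Set[str]
) -> List[str]:
    """Re-rank after every judgment: the terms of each document judged
    relevant are added to the query (a simple relevance-feedback rule)."""
    current = set(query)
    remaining = set(corpus)
    order: List[str] = []
    while remaining:
        best = min(remaining, key=lambda d: (-len(corpus[d] & current), d))
        remaining.remove(best)
        order.append(best)
        if best in relevant:  # simulated reviewer judgment
            current |= corpus[best]
    return order


def review_effort(order: List[str], relevant: Set[str]) -> int:
    """Number of documents read, in order, until all relevant ones are found."""
    if not relevant:
        return 0
    found = 0
    for position, doc in enumerate(order, start=1):
        if doc in relevant:
            found += 1
            if found == len(relevant):
                return position
    raise ValueError("ordering does not contain every relevant document")


if __name__ == "__main__":
    print(review_effort(static_order(CORPUS, QUERY), RELEVANT))             # 6
    print(review_effort(dynamic_order(CORPUS, QUERY, RELEVANT), RELEVANT))  # 4
```

On this toy input the static ranking reaches full recall only after all six documents have been read, while the feedback-driven ranking reaches it after four. The gap depends entirely on the invented data and says nothing about the results reported in the paper.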

Metadata
Title
Relevance maximization for high-recall retrieval problem: finding all needles in a haystack
Authors
Justin JongSu Song
Wookey Lee
Publication date
10 January 2017
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 10/2020
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-016-1956-8
