Skip to main content
Erschienen in: The VLDB Journal 4/2023

28.11.2022 | Regular Paper

A generic framework for efficient computation of top-k diverse results

verfasst von: Md Mouinul Islam, Mahsa Asadi, Sihem Amer-Yahia, Senjuti Basu Roy

Erschienen in: The VLDB Journal | Ausgabe 4/2023

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Result diversification is extensively studied in the context of search, recommendation, and data exploration. There are numerous algorithms that return top-k results that are both diverse and relevant. These algorithms typically have computational loops that compare the pairwise diversity of records to decide which ones to retain. We propose an access primitive DivGetBatch() that replaces repeated pairwise comparisons of diversity scores of records by pairwise comparisons of “aggregate” diversity scores of a group of records, thereby improving the running time of these algorithms while preserving the same results. We integrate the access primitive inside three representative diversity algorithms and prove that the augmented algorithms leveraging the access primitive preserve original results. We analyze the worst and expected case running times of these algorithms. We propose a computational framework to design this access primitive that has a pre-computed index structure I-tree that is agnostic to the specific details of diversity algorithms. We develop principled solutions to construct and maintain I-tree. Our experiments on multiple large real-world datasets corroborate our theoretical findings, while ensuring up to a \(24\times \) speedup.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
Diversity between a pair of records is simply \(1- similarity \) between them.
 
2
Please note diversity could be easily calculated from similarity bounds.
 
3
Diversity between a pair of records is simply \(1- similarity \) between them.
 
Literatur
1.
Zurück zum Zitat Abbar, S., et al.: Diverse near neighbor problem. In: SoCG pp. 207–214 (2013) Abbar, S., et al.: Diverse near neighbor problem. In: SoCG pp. 207–214 (2013)
2.
Zurück zum Zitat Abbar, S., et al.: Real-time recommendation of diverse related articles. WWW pp. 1–12 (2013) Abbar, S., et al.: Real-time recommendation of diverse related articles. WWW pp. 1–12 (2013)
3.
Zurück zum Zitat Abbassi, Z., et al.: Diversity maximization under matroid constraints. In: SIGKDD (2013) Abbassi, Z., et al.: Diversity maximization under matroid constraints. In: SIGKDD (2013)
4.
Zurück zum Zitat Agarwal, P.K., et al.: Efficient indexes for diverse top-k range queries. PODS pp. 213–227 (2020) Agarwal, P.K., et al.: Efficient indexes for diverse top-k range queries. PODS pp. 213–227 (2020)
5.
Zurück zum Zitat Agrawal, R., et al.: Diversifying search results. WSDM 5–14 (2009) Agrawal, R., et al.: Diversifying search results. WSDM 5–14 (2009)
6.
Zurück zum Zitat Angel, A., Koudas, N.: Efficient diversity-aware search. SIGMOD pp. 781–792 (2011) Angel, A., Koudas, N.: Efficient diversity-aware search. SIGMOD pp. 781–792 (2011)
7.
Zurück zum Zitat Balog, K., et al.: Transparent, scrutable and explainable user models for personalized recommendation. SIGIR (2019) Balog, K., et al.: Transparent, scrutable and explainable user models for personalized recommendation. SIGIR (2019)
8.
Zurück zum Zitat Bayer, R.: The universal b-tree for multidimensional indexing: General concepts. In: ICWCA, Springer, pp 198–209 (1997) Bayer, R.: The universal b-tree for multidimensional indexing: General concepts. In: ICWCA, Springer, pp 198–209 (1997)
9.
Zurück zum Zitat Beckmann, N., et al.: The r*-tree: An efficient and robust access method for points and rectangles. SIGMOD, pp 322–331 (1990) Beckmann, N., et al.: The r*-tree: An efficient and robust access method for points and rectangles. SIGMOD, pp 322–331 (1990)
10.
Zurück zum Zitat Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)CrossRefMATH Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)CrossRefMATH
11.
Zurück zum Zitat Berchtold, S., et al.: The x-tree: An efficient and robust access method for points and rectangles. VLDB, pp 28–39 (1996) Berchtold, S., et al.: The x-tree: An efficient and robust access method for points and rectangles. VLDB, pp 28–39 (1996)
12.
Zurück zum Zitat Beygelzimer. A., et al.: Cover trees for nearest neighbor. ICML (2006) Beygelzimer. A., et al.: Cover trees for nearest neighbor. ICML (2006)
13.
Zurück zum Zitat Cai, Z., et al.: Diversified spatial keyword search on rdf data. The VLDB Journal pp. 1–19 (2020) Cai, Z., et al.: Diversified spatial keyword search on rdf data. The VLDB Journal pp. 1–19 (2020)
14.
Zurück zum Zitat Carbonell, J., et al.: The use of mmr, diversity-based reranking for reordering documents and producing summaries. SIGIR pp. 335–336 (1998) Carbonell, J., et al.: The use of mmr, diversity-based reranking for reordering documents and producing summaries. SIGIR pp. 335–336 (1998)
15.
Zurück zum Zitat Ciaccia, P., et al.: M-tree: An efficient access method for similarity search in metric spaces. Vldb 97, 426–435 (1997) Ciaccia, P., et al.: M-tree: An efficient access method for similarity search in metric spaces. Vldb 97, 426–435 (1997)
16.
Zurück zum Zitat Cormen, T.H., et al.: Introduction to algorithms. MIT press (2009) Cormen, T.H., et al.: Introduction to algorithms. MIT press (2009)
17.
Zurück zum Zitat Drosou, M., et al.: Diversity over continuous data. IEEE Data Eng Bull 32(4), 49–56 (2009) Drosou, M., et al.: Diversity over continuous data. IEEE Data Eng Bull 32(4), 49–56 (2009)
18.
Zurück zum Zitat Drosou, M., et al.: Disc diversity: result diversification based on dissimilarity and coverage (2012). arXiv preprint arXiv:1208.3533 Drosou, M., et al.: Disc diversity: result diversification based on dissimilarity and coverage (2012). arXiv preprint arXiv:​1208.​3533
19.
Zurück zum Zitat Drosou, M., et al.: Diverse set selection over dynamic data. TKDE 26 (2013) Drosou, M., et al.: Diverse set selection over dynamic data. TKDE 26 (2013)
20.
Zurück zum Zitat Esfandiari, M., et al.: Multi-session diversity to improve user satisfaction in web applications. TWC, pp 1928–1936 (2021) Esfandiari, M., et al.: Multi-session diversity to improve user satisfaction in web applications. TWC, pp 1928–1936 (2021)
21.
Zurück zum Zitat Fraternali, P., et al.: Top-k bounded diversification. SIGMOD, pp 421–432 (2012) Fraternali, P., et al.: Top-k bounded diversification. SIGMOD, pp 421–432 (2012)
22.
Zurück zum Zitat Gollapudi, S., et al.: An axiomatic approach for result diversification. WWW pp. 381–390 (2009) Gollapudi, S., et al.: An axiomatic approach for result diversification. WWW pp. 381–390 (2009)
24.
Zurück zum Zitat Guttman, A.: R-trees: A dynamic index structure for spatial searching, ACM 14(2), (1984) Guttman, A.: R-trees: A dynamic index structure for spatial searching, ACM 14(2), (1984)
25.
Zurück zum Zitat Han, J., et al.: Data mining concepts and techniques third edition. Morgan Kaufmann Series 5(4), 83–124 (2011) Han, J., et al.: Data mining concepts and techniques third edition. Morgan Kaufmann Series 5(4), 83–124 (2011)
26.
Zurück zum Zitat Hope, T., et al.: Accelerating innovation through analogy mining. SIGKDD (2017) Hope, T., et al.: Accelerating innovation through analogy mining. SIGKDD (2017)
27.
Zurück zum Zitat Katayama, N., et al.: The sr-tree: An index structure for high-dimensional nearest neighbor queries. Sigmod Record 26(2), 369–380 (1997)CrossRef Katayama, N., et al.: The sr-tree: An index structure for high-dimensional nearest neighbor queries. Sigmod Record 26(2), 369–380 (1997)CrossRef
28.
Zurück zum Zitat Knuth, D.E.: The Art of Computer Programming, Fundamental Algorithms, vol 1, 3rd edn. Addison Wesley Longman Publishing Co., Inc., (book) (1998) Knuth, D.E.: The Art of Computer Programming, Fundamental Algorithms, vol 1, 3rd edn. Addison Wesley Longman Publishing Co., Inc., (book) (1998)
29.
Zurück zum Zitat Kumar, N., et al.: What is a good nearest neighbors algorithm for finding similar patches in images? In: European conference on computer vision, Springer, pp 364–378 (2008) Kumar, N., et al.: What is a good nearest neighbors algorithm for finding similar patches in images? In: European conference on computer vision, Springer, pp 364–378 (2008)
30.
Zurück zum Zitat Mafrur, R., et al.: Dive: diversifying view recommendation for visual data exploration. CIKM pp. 1123–1132 (2018) Mafrur, R., et al.: Dive: diversifying view recommendation for visual data exploration. CIKM pp. 1123–1132 (2018)
31.
Zurück zum Zitat Maropaki, S., et al.: Diversifying top-k point-of-interest queries via collective social reach. In: CIKM pp. 2149–2152 (2020) Maropaki, S., et al.: Diversifying top-k point-of-interest queries via collective social reach. In: CIKM pp. 2149–2152 (2020)
32.
Zurück zum Zitat Mouratidis, K.: Geometric aspects and auxiliary features to top-k processing. In: MDM (2016) Mouratidis, K.: Geometric aspects and auxiliary features to top-k processing. In: MDM (2016)
33.
Zurück zum Zitat Nikookar, S., et al.: Diversifying recommendations on sequences of sets. VLDB Journal (2022) Nikookar, S., et al.: Diversifying recommendations on sequences of sets. VLDB Journal (2022)
35.
Zurück zum Zitat Puthiya Parambath, S.A., et al.: A coverage-based approach to recommendation diversity on similarity graph. In: RecSys pp. 15–22 (2016) Puthiya Parambath, S.A., et al.: A coverage-based approach to recommendation diversity on similarity graph. In: RecSys pp. 15–22 (2016)
37.
Zurück zum Zitat Ren, P., et al.: Leveraging contextual sentence relations for extractive summarization using a neural attention model. SIGIR, pp 95–104 (2017) Ren, P., et al.: Leveraging contextual sentence relations for extractive summarization using a neural attention model. SIGIR, pp 95–104 (2017)
38.
Zurück zum Zitat Robinson, J.T.: The kdb-tree: a search structure for large multidimensional dynamic indexes. SIGMOD, pp 10–18 (1981) Robinson, J.T.: The kdb-tree: a search structure for large multidimensional dynamic indexes. SIGMOD, pp 10–18 (1981)
39.
Zurück zum Zitat Singh, A., et al.: Fairness of exposure in rankings. In: SIGKDD pp. 2219–2228 (2018) Singh, A., et al.: Fairness of exposure in rankings. In: SIGKDD pp. 2219–2228 (2018)
40.
Zurück zum Zitat Tsai, C.H., et al.: Beyond the ranked list: User-driven exploration and diversification of social recommendation. In: 23rd ICIUI pp 239–250 (2018) Tsai, C.H., et al.: Beyond the ranked list: User-driven exploration and diversification of social recommendation. In: 23rd ICIUI pp 239–250 (2018)
41.
Zurück zum Zitat Vargas, S., et al.: Rank and relevance in novelty and diversity metrics for recommender systems. RecSys (2011) Vargas, S., et al.: Rank and relevance in novelty and diversity metrics for recommender systems. RecSys (2011)
42.
Zurück zum Zitat Vargas, S., et al.: Coverage, redundancy and size-awareness in genre diversity for recommender systems. RecSys pp. 209–216 (2014) Vargas, S., et al.: Coverage, redundancy and size-awareness in genre diversity for recommender systems. RecSys pp. 209–216 (2014)
43.
Zurück zum Zitat Wang, D., et al.: Sequence-based context-aware music recommendation. Information Retrieval Journal pp. 230–252 (2018) Wang, D., et al.: Sequence-based context-aware music recommendation. Information Retrieval Journal pp. 230–252 (2018)
44.
Zurück zum Zitat Wang, L., et al.: Diversified and scalable service recommendation with accuracy guarantee. IEEE TCSS (2020) Wang, L., et al.: Diversified and scalable service recommendation with accuracy guarantee. IEEE TCSS (2020)
45.
Zurück zum Zitat White, D.A., et al.: Similarity indexing with the ss-tree. In: ICDE pp. 516–523 (1996) White, D.A., et al.: Similarity indexing with the ss-tree. In: ICDE pp. 516–523 (1996)
46.
Zurück zum Zitat Wu, W., et al.: Personalizing recommendation diversity based on user personality. UMUAI 28(3), 237–276 (2018) Wu, W., et al.: Personalizing recommendation diversity based on user personality. UMUAI 28(3), 237–276 (2018)
47.
Zurück zum Zitat Wu, Y., et al.: Beyond greedy search: pruned exhaustive search for diversified result ranking. SIGIR, pp 99–106 (2018b) Wu, Y., et al.: Beyond greedy search: pruned exhaustive search for diversified result ranking. SIGIR, pp 99–106 (2018b)
48.
Zurück zum Zitat Jg, Y., et al.: Recent advances in document summarization. KIS 53(2), 297–336 (2017) Jg, Y., et al.: Recent advances in document summarization. KIS 53(2), 297–336 (2017)
49.
Zurück zum Zitat Yu, C., et al.: It takes variety to make a world: diversification in recommender systems. EDBT pp. 368–378 (2009) Yu, C., et al.: It takes variety to make a world: diversification in recommender systems. EDBT pp. 368–378 (2009)
50.
Zurück zum Zitat Zanitti, M., et al.: A user-centric diversity by design recommender system for the movie application domain. In: Companion Proceedings of WWW, pp 1381–1389 (2018) Zanitti, M., et al.: A user-centric diversity by design recommender system for the movie application domain. In: Companion Proceedings of WWW, pp 1381–1389 (2018)
51.
Zurück zum Zitat Zehlike, M., et al.: Fa* ir: A fair top-k ranking algorithm. In: CIKM pp. 1569–1578 (2017) Zehlike, M., et al.: Fa* ir: A fair top-k ranking algorithm. In: CIKM pp. 1569–1578 (2017)
Metadaten
Titel
A generic framework for efficient computation of top-k diverse results
verfasst von
Md Mouinul Islam
Mahsa Asadi
Sihem Amer-Yahia
Senjuti Basu Roy
Publikationsdatum
28.11.2022
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 4/2023
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-022-00770-0

Weitere Artikel der Ausgabe 4/2023

The VLDB Journal 4/2023 Zur Ausgabe