Skip to main content
Top
Published in: The VLDB Journal 4/2023

28-11-2022 | Regular Paper

A generic framework for efficient computation of top-k diverse results

Authors: Md Mouinul Islam, Mahsa Asadi, Sihem Amer-Yahia, Senjuti Basu Roy

Published in: The VLDB Journal | Issue 4/2023

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Result diversification is extensively studied in the context of search, recommendation, and data exploration. There are numerous algorithms that return top-k results that are both diverse and relevant. These algorithms typically have computational loops that compare the pairwise diversity of records to decide which ones to retain. We propose an access primitive DivGetBatch() that replaces repeated pairwise comparisons of diversity scores of records by pairwise comparisons of “aggregate” diversity scores of a group of records, thereby improving the running time of these algorithms while preserving the same results. We integrate the access primitive inside three representative diversity algorithms and prove that the augmented algorithms leveraging the access primitive preserve original results. We analyze the worst and expected case running times of these algorithms. We propose a computational framework to design this access primitive that has a pre-computed index structure I-tree that is agnostic to the specific details of diversity algorithms. We develop principled solutions to construct and maintain I-tree. Our experiments on multiple large real-world datasets corroborate our theoretical findings, while ensuring up to a \(24\times \) speedup.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Footnotes
1
Diversity between a pair of records is simply \(1- similarity \) between them.
 
2
Please note diversity could be easily calculated from similarity bounds.
 
3
Diversity between a pair of records is simply \(1- similarity \) between them.
 
Literature
1.
go back to reference Abbar, S., et al.: Diverse near neighbor problem. In: SoCG pp. 207–214 (2013) Abbar, S., et al.: Diverse near neighbor problem. In: SoCG pp. 207–214 (2013)
2.
go back to reference Abbar, S., et al.: Real-time recommendation of diverse related articles. WWW pp. 1–12 (2013) Abbar, S., et al.: Real-time recommendation of diverse related articles. WWW pp. 1–12 (2013)
3.
go back to reference Abbassi, Z., et al.: Diversity maximization under matroid constraints. In: SIGKDD (2013) Abbassi, Z., et al.: Diversity maximization under matroid constraints. In: SIGKDD (2013)
4.
go back to reference Agarwal, P.K., et al.: Efficient indexes for diverse top-k range queries. PODS pp. 213–227 (2020) Agarwal, P.K., et al.: Efficient indexes for diverse top-k range queries. PODS pp. 213–227 (2020)
5.
go back to reference Agrawal, R., et al.: Diversifying search results. WSDM 5–14 (2009) Agrawal, R., et al.: Diversifying search results. WSDM 5–14 (2009)
6.
go back to reference Angel, A., Koudas, N.: Efficient diversity-aware search. SIGMOD pp. 781–792 (2011) Angel, A., Koudas, N.: Efficient diversity-aware search. SIGMOD pp. 781–792 (2011)
7.
go back to reference Balog, K., et al.: Transparent, scrutable and explainable user models for personalized recommendation. SIGIR (2019) Balog, K., et al.: Transparent, scrutable and explainable user models for personalized recommendation. SIGIR (2019)
8.
go back to reference Bayer, R.: The universal b-tree for multidimensional indexing: General concepts. In: ICWCA, Springer, pp 198–209 (1997) Bayer, R.: The universal b-tree for multidimensional indexing: General concepts. In: ICWCA, Springer, pp 198–209 (1997)
9.
go back to reference Beckmann, N., et al.: The r*-tree: An efficient and robust access method for points and rectangles. SIGMOD, pp 322–331 (1990) Beckmann, N., et al.: The r*-tree: An efficient and robust access method for points and rectangles. SIGMOD, pp 322–331 (1990)
10.
go back to reference Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)CrossRefMATH Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)CrossRefMATH
11.
go back to reference Berchtold, S., et al.: The x-tree: An efficient and robust access method for points and rectangles. VLDB, pp 28–39 (1996) Berchtold, S., et al.: The x-tree: An efficient and robust access method for points and rectangles. VLDB, pp 28–39 (1996)
12.
go back to reference Beygelzimer. A., et al.: Cover trees for nearest neighbor. ICML (2006) Beygelzimer. A., et al.: Cover trees for nearest neighbor. ICML (2006)
13.
go back to reference Cai, Z., et al.: Diversified spatial keyword search on rdf data. The VLDB Journal pp. 1–19 (2020) Cai, Z., et al.: Diversified spatial keyword search on rdf data. The VLDB Journal pp. 1–19 (2020)
14.
go back to reference Carbonell, J., et al.: The use of mmr, diversity-based reranking for reordering documents and producing summaries. SIGIR pp. 335–336 (1998) Carbonell, J., et al.: The use of mmr, diversity-based reranking for reordering documents and producing summaries. SIGIR pp. 335–336 (1998)
15.
go back to reference Ciaccia, P., et al.: M-tree: An efficient access method for similarity search in metric spaces. Vldb 97, 426–435 (1997) Ciaccia, P., et al.: M-tree: An efficient access method for similarity search in metric spaces. Vldb 97, 426–435 (1997)
16.
go back to reference Cormen, T.H., et al.: Introduction to algorithms. MIT press (2009) Cormen, T.H., et al.: Introduction to algorithms. MIT press (2009)
17.
go back to reference Drosou, M., et al.: Diversity over continuous data. IEEE Data Eng Bull 32(4), 49–56 (2009) Drosou, M., et al.: Diversity over continuous data. IEEE Data Eng Bull 32(4), 49–56 (2009)
18.
go back to reference Drosou, M., et al.: Disc diversity: result diversification based on dissimilarity and coverage (2012). arXiv preprint arXiv:1208.3533 Drosou, M., et al.: Disc diversity: result diversification based on dissimilarity and coverage (2012). arXiv preprint arXiv:​1208.​3533
19.
go back to reference Drosou, M., et al.: Diverse set selection over dynamic data. TKDE 26 (2013) Drosou, M., et al.: Diverse set selection over dynamic data. TKDE 26 (2013)
20.
go back to reference Esfandiari, M., et al.: Multi-session diversity to improve user satisfaction in web applications. TWC, pp 1928–1936 (2021) Esfandiari, M., et al.: Multi-session diversity to improve user satisfaction in web applications. TWC, pp 1928–1936 (2021)
21.
go back to reference Fraternali, P., et al.: Top-k bounded diversification. SIGMOD, pp 421–432 (2012) Fraternali, P., et al.: Top-k bounded diversification. SIGMOD, pp 421–432 (2012)
22.
go back to reference Gollapudi, S., et al.: An axiomatic approach for result diversification. WWW pp. 381–390 (2009) Gollapudi, S., et al.: An axiomatic approach for result diversification. WWW pp. 381–390 (2009)
24.
go back to reference Guttman, A.: R-trees: A dynamic index structure for spatial searching, ACM 14(2), (1984) Guttman, A.: R-trees: A dynamic index structure for spatial searching, ACM 14(2), (1984)
25.
go back to reference Han, J., et al.: Data mining concepts and techniques third edition. Morgan Kaufmann Series 5(4), 83–124 (2011) Han, J., et al.: Data mining concepts and techniques third edition. Morgan Kaufmann Series 5(4), 83–124 (2011)
26.
go back to reference Hope, T., et al.: Accelerating innovation through analogy mining. SIGKDD (2017) Hope, T., et al.: Accelerating innovation through analogy mining. SIGKDD (2017)
27.
go back to reference Katayama, N., et al.: The sr-tree: An index structure for high-dimensional nearest neighbor queries. Sigmod Record 26(2), 369–380 (1997)CrossRef Katayama, N., et al.: The sr-tree: An index structure for high-dimensional nearest neighbor queries. Sigmod Record 26(2), 369–380 (1997)CrossRef
28.
go back to reference Knuth, D.E.: The Art of Computer Programming, Fundamental Algorithms, vol 1, 3rd edn. Addison Wesley Longman Publishing Co., Inc., (book) (1998) Knuth, D.E.: The Art of Computer Programming, Fundamental Algorithms, vol 1, 3rd edn. Addison Wesley Longman Publishing Co., Inc., (book) (1998)
29.
go back to reference Kumar, N., et al.: What is a good nearest neighbors algorithm for finding similar patches in images? In: European conference on computer vision, Springer, pp 364–378 (2008) Kumar, N., et al.: What is a good nearest neighbors algorithm for finding similar patches in images? In: European conference on computer vision, Springer, pp 364–378 (2008)
30.
go back to reference Mafrur, R., et al.: Dive: diversifying view recommendation for visual data exploration. CIKM pp. 1123–1132 (2018) Mafrur, R., et al.: Dive: diversifying view recommendation for visual data exploration. CIKM pp. 1123–1132 (2018)
31.
go back to reference Maropaki, S., et al.: Diversifying top-k point-of-interest queries via collective social reach. In: CIKM pp. 2149–2152 (2020) Maropaki, S., et al.: Diversifying top-k point-of-interest queries via collective social reach. In: CIKM pp. 2149–2152 (2020)
32.
go back to reference Mouratidis, K.: Geometric aspects and auxiliary features to top-k processing. In: MDM (2016) Mouratidis, K.: Geometric aspects and auxiliary features to top-k processing. In: MDM (2016)
33.
go back to reference Nikookar, S., et al.: Diversifying recommendations on sequences of sets. VLDB Journal (2022) Nikookar, S., et al.: Diversifying recommendations on sequences of sets. VLDB Journal (2022)
35.
go back to reference Puthiya Parambath, S.A., et al.: A coverage-based approach to recommendation diversity on similarity graph. In: RecSys pp. 15–22 (2016) Puthiya Parambath, S.A., et al.: A coverage-based approach to recommendation diversity on similarity graph. In: RecSys pp. 15–22 (2016)
37.
go back to reference Ren, P., et al.: Leveraging contextual sentence relations for extractive summarization using a neural attention model. SIGIR, pp 95–104 (2017) Ren, P., et al.: Leveraging contextual sentence relations for extractive summarization using a neural attention model. SIGIR, pp 95–104 (2017)
38.
go back to reference Robinson, J.T.: The kdb-tree: a search structure for large multidimensional dynamic indexes. SIGMOD, pp 10–18 (1981) Robinson, J.T.: The kdb-tree: a search structure for large multidimensional dynamic indexes. SIGMOD, pp 10–18 (1981)
39.
go back to reference Singh, A., et al.: Fairness of exposure in rankings. In: SIGKDD pp. 2219–2228 (2018) Singh, A., et al.: Fairness of exposure in rankings. In: SIGKDD pp. 2219–2228 (2018)
40.
go back to reference Tsai, C.H., et al.: Beyond the ranked list: User-driven exploration and diversification of social recommendation. In: 23rd ICIUI pp 239–250 (2018) Tsai, C.H., et al.: Beyond the ranked list: User-driven exploration and diversification of social recommendation. In: 23rd ICIUI pp 239–250 (2018)
41.
go back to reference Vargas, S., et al.: Rank and relevance in novelty and diversity metrics for recommender systems. RecSys (2011) Vargas, S., et al.: Rank and relevance in novelty and diversity metrics for recommender systems. RecSys (2011)
42.
go back to reference Vargas, S., et al.: Coverage, redundancy and size-awareness in genre diversity for recommender systems. RecSys pp. 209–216 (2014) Vargas, S., et al.: Coverage, redundancy and size-awareness in genre diversity for recommender systems. RecSys pp. 209–216 (2014)
43.
go back to reference Wang, D., et al.: Sequence-based context-aware music recommendation. Information Retrieval Journal pp. 230–252 (2018) Wang, D., et al.: Sequence-based context-aware music recommendation. Information Retrieval Journal pp. 230–252 (2018)
44.
go back to reference Wang, L., et al.: Diversified and scalable service recommendation with accuracy guarantee. IEEE TCSS (2020) Wang, L., et al.: Diversified and scalable service recommendation with accuracy guarantee. IEEE TCSS (2020)
45.
go back to reference White, D.A., et al.: Similarity indexing with the ss-tree. In: ICDE pp. 516–523 (1996) White, D.A., et al.: Similarity indexing with the ss-tree. In: ICDE pp. 516–523 (1996)
46.
go back to reference Wu, W., et al.: Personalizing recommendation diversity based on user personality. UMUAI 28(3), 237–276 (2018) Wu, W., et al.: Personalizing recommendation diversity based on user personality. UMUAI 28(3), 237–276 (2018)
47.
go back to reference Wu, Y., et al.: Beyond greedy search: pruned exhaustive search for diversified result ranking. SIGIR, pp 99–106 (2018b) Wu, Y., et al.: Beyond greedy search: pruned exhaustive search for diversified result ranking. SIGIR, pp 99–106 (2018b)
48.
go back to reference Jg, Y., et al.: Recent advances in document summarization. KIS 53(2), 297–336 (2017) Jg, Y., et al.: Recent advances in document summarization. KIS 53(2), 297–336 (2017)
49.
go back to reference Yu, C., et al.: It takes variety to make a world: diversification in recommender systems. EDBT pp. 368–378 (2009) Yu, C., et al.: It takes variety to make a world: diversification in recommender systems. EDBT pp. 368–378 (2009)
50.
go back to reference Zanitti, M., et al.: A user-centric diversity by design recommender system for the movie application domain. In: Companion Proceedings of WWW, pp 1381–1389 (2018) Zanitti, M., et al.: A user-centric diversity by design recommender system for the movie application domain. In: Companion Proceedings of WWW, pp 1381–1389 (2018)
51.
go back to reference Zehlike, M., et al.: Fa* ir: A fair top-k ranking algorithm. In: CIKM pp. 1569–1578 (2017) Zehlike, M., et al.: Fa* ir: A fair top-k ranking algorithm. In: CIKM pp. 1569–1578 (2017)
Metadata
Title
A generic framework for efficient computation of top-k diverse results
Authors
Md Mouinul Islam
Mahsa Asadi
Sihem Amer-Yahia
Senjuti Basu Roy
Publication date
28-11-2022
Publisher
Springer Berlin Heidelberg
Published in
The VLDB Journal / Issue 4/2023
Print ISSN: 1066-8888
Electronic ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-022-00770-0

Other articles of this Issue 4/2023

The VLDB Journal 4/2023 Go to the issue

Premium Partner