
16.11.2023 | Special Issue Paper

Efficient and robust active learning methods for interactive database exploration

Authors: Enhui Huang, Yanlei Diao, Anna Liu, Liping Peng, Luciano Di Palma

Published in: The VLDB Journal

Abstract

There is an increasing gap between the fast growth of data and the limited human ability to comprehend data. Consequently, there has been a growing demand for data management tools that can bridge this gap and help the user retrieve high-value content from data more effectively. In this work, we propose an interactive data exploration system as a new database service, using an approach called “explore-by-example.” Our new system is designed to assist the user in performing highly effective data exploration while reducing the human effort in the process. We cast the explore-by-example problem in a principled “active learning” framework. However, traditional active learning suffers from two fundamental limitations: slow convergence and lack of robustness under label noise. To overcome the slow convergence and label noise problems, we bring the properties of important classes of database queries to bear on the design of new algorithms and optimizations for active learning-based database exploration. Evaluation results using real-world datasets and user interest patterns show that our new system, both in the noise-free case and in the label noise case, significantly outperforms state-of-the-art active learning techniques and data exploration systems in accuracy while achieving the desired efficiency for interactive data exploration.
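To make the explore-by-example idea concrete, the following is a minimal sketch of an active learning loop on a toy problem. It is not the algorithm proposed in the paper. It assumes the user's interest is a one-sided range predicate on a single numeric attribute (value >= t for an unknown t) and that labels are noise-free, so the most uncertain tuple is always the middle of the still-unresolved region. The names `explore_by_example`, `oracle`, and `budget` are hypothetical and chosen for illustration only.

```python
def explore_by_example(values, oracle, budget):
    """Toy explore-by-example loop with uncertainty sampling.

    Assumption (not from the paper): the user's interest is `value >= t`
    for an unknown t, and the oracle (the user) answers without noise.
    Returns (predict, labeled): a predicate over values and the list of
    (value, label) pairs the user was asked to label.
    """
    data = sorted(values)
    # Invariant: every index <= lo is known negative, every index >= hi is
    # known positive; indices strictly between them are still unresolved.
    lo, hi = -1, len(data)
    labeled = []
    while hi - lo > 1 and len(labeled) < budget:
        mid = (lo + hi) // 2               # tuple the current model is least sure about
        label = bool(oracle(data[mid]))    # ask the user for a label
        labeled.append((data[mid], label))
        if label:
            hi = mid
        else:
            lo = mid
    # Conservative model: unresolved tuples are predicted as not interesting.
    threshold = data[hi] if hi < len(data) else None

    def predict(value):
        return threshold is not None and value >= threshold

    return predict, labeled


if __name__ == "__main__":
    prices = [i * 10 for i in range(1000)]
    wants = lambda price: price >= 3700    # stand-in for the user's hidden interest
    predict, labeled = explore_by_example(prices, wants, budget=10)
    retrieved = sum(predict(p) for p in prices)
    print(len(labeled), "labels requested;", retrieved, "tuples retrieved")
```

In this simplified setting each label halves the unresolved region, which is why a handful of examples is enough to pin down the predicate. Real user interests over several attributes, and labels that may be wrong, are exactly the cases where this simple scheme breaks down and where the abstract's concerns about slow convergence and label noise apply.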

Footnotes
1
In our work, we use the term user interest query to refer to the final query that represents the user interest, and user interest model to refer to an intermediate model before it converges to the true user interest.
2
This work considers a dataset that consists of a single table. Extending the approach to join queries is left to future work.
3
Feature selection to filter irrelevant attributes is addressed in our previous paper [40]. Due to space constraints, in this paper we assume that feature selection is performed as in [40] and focus on other data exploration issues.
4
Ray provides fast distributed computing; see https://ray.io/.
6
The content is extracted from http://www.teoalida.com/.

References
1.
Abouzied, A., Angluin, D., et al.: Learning and verifying quantified boolean queries by example. In: PODS, pp. 49–60 (2013)
2.
Abouzied, A., Hellerstein, J.M., Silberschatz, A.: Playful query specification with dataplay. PVLDB 5(12), 1938–1941 (2012)
3.
Agarwal, A., Garg, R., Chaudhury, S.: Greedy search for active learning of OCR. In: ICDAR, pp. 837–841 (2013)
4.
Aggarwal, C.C., Hinneburg, A., Keim, D.A.: On the surprising behavior of distance metrics in high dimensional spaces. In: ICDT, pp. 420–434. Springer, Berlin (2001)
5.
Akbani, R., Kwek, S., Japkowicz, N.: Applying support vector machines to imbalanced datasets. In: ECML, pp. 39–50 (2004)
6.
Amer-Yahia, S., et al.: INODE: building an end-to-end data exploration system in practice. SIGMOD Rec. 50(4), 23–29 (2021)
7.
Amer-Yahia, S., Milo, T., Youngmann, B.: Exploring ratings in subjective databases. In: SIGMOD, pp. 62–75. ACM (2021)
8.
Bach, S.H., He, B., Ratner, A., Ré, C.: Learning the structure of generative models without labeled data. In: ICML, pp. 273–282 (2017)
10.
Bellare, K., Iyengar, S., Parameswaran, A., Rastogi, V.: Active sampling for entity matching with guarantees. ACM Trans. Knowl. Discov. Data 7(3), 12:1-12:24 (2013)
11.
Berthon, A., Han, B., Niu, G., Liu, T., Sugiyama, M.: Confidence scores make instance-dependent label-noise learning possible. arXiv preprint arXiv:2001.03772 (2020)
12.
Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In: COLT, pp. 92–100 (1998)
13.
Bordes, A., Ertekin, S., Weston, J., et al.: Fast kernel classifiers with online and active learning. JMLR 6, 1579–1619 (2005)
14.
Bouguelia, M., Nowaczyk, S., Santosh, K.C., Verikas, A.: Agreeing to disagree: active learning with noisy labels without crowdsourcing. IJMLC 9(8), 1307–1319 (2018)
15.
Brodley, C.E., Friedl, M.A.: Identifying mislabeled training data. J. Artif. Intell. Res. 11, 131–167 (1999)
16.
Campbell, C., Cristianini, N., Smola, A.J.: Query learning with large margin classifiers. In: ICML, pp. 111–118 (2000)
17.
Cheng, J., Liu, T., et al.: Learning with bounded instance and label-dependent label noise. In: ICML, pp. 1789–1799. PMLR (2020)
18.
Cheung, A., Solar-Lezama, A.: Computer-assisted query formulation. Found. Trends Program. Lang. 3(1), 1–94 (2016)
19.
Cheung, A., Solar-Lezama, A., et al.: Using program synthesis for social recommendations. In: CIKM, pp. 1732–1736. ACM (2012)
20.
Cuendet, S., Hakkani-Tür, D., et al.: Automatic labeling inconsistencies detection and correction for sentence unit segmentation in conversational speech. In: International Workshop on MLMI, pp. 144–155 (2007)
21.
Dash, D., Rao, J., Megiddo, N., et al.: Dynamic faceted search for discovery-driven analysis. In: CIKM, pp. 3–12 (2008)
22.
Diao, Y., et al.: AIDE: an automatic user navigation system for interactive data exploration. PVLDB 8(12), 1964–1967 (2015)
23.
Dimitriadou, K., Papaemmanouil, O., Diao, Y.: Explore-by-example: an automatic query steering framework for interactive data exploration. In: SIGMOD, pp. 517–528 (2014)
24.
Dimitriadou, K., Papaemmanouil, O., Diao, Y.: Aide: an active learning-based approach for interactive data exploration. TKDE 28(11), 2842–2856 (2016)
25.
Du, J., Cai, Z.: Modelling class noise with symmetric and asymmetric distributions. In: AAAI, pp. 2589–2595 (2015)
26.
El, O.B., Milo, T., Somech, A.: Automatically generating data exploration sessions using deep reinforcement learning. In: SIGMOD, pp. 1527–1537. ACM (2020)
27.
El-Yaniv, R., Wiener, Y.: Active learning via perfect selective classification. JMLR 13(1), 255–279 (2012)
28.
Ertekin, S., Huang, J., et al.: Learning on the border: active learning in imbalanced data classification. In: CIKM, pp. 127–136 (2007)
29.
Esmailoghli, M., Quiané-Ruiz, J., et al.: COCOA: correlation coefficient-aware data augmentation. In: EDBT, pp. 331–336 (2021)
30.
Fernandez, R.C., Abedjan, Z., et al.: Aurum: a data discovery system. In: ICDE, pp. 1001–1012 (2018)
31.
Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE Trans. Neural Netw. Learn. Syst. 25(5), 845–869 (2014)
32.
Gamberger, D., Lavrač, N., Džeroski, S.: Noise elimination in inductive concept learning: a case study in medical diagnosis. In: International workshop on ALT, pp. 199–212. Springer (1996)
33.
Garnett, R., et al.: Bayesian optimal active search and surveying. In: ICML, pp. 843–850 (2012)
34.
Grünbaum, B.: Convex polytopes, 2nd edn. In: Convex Polytopes. Springer, New York (2003)
36.
Hanneke, S.: Theory of disagreement-based active learning. Found. Trends Mach. Learn. 7(2–3), 131–309 (2014)
37.
Hanneke, S.: Refined error bounds for several learning algorithms. J. Mach. Learn. Res. 17(1), 4667–4721 (2016)
38.
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, New York (2001)
39.
Huang, E.: Active Learning Methods for Interactive Exploration on Large Databases. Theses, Institut Polytechnique de Paris (2021)
40.
Huang, E., Peng, L., et al.: Optimization for active learning-based interactive database exploration. PVLDB 12(1), 71–84 (2018)
41.
42.
Jacobs, B.E., Walczak, C.A.: A generalized query-by-example data manipulation language based on database logic. IEEE Trans. Softw. Eng. 9(1), 40–57 (1983)
43.
Kahng, M., et al.: Interactive browsing and navigation in relational databases. PVLDB 9(12), 1017–1028 (2016)
44.
Kalinin, A., Cetintemel, U., Zdonik, S.: Interactive data exploration using semantic windows. In: SIGMOD, pp. 505–516 (2014)
45.
Kamat, N., Jayachandran, P., Tunga, K., Nandi, A.: Distributed and interactive cube exploration. In: ICDE, pp. 472–483 (2014)
46.
Lay, S.R.: Convex Sets and Their Applications. Dover Publications, Mineola (2007)
47.
Li, H., Chan, C.-Y., Maier, D.: Query from examples: an iterative, data-driven approach to query construction. PVLDB 8(13), 2158–2169 (2015)
48.
Lin, C.H., Mausam, Weld, D.S.: Re-active learning: active learning with relabeling. In: AAAI, pp. 1845–1852 (2016)
49.
Liu, W., Diao, Y., Liu, A.: An analysis of query-agnostic sampling for interactive data exploration. Commun. Stat. Theory Methods 47(16), 3820–3837 (2018)
50.
Ma, Y., Garnett, R., Schneider, J.G.: \(\Sigma\)-optimality for active learning on Gaussian random fields. In: NIPS, pp. 2751–2759 (2013)
51.
52.
Miranda, A.L.B., et al.: Use of classification algorithms in noise detection and elimination. In: HAIS, pp. 417–424 (2009)
53.
Mottin, D., et al.: Exemplar queries: give me an example of what you need. PVLDB 7(5), 365–376 (2014)
54.
Neamtu, R., et al.: Interactive time series analytics powered by ONEX. In: SIGMOD, pp. 1595–1598 (2017)
55.
Omidvar-Tehrani, B., Personnaz, A., et al.: Guided text-based item exploration. In: CIKM, pp. 3410–3420. ACM (2022)
56.
Özsoyoglu, G., Wang, H.: Example-based graphical database query languages. Computer 26(5), 25–38 (1993)
57.
Palma, L.D.: New Algorithms and Optimizations for Human-in-the-Loop Model Development. Ph.D. thesis, Polytechnic Institute of Paris, France (2021)
58.
Qin, X., et al.: Interactively discovering and ranking desired tuples by data exploration. VLDB J. 31(4), 753–777 (2022)
59.
Ratner, A., et al.: Snorkel: rapid training data creation with weak supervision. Proc. VLDB Endow. 11(3), 269–282 (2017)
60.
Ratner, A.J., et al.: Data programming: creating large training sets, quickly. In: NeurIPS, pp. 3567–3575 (2016)
61.
Roy, S.B., et al.: Minimum-effort driven dynamic faceted search in structured databases. In: CIKM, pp. 13–22 (2008)
62.
Roy, S.B., et al.: DynaCet: building dynamic faceted search systems over databases. In: ICDE, pp. 1463–1466 (2009)
63.
Santos, A.S.R., et al.: A sketch-based index for correlated dataset search. In: ICDE, pp. 2928–2941 (2022)
64.
Scott, C., et al.: Classification with asymmetric label noise: consistency and maximal denoising. In: COLT, pp. 489–511 (2013)
65.
Seleznova, M., et al.: Guided exploration of user groups. PVLDB 13(9), 1469–1482 (2020)
66.
Settles, B.: Active Learning. Synthesis Lectures on Artificial Intelligence & Machine Learning. Morgan Claypool Publishers, New York (2012)
67.
Shen, Y., et al.: Discovering queries based on example tuples. In: SIGMOD, pp. 493–504 (2014)
69.
Szalay, A., et al.: Designing and mining multi-terabyte astronomy archives: the Sloan digital sky survey. In: SIGMOD, pp. 451–462 (2000)
70.
Tang, B., et al.: Determining the impact regions of competing options in preference space. In: SIGMOD, pp. 805–820 (2017)
71.
Tang, F.: Bidirectional active learning with gold-instance-based human training. In: IJCAI, pp. 5989–5996 (2019)
72.
Tang, Y., et al.: SVMs modeling for highly imbalanced classification. IEEE Trans. Syst. Man Cybern. Part B 39(1), 281–288 (2009)
73.
Tong, S., Koller, D.: Support vector machine active learning with applications to text classification. JMLR 2, 45–66 (2002)
74.
Tran, Q.T., Chan, C.-Y., Parthasarathy, S.: Query by output. In: SIGMOD, pp. 535–548 (2009)
75.
Vanchinathan, H.P., et al.: Discovering valuable items from massive data. In: SIGKDD, pp. 1195–1204 (2015)
76.
Varma, P., et al.: Inferring generative model structure with static analysis. In: NeurIPS, pp. 240–250 (2017)
77.
Varma, P., Ré, C.: Snuba: automating weak supervision to label training data. Proc. VLDB Endow. 12(3), 223–236 (2018)
78.
Ventura, F., et al.: Expand your training limits! Generating training data for ML-based data management. In: SIGMOD, pp. 1865–1878 (2021)
79.
Wallace, B.C., Dahabreh, I.J.: Class probability estimates are unreliable for imbalanced data (and how to fix them). In: ICDM, pp. 695–704. IEEE Computer Society (2012)
80.
Wu, G., Chang, E.Y.: KBA: kernel boundary alignment considering imbalanced data distribution. IEEE Trans. Knowl. Data Eng. 17(6), 786–795 (2005)
81.
Youngmann, B., Amer-Yahia, S., Personnaz, A.: Guided exploration of data summaries. PVLDB 15(9), 1798–1807 (2022)
82.
Zhang, J., Wu, X., et al.: Active learning with imbalanced multiple noisy labeling. IEEE Trans. Cybern. 45(5), 1081–1093 (2015)
83.
Zhang, X., Wang, S., Yun, X.: Bidirectional active learning: a two-way exploration into unlabeled and labeled data set. IEEE Trans. Neural Netw. Learn. Syst. 26(12), 3034–3044 (2015)
84.
Zhao, Z., et al.: Controlling false discoveries during interactive data exploration. In: SIGMOD, pp. 527–540 (2017)
85.
Zhu, X., et al.: Budget constrained interactive search for multiple targets. PVLDB 14(6), 890–902 (2021)
Metadata
Title
Efficient and robust active learning methods for interactive database exploration
Authors
Enhui Huang
Yanlei Diao
Anna Liu
Liping Peng
Luciano Di Palma
Publication date
16.11.2023
Publisher
Springer Berlin Heidelberg
Published in
The VLDB Journal
Print ISSN: 1066-8888
Electronic ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-023-00816-x