research-article

SeeDB: efficient data-driven visualization recommendations to support visual analytics

Authors:
- Manasi Vartak, MIT
- Sajjadur Rahman, University of Illinois (UIUC)
- Samuel Madden, MIT
- Aditya Parameswaran, University of Illinois (UIUC)
- Neoklis Polyzotis, Google

Proceedings of the VLDB Endowment, Volume 8, Issue 13, pp 2182–2193
https://doi.org/10.14778/2831360.2831371

Published: 01 September 2015

Abstract

Data analysts often build visualizations as the first step in their analytical workflow. However, when working with high-dimensional datasets, identifying visualizations that show relevant or desired trends in data can be laborious. We propose SeeDB, a visualization recommendation engine to facilitate fast visual analysis: given a subset of data to be studied, SeeDB intelligently explores the space of visualizations, evaluates promising visualizations for trends, and recommends those it deems most "useful" or "interesting". The two major obstacles in recommending interesting visualizations are (a) scale: evaluating a large number of candidate visualizations while responding within interactive time scales, and (b) utility: identifying an appropriate metric for assessing interestingness of visualizations. For the former, SeeDB introduces pruning optimizations to quickly identify high-utility visualizations and sharing optimizations to maximize sharing of computation across visualizations. For the latter, as a first step, we adopt a deviation-based metric for visualization utility, while indicating how we may be able to generalize it to other factors influencing utility. We implement SeeDB as a middleware layer that can run on top of any DBMS. Our experiments show that our framework can identify interesting visualizations with high accuracy. Our optimizations lead to multiple orders of magnitude speedup on relational row and column stores and provide recommendations at interactive time scales. Finally, we demonstrate via a user study the effectiveness of our deviation-based utility metric and the value of recommendations in supporting visual analytics.
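The idea of a deviation-based utility can be illustrated with a minimal sketch. The abstract does not define the metric, so everything concrete here is an assumption made for illustration: a view is taken to be a (dimension, measure) pair aggregated with SUM, the reference is the full dataset, the distance is Euclidean, and the names `normalized_aggregate`, `deviation`, and `recommend` as well as the toy rows are hypothetical. The sketch scores every view exhaustively and does not implement the pruning or sharing optimizations.

```python
from collections import defaultdict
from itertools import product
from math import sqrt


def normalized_aggregate(rows, dimension, measure):
    """Sum `measure` per value of `dimension`, then scale the sums to total 1."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {group: value / grand_total for group, value in totals.items()}


def deviation(target_dist, reference_dist):
    """Euclidean distance between two distributions (assumed distance function)."""
    groups = set(target_dist) | set(reference_dist)
    return sqrt(sum(
        (target_dist.get(g, 0.0) - reference_dist.get(g, 0.0)) ** 2
        for g in groups
    ))


def recommend(rows, is_target, dimensions, measures, k=1):
    """Score every (dimension, measure) view and return the k most deviating."""
    target_rows = [r for r in rows if is_target(r)]
    scored = []
    for dimension, measure in product(dimensions, measures):
        utility = deviation(
            normalized_aggregate(target_rows, dimension, measure),
            normalized_aggregate(rows, dimension, measure),  # reference: all rows
        )
        scored.append((utility, dimension, measure))
    scored.sort(reverse=True)  # highest utility first
    return scored[:k]


# Hypothetical toy data: product A sells mostly in the east, evenly across channels.
rows = [
    {"product": "A", "region": "east", "channel": "web", "sales": 45.0},
    {"product": "A", "region": "east", "channel": "store", "sales": 45.0},
    {"product": "A", "region": "west", "channel": "web", "sales": 5.0},
    {"product": "A", "region": "west", "channel": "store", "sales": 5.0},
    {"product": "B", "region": "east", "channel": "web", "sales": 25.0},
    {"product": "B", "region": "east", "channel": "store", "sales": 25.0},
    {"product": "B", "region": "west", "channel": "web", "sales": 25.0},
    {"product": "B", "region": "west", "channel": "store", "sales": 25.0},
]
top = recommend(rows, lambda r: r["product"] == "A", ["region", "channel"], ["sales"])
```

In this toy data the subset for product A splits sales by channel exactly as the full dataset does, but is skewed toward the east region, so the sales-by-region view receives the higher score and is the one recommended.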

Index Terms

1. Information systems

Published in

Proceedings of the VLDB Endowment Volume 8, Issue 13
Proceedings of the 41st International Conference on Very Large Data Bases, Kohala Coast, Hawaii
September 2015
144 pages
ISSN: 2150-8097
Publisher
VLDB Endowment
Publication History
- Published: 1 September 2015
- Published in PVLDB Volume 8, Issue 13
Qualifiers
- research-article

Article Metrics
- Total Citations: 126
- Total Downloads: 719
- Downloads (Last 12 months): 129
- Downloads (Last 6 weeks): 82
