Abstract
Data analysts often build visualizations as the first step in their analytical workflow. However, when working with high-dimensional datasets, identifying visualizations that show relevant or desired trends in data can be laborious. We propose SeeDB, a visualization recommendation engine to facilitate fast visual analysis: given a subset of data to be studied, SeeDB intelligently explores the space of visualizations, evaluates promising visualizations for trends, and recommends those it deems most "useful" or "interesting". The two major obstacles in recommending interesting visualizations are (a) scale: evaluating a large number of candidate visualizations while responding within interactive time scales, and (b) utility: identifying an appropriate metric for assessing interestingness of visualizations. For the former, SeeDB introduces pruning optimizations to quickly identify high-utility visualizations and sharing optimizations to maximize sharing of computation across visualizations. For the latter, as a first step, we adopt a deviation-based metric for visualization utility, while indicating how we may be able to generalize it to other factors influencing utility. We implement SeeDB as a middleware layer that can run on top of any DBMS. Our experiments show that our framework can identify interesting visualizations with high accuracy. Our optimizations lead to multiple orders of magnitude speedup on relational row and column stores and provide recommendations at interactive time scales. Finally, we demonstrate via a user study the effectiveness of our deviation-based utility metric and the value of recommendations in supporting visual analytics.
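To make the idea of a deviation-based utility concrete, here is a minimal sketch. The abstract does not define the metric, so every detail below is an assumption made for illustration: a view is taken to be a (dimension, measure) pair aggregated with SUM, the reference is the full dataset, and deviation is the L1 distance between the two normalized aggregates. The names `aggregate`, `normalize`, `deviation`, and `recommend` are hypothetical and are not SeeDB's API. The sketch also scores every candidate view exhaustively, with none of the pruning or sharing optimizations the abstract mentions.

```python
from collections import defaultdict
from itertools import product


def aggregate(rows, dimension, measure):
    """Sum `measure` per distinct value of `dimension` (GROUP BY with SUM)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


def normalize(totals, groups):
    """Turn per-group totals into a distribution over a fixed group order."""
    total = sum(totals.get(g, 0.0) for g in groups)
    if total == 0:
        return [0.0 for _ in groups]
    return [totals.get(g, 0.0) / total for g in groups]


def deviation(target_rows, reference_rows, dimension, measure):
    """Assumed utility: L1 distance between target and reference distributions."""
    target = aggregate(target_rows, dimension, measure)
    reference = aggregate(reference_rows, dimension, measure)
    groups = sorted(set(target) | set(reference))
    p = normalize(target, groups)
    q = normalize(reference, groups)
    return sum(abs(a - b) for a, b in zip(p, q))


def recommend(rows, predicate, dimensions, measures, k=1):
    """Score every (dimension, measure) view and return the k most deviating."""
    target_rows = [r for r in rows if predicate(r)]
    scored = [
        (deviation(target_rows, rows, d, m), d, m)
        for d, m in product(dimensions, measures)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Toy data, invented for this example only.
rows = [
    {"segment": "new", "region": "east", "channel": "web", "sales": 90.0},
    {"segment": "new", "region": "west", "channel": "web", "sales": 10.0},
    {"segment": "old", "region": "east", "channel": "store", "sales": 10.0},
    {"segment": "old", "region": "west", "channel": "web", "sales": 90.0},
]

top = recommend(
    rows,
    predicate=lambda r: r["segment"] == "new",
    dimensions=["region", "channel"],
    measures=["sales"],
    k=1,
)
print(top)
```

On this toy data, sales by region ranks above sales by channel for the "new" segment, because the segment's regional split differs far more from the overall split than its channel split does. Another distance between distributions could be substituted for L1 without changing the structure of the sketch.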
Index Terms
- SeeDB: efficient data-driven visualization recommendations to support visual analytics