
SeeDB: efficient data-driven visualization recommendations to support visual analytics

Published: 01 September 2015

Abstract

Data analysts often build visualizations as the first step in their analytical workflow. However, when working with high-dimensional datasets, identifying visualizations that show relevant or desired trends in data can be laborious. We propose SeeDB, a visualization recommendation engine to facilitate fast visual analysis: given a subset of data to be studied, SeeDB intelligently explores the space of visualizations, evaluates promising visualizations for trends, and recommends those it deems most "useful" or "interesting". The two major obstacles in recommending interesting visualizations are (a) scale: evaluating a large number of candidate visualizations while responding within interactive time scales, and (b) utility: identifying an appropriate metric for assessing interestingness of visualizations. For the former, SeeDB introduces pruning optimizations to quickly identify high-utility visualizations and sharing optimizations to maximize sharing of computation across visualizations. For the latter, as a first step, we adopt a deviation-based metric for visualization utility, while indicating how we may be able to generalize it to other factors influencing utility. We implement SeeDB as a middleware layer that can run on top of any DBMS. Our experiments show that our framework can identify interesting visualizations with high accuracy. Our optimizations lead to multiple orders of magnitude speedup on relational row and column stores and provide recommendations at interactive time scales. Finally, we demonstrate via a user study the effectiveness of our deviation-based utility metric and the value of recommendations in supporting visual analytics.
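The deviation-based utility idea can be sketched in a few lines. The sketch below is illustrative only and is not SeeDB's implementation. It assumes that a candidate visualization is a (dimension, measure) pair rendered as a grouped SUM, and that utility is the Euclidean distance between the normalized aggregate distribution of the target subset and that of a reference dataset. The distance function, the row format, and the names `aggregate`, `normalize`, `deviation_utility`, and `recommend` are all assumptions made for this example; it also assumes non-negative measures and omits the pruning and sharing optimizations.

```python
from collections import defaultdict
from math import sqrt


def aggregate(rows, dimension, measure):
    """Sum `measure` per value of `dimension` (a GROUP BY with SUM)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


def normalize(totals, groups):
    """Turn per-group totals into a distribution over `groups`."""
    total = sum(totals.get(g, 0.0) for g in groups)
    if total == 0:
        return [0.0 for _ in groups]
    return [totals.get(g, 0.0) / total for g in groups]


def deviation_utility(target_rows, reference_rows, dimension, measure):
    """Assumed utility: Euclidean distance between the two normalized
    aggregate distributions (target subset vs. reference data)."""
    target = aggregate(target_rows, dimension, measure)
    reference = aggregate(reference_rows, dimension, measure)
    groups = sorted(set(target) | set(reference))
    p = normalize(target, groups)
    q = normalize(reference, groups)
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def recommend(target_rows, reference_rows, dimensions, measures, k=2):
    """Score every (dimension, measure) pair and return the k with the
    largest deviation. This is brute force, with no pruning or sharing."""
    scored = [
        (deviation_utility(target_rows, reference_rows, d, m), d, m)
        for d in dimensions
        for m in measures
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Hypothetical toy data: the target subset is the rows with region "N".
reference = [
    {"region": "N", "sex": "F", "sales": 10.0},
    {"region": "N", "sex": "M", "sales": 10.0},
    {"region": "S", "sex": "F", "sales": 10.0},
    {"region": "S", "sex": "M", "sales": 10.0},
]
target = [row for row in reference if row["region"] == "N"]
top = recommend(target, reference, ["region", "sex"], ["sales"], k=1)
```

On this toy data the `region` view ranks first, because the target's sales are concentrated in one region while the reference's are evenly split; the `sex` view scores zero because both distributions match. Scoring every pair this way is the cost that the abstract's pruning and sharing optimizations are meant to reduce.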


Published in

Proceedings of the VLDB Endowment, Volume 8, Issue 13
Proceedings of the 41st International Conference on Very Large Data Bases, Kohala Coast, Hawaii
September 2015
144 pages

Publisher

VLDB Endowment
