Abstract
Visual analysis of high-volume time series data is ubiquitous in many industries, including finance, banking, and discrete manufacturing. Contemporary RDBMS-based systems for visualizing high-volume time series data struggle to cope with the hard latency requirements and high ingestion rates of interactive visualizations. Existing solutions for lowering the volume of time series data disregard the semantics of visualizations and result in visualization errors.
In this work, we introduce M4, an aggregation-based time series dimensionality reduction technique that provides error-free visualizations at high data reduction rates. Focusing on line charts, the predominant form of time series visualization, we explain in detail the drawbacks of existing data reduction techniques and show how our approach outperforms the state of the art by respecting the process of line rasterization.
We describe how to incorporate aggregation-based dimensionality reduction at the query level in a visualization-driven query rewriting system. Our approach is generic and applicable to any visualization system that uses an RDBMS as its data source. Using real-world data sets from the high-tech manufacturing, stock market, and sports analytics domains, we demonstrate that our visualization-oriented data aggregation can reduce data volumes by up to two orders of magnitude while preserving perfect visualizations.
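The abstract does not spell out the aggregation itself. As a minimal sketch, assuming M4 keeps, for each pixel column of the line chart, the tuples with the first and last timestamps and the minimum and maximum values, the reduction might look like the following. The function name `m4_reduce` and its parameters (`t_start`, `t_end`, `width`) are hypothetical, and the sketch runs in memory rather than as a rewritten RDBMS query.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (timestamp, value)


def m4_reduce(points: Sequence[Point], t_start: float, t_end: float,
              width: int) -> List[Point]:
    """Illustrative M4-style reduction (assumed behavior, not the paper's code).

    Splits [t_start, t_end] into `width` equal time intervals, one per
    pixel column, and keeps at most four points per interval: the ones
    with the first and last timestamps and the min and max values.
    """
    if width <= 0 or t_end <= t_start:
        raise ValueError("width must be positive and t_end must exceed t_start")

    span = t_end - t_start
    groups: Dict[int, List[Point]] = defaultdict(list)
    for t, v in points:
        if t < t_start or t > t_end:
            continue  # outside the visualized time range
        # Map the timestamp to a pixel column; clamp t == t_end into the last one.
        col = min(int((t - t_start) / span * width), width - 1)
        groups[col].append((t, v))

    reduced = set()  # a set removes points selected by more than one criterion
    for col_points in groups.values():
        reduced.add(min(col_points, key=lambda p: p[0]))  # first timestamp
        reduced.add(max(col_points, key=lambda p: p[0]))  # last timestamp
        reduced.add(min(col_points, key=lambda p: p[1]))  # minimum value
        reduced.add(max(col_points, key=lambda p: p[1]))  # maximum value
    return sorted(reduced)
```

Under this assumption the output holds at most `4 * width` points regardless of input size, which is where the data reduction comes from: the retained volume depends on the chart width, not on the length of the series.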