
M4: a visualization-oriented time series data aggregation

Published: 01 June 2014

Abstract

Visual analysis of high-volume time series data is ubiquitous in many industries, including finance, banking, and discrete manufacturing. Contemporary RDBMS-based systems for visualizing high-volume time series data have difficulty coping with the hard latency requirements and high ingestion rates of interactive visualizations. Existing solutions for lowering the volume of time series data disregard the semantics of visualizations and result in visualization errors.

In this work, we introduce M4, an aggregation-based time series dimensionality reduction technique that provides error-free visualizations at high data reduction rates. Focusing on line charts, the predominant form of time series visualization, we explain in detail the drawbacks of existing data reduction techniques and how our approach outperforms the state of the art by respecting the process of line rasterization.

We describe how to incorporate aggregation-based dimensionality reduction at the query level in a visualization-driven query rewriting system. Our approach is generic and applicable to any visualization system that uses an RDBMS as its data source. Using real-world data sets from the high-tech manufacturing, stock market, and sports analytics domains, we demonstrate that our visualization-oriented data aggregation can reduce data volumes by up to two orders of magnitude while preserving perfect visualizations.
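
The abstract does not spell out the aggregation itself. The following is a minimal in-memory sketch, assuming that M4 groups the tuples of the queried time range by pixel column of the target chart and keeps, per column, the tuples with the minimum and maximum timestamp and the minimum and maximum value. The function name `m4_aggregate`, its parameters, and the floor-based column assignment are illustrative assumptions, not the paper's query-level formulation.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]  # (timestamp, value)


def m4_aggregate(points: List[Point], t_start: float, t_end: float,
                 width: int) -> List[Point]:
    """Reduce a time series to at most 4 tuples per pixel column.

    Hypothetical helper: `width` is the chart width in pixels and
    [t_start, t_end] is the visualized time range.
    """
    if width <= 0 or t_end <= t_start:
        raise ValueError("width must be positive and t_end must exceed t_start")

    # Assign every tuple to a pixel column (assumption: floor, clamped so
    # that t == t_end falls into the last column).
    columns: Dict[int, List[Point]] = {}
    for t, v in points:
        if t < t_start or t > t_end:
            continue  # outside the visualized range
        k = min(int(width * (t - t_start) / (t_end - t_start)), width - 1)
        columns.setdefault(k, []).append((t, v))

    # Per column, keep the first, last, lowest, and highest tuple.
    reduced: Set[Point] = set()
    for group in columns.values():
        reduced.add(min(group, key=lambda p: p[0]))  # min timestamp
        reduced.add(max(group, key=lambda p: p[0]))  # max timestamp
        reduced.add(min(group, key=lambda p: p[1]))  # min value
        reduced.add(max(group, key=lambda p: p[1]))  # max value

    return sorted(reduced)  # ordered by timestamp for line drawing


if __name__ == "__main__":
    series = [(float(i), float((i * 7) % 13)) for i in range(100)]
    print(m4_aggregate(series, 0.0, 100.0, 4))
```

Under these assumptions the result never exceeds `4 * width` tuples, regardless of how many tuples the input contains, which is where the data reduction comes from.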

