research-article

Gorilla: a fast, scalable, in-memory time series database

Authors:
Tuomas Pelkonen, Scott Franklin, Justin Teller, Paul Cavallaro, Qi Huang, Justin Meza, Kaushik Veeraraghavan

Facebook, Inc., Menlo Park, CA

Proceedings of the VLDB Endowment, Volume 8, Issue 12, pp. 1816–1827. https://doi.org/10.14778/2824032.2824078

Published: 01 August 2015

Abstract

Large-scale internet services aim to remain highly available and responsive in the presence of unexpected failures. Providing this service often requires monitoring and analyzing tens of millions of measurements per second across a large number of systems, and one particularly effective solution is to store and query such measurements in a time series database (TSDB).

A key challenge in the design of TSDBs is how to strike the right balance between efficiency, scalability, and reliability. In this paper we introduce Gorilla, Facebook's in-memory TSDB. Our insight is that users of monitoring systems place less emphasis on individual data points than on aggregate analysis, and that recent data points are far more valuable than older ones for quickly detecting and diagnosing the root cause of an ongoing problem. Gorilla optimizes for remaining highly available for writes and reads, even in the face of failures, at the expense of possibly dropping small amounts of data on the write path. To improve query efficiency, we aggressively leverage compression techniques such as delta-of-delta timestamps and XOR'd floating point values to reduce Gorilla's storage footprint by 10x. This allows us to store Gorilla's data in memory, reducing query latency by 73x and improving query throughput by 14x compared with time series data backed by a traditional database (HBase). This performance improvement has unlocked new monitoring and debugging tools, such as time series correlation search and denser visualization tools. Gorilla also gracefully handles failures ranging from a single node to entire regions with little to no operational overhead.
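The two compression ideas named in the abstract can be illustrated with a minimal Python sketch. It shows only the transforms themselves (second-order timestamp differences and the XOR of consecutive 64-bit float bit patterns), not the variable-length bit encoding that a real implementation would apply to the results. The function names and sample data are hypothetical and chosen purely for illustration.

```python
import struct
from typing import List


def delta_of_delta(timestamps: List[int]) -> List[int]:
    """Second-order differences of a timestamp sequence.

    For points arriving at a fixed interval every entry is 0,
    which is what makes the result cheap to encode.
    """
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]


def float_bits(value: float) -> int:
    """Reinterpret a float as its 64-bit IEEE 754 bit pattern."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def xor_with_previous(values: List[float]) -> List[int]:
    """XOR each value's bit pattern with that of its predecessor.

    Identical neighbours give 0; similar neighbours give a word
    with many leading and trailing zero bits.
    """
    bits = [float_bits(v) for v in values]
    return [b ^ a for a, b in zip(bits, bits[1:])]


# Illustrative data: points roughly 60 seconds apart, slowly changing values.
timestamps = [1000, 1060, 1120, 1181, 1240]
values = [12.0, 12.0, 24.0, 24.0]

print(delta_of_delta(timestamps))  # [0, 1, -2]
print([f"{x:016x}" for x in xor_with_previous(values)])
```

In this sample the timestamp transform yields small integers near zero, and the XOR of repeated values is exactly zero, which suggests why regularly sampled, slowly changing metrics compress well under these schemes.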

Published in

Proceedings of the VLDB Endowment, Volume 8, Issue 12
Proceedings of the 41st International Conference on Very Large Data Bases, Kohala Coast, Hawaii
August 2015
728 pages
ISSN: 2150-8097
Publisher: VLDB Endowment