Abstract
Performance-monitoring timeseries systems such as Prometheus and InfluxDB play a critical role in assuring reliability and operability. These systems commonly adopt a column-oriented storage model, in which samples from different timeseries are stored separately, and all samples (both numeric values and timestamps) of one timeseries are grouped into chunks and stored together. Because a group of timeseries is often collected from the same source with the same timestamps, managing timestamps and metrics as a group provides more opportunities for query and insertion optimization, but it also poses new challenges. In addition, to support better compression and efficient queries on the most recent data, which users are most likely to access, performance monitoring systems first cache huge volumes of data in memory and then periodically flush them to disk. Periodic data flushing incurs high I/O overhead, and simply discarding flushed data, which could still serve queries, is not only wasteful but also incurs a huge memory reclamation cost. In this paper, we propose Heracles, which integrates two techniques: (1) a new storage model that enables efficient queries on compressed data by using the shared timestamp column to locate the corresponding metric values; and (2) a novel two-level epoch-based memory manager that allows the system to gradually flush and reclaim in-memory data while unreclaimed data can still serve queries. Heracles is implemented as a standalone module that can be easily integrated into existing performance monitoring timeseries systems. We have implemented a fully functional prototype of Heracles based on Prometheus tsdb, a representative open-source performance monitoring system, and conducted extensive experiments with real and synthetic timeseries data.
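The first technique, a storage model in which a group of timeseries shares one timestamp column, can be illustrated with the minimal Go sketch below. It is not Heracles' actual layout: the type `Group` and its functions `NewGroup`, `Append`, and `Range` are hypothetical, values are kept uncompressed for clarity, and every series is assumed to receive a value at every scrape. It shows only the core idea that a time-range lookup is resolved once on the shared timestamp column and the resulting position range is then reused for any metric column.

```go
package main

import (
	"fmt"
	"sort"
)

// Group is a hypothetical in-memory layout for timeseries scraped together:
// one shared timestamp column plus one value column per series, aligned by
// position. Values are left uncompressed to keep the sketch short.
type Group struct {
	Series     []string    // series names, fixed when the group is created
	Timestamps []int64     // shared timestamp column, strictly ascending
	Values     [][]float64 // Values[i][j] is the value of Series[i] at Timestamps[j]
}

// NewGroup creates an empty group for the given series.
func NewGroup(series ...string) *Group {
	return &Group{Series: series, Values: make([][]float64, len(series))}
}

// Append records one scrape: a single timestamp and one value per series,
// given in the same order as g.Series.
func (g *Group) Append(ts int64, vals ...float64) error {
	if len(vals) != len(g.Series) {
		return fmt.Errorf("got %d values, want %d", len(vals), len(g.Series))
	}
	if n := len(g.Timestamps); n > 0 && ts <= g.Timestamps[n-1] {
		return fmt.Errorf("timestamp %d is not after %d", ts, g.Timestamps[n-1])
	}
	g.Timestamps = append(g.Timestamps, ts)
	for i, v := range vals {
		g.Values[i] = append(g.Values[i], v)
	}
	return nil
}

// Range returns the values of series idx whose timestamps fall in [mint, maxt].
// The position range is found once by binary search on the shared timestamp
// column and then applied directly to the chosen value column.
func (g *Group) Range(idx int, mint, maxt int64) []float64 {
	ts := g.Timestamps
	lo := sort.Search(len(ts), func(i int) bool { return ts[i] >= mint })
	hi := sort.Search(len(ts), func(i int) bool { return ts[i] > maxt })
	if hi <= lo {
		return nil // empty or inverted time range
	}
	return g.Values[idx][lo:hi]
}

func main() {
	g := NewGroup("cpu_user", "cpu_system")
	_ = g.Append(1000, 0.50, 0.10)
	_ = g.Append(2000, 0.55, 0.12)
	_ = g.Append(3000, 0.60, 0.15)
	fmt.Println(g.Range(1, 1500, 3000)) // [0.12 0.15]
}
```

Because the timestamp column is stored once per group rather than once per series, a group of N series scraped together also avoids N-1 redundant copies of its timestamps. Handling series that miss a scrape, and compressing the columns, would need extra machinery that this sketch omits.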
Experimental results show that, compared with Prometheus, Heracles improves insertion throughput by 171% and reduces query latency and space usage by 32% and 30%, respectively, on average. In addition, to compare Heracles with other state-of-the-art storage techniques, we separately integrated LevelDB (an LSM-tree-based store) and Parquet (a column store) into Prometheus tsdb; experimental results show that Heracles outperforms both integrations. We have released the source code of Heracles as open source.
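The second technique builds on epoch-based memory reclamation. The Go sketch below shows only the classic single-level form of that idea; it does not reproduce Heracles' two-level design. `EpochManager`, `Enter`, `Exit`, `Retire`, and `Reclaim` are hypothetical names, a single mutex stands in for the lock-free per-thread epochs of production schemes, and the `release` callback stands in for returning a flushed chunk's buffers to a pool (Go's garbage collector frees the memory itself). The property it demonstrates is the one described above: a chunk that has been flushed and unlinked keeps serving queries that were already running and is released only after they finish.

```go
package main

import (
	"fmt"
	"sync"
)

// retiredItem is an object that new readers can no longer reach, tagged with
// the global epoch at which it was retired.
type retiredItem struct {
	epoch   uint64
	release func()
}

// EpochManager is a hypothetical, mutex-based sketch of epoch-based
// reclamation: a retired object is released only after every reader that
// entered at or before its retirement epoch has exited.
type EpochManager struct {
	mu      sync.Mutex
	epoch   uint64         // global epoch, advanced on every Retire
	active  map[int]uint64 // reader id -> epoch observed on Enter
	nextID  int
	retired []retiredItem
}

func NewEpochManager() *EpochManager {
	return &EpochManager{active: make(map[int]uint64)}
}

// Enter registers a reader (for example, a query) and returns its id.
func (m *EpochManager) Enter() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	id := m.nextID
	m.nextID++
	m.active[id] = m.epoch
	return id
}

// Exit unregisters a reader.
func (m *EpochManager) Exit(id int) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.active, id)
}

// Retire schedules release of an object that has already been unlinked
// (for example, an in-memory chunk flushed to disk). Readers that entered
// earlier may still be using it.
func (m *EpochManager) Retire(release func()) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.retired = append(m.retired, retiredItem{epoch: m.epoch, release: release})
	m.epoch++
}

// Reclaim releases every retired object whose retirement epoch is older than
// the oldest epoch held by an active reader and returns how many it released.
func (m *EpochManager) Reclaim() int {
	m.mu.Lock()
	minActive := m.epoch // with no readers, everything retired so far is safe
	for _, e := range m.active {
		if e < minActive {
			minActive = e
		}
	}
	var keep, free []retiredItem
	for _, it := range m.retired {
		if it.epoch < minActive {
			free = append(free, it)
		} else {
			keep = append(keep, it)
		}
	}
	m.retired = keep
	m.mu.Unlock()
	for _, it := range free {
		it.release() // run callbacks outside the lock
	}
	return len(free)
}

func main() {
	m := NewEpochManager()
	q := m.Enter()                                        // a query starts and may read chunk A
	m.Retire(func() { fmt.Println("chunk A released") }) // chunk A flushed and unlinked
	fmt.Println(m.Reclaim())                              // 0: the query may still read chunk A
	m.Exit(q)
	fmt.Println(m.Reclaim()) // prints "chunk A released", then 1
}
```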