Elsevier

Applied Soft Computing

Volume 22, September 2014, Pages 605-621

Database workload management through CBR and fuzzy based characterization

https://doi.org/10.1016/j.asoc.2014.04.030

Highlights

  • The proposed research introduces a way to manage database workload in DBMSs on the basis of the workload type, which may be either OLTP or DSS.

  • The main goal of the research is to manage the workload in DBMSs through characterization, scheduling and idleness detection modules.

  • Workload management is performed using case-based reasoning (CBR) characterization, fuzzy logic based scheduling, and detection of CPU idleness.

Abstract

Database Management Systems (DBMSs) have served as the data source for financial, educational, web and other applications for many years. Users connect to the DBMS to update existing records and retrieve reports by executing workloads that consist of complex queries. Achieving a sufficient level of performance requires arranging these workloads. Rapid growth in data, expanding functionality and changing behavior make the database workload more complex and tricky. Each DBMS experiences complex workloads that are difficult for humans to manage; human experts take considerable time to manage a database workload efficiently, and in some cases it may become impossible and lead to starvation. This problem poses new challenges for database practitioners, vendors and researchers. To achieve a satisfactory level of performance, either the Database Administrator (DBA) or the DBMS must know about workload shifts. Efficient execution and resource allocation depend on the workload type, which may be either On Line Transaction Processing (OLTP) or Decision Support System (DSS). The research introduces a way to manage the workload in DBMSs on the basis of the workload type. The main goal of the research is to manage the workload in DBMSs through characterization, scheduling and idleness detection modules. Workload management is performed using case-based reasoning (CBR) characterization, fuzzy logic based scheduling, and detection of CPU idleness. Results are validated through experiments performed on real-time and benchmark workloads to demonstrate effectiveness and efficiency.

Introduction

Complexity in Database Management Systems increases due to various factors such as functionality demands from users, complex data types, diverse workloads and data volume that keeps growing over time. These factors cause brittleness and unmanageability in DBMSs. Today's DBMSs also have to work as a Data Warehouse (DW), i.e., provide summarized and consolidated reports and handle heterogeneous, complex and ad-hoc workloads that involve a number of aggregations, table joins, sorts and concurrent I/Os. To handle database tasks, organizations hire a number of expert Database Administrators and spend a lot of money to get the expected improvement, throughput and response. Usually DBAs have to take care of all database tasks, such as setting policy for workload priorities, configuring memory and storage, and similar tasks. The cost of hardware is decreasing but the cost of management is increasing. Performing workload management activities manually by hiring experts increases the Total Cost of Ownership (TCO) [1], [2], [3]. Moreover, with the advent of distributed systems and DWs, it becomes difficult and in some cases even impossible for a DBA to manually organize, optimize and configure all database tasks. Various approaches are adopted for efficient workload management; for example, a workload or its queries may be stopped for a while and resumed later. However, when a query is stopped during execution, the results of the part already executed are held in memory for a while and reused later.

There are three units in workload management: workload, resources and objectives. All three are interrelated. The workload uses resources to meet the objectives of an organization; equivalently, resources are allocated through different approaches to a workload that has some management objectives. Workload management has evolved through three phases, i.e., capacity planning, resource sharing and performance oriented workload management [4]. Capacity planning workload management is based on cost sharing; the idea behind the resource oriented approach is maximum resource utilization, while the performance oriented approach focuses on business goals and objectives. In workload management, the main aspects considered are workload frequency patterns, composition, intensity and required resources.

Identifying the workload type is not an easy task because its size and type change over different spans of the day, week and month. For example, a stock exchange experiences more DSS than OLTP workload at the start of the day, shifts to more OLTP than DSS during the middle of the day, and finally returns to DSS overnight for whole-day analysis and reporting. In general, workload is detected through two methods: performance monitoring and characterization. The former is a reactive approach that takes action once performance has degraded, while the latter is a proactive approach that tracks workload changes.

There are certain problems with previous characterization approaches. For example, Elnaffar's characterization model [5] is based on decision trees; large decision trees are difficult to handle, often need pruning to produce a result, which may lead to inaccuracy, and are also difficult to set up. The research discussed in [6] classifies the workload on the basis of status variables; the problem with this approach is that the status variable values are recorded during workload execution, which slows down the classification process. Most previous techniques rely on assumptions or take values from the query optimizer rather than calculating actual values. In reality, however, there is a large difference between estimated and actual values. Moreover, in previous work, workload classification is performed while the workload executes, whereas our approach completes this task before execution starts.

The main focus of the proposed research is to design and develop an Autonomous Workload Management (AWM) framework that can manage the DB workload autonomically, without human intervention, by knowing itself (resources, limitations, operating environment, etc.). The concept and importance of the Autonomic DBMS (ADBMS) is already recognized by DB researchers, practitioners and vendors. The research proposes three components, namely workload characterization, scheduler and idleness detector, to handle the DB workload proactively. The unique features of the research are a workload characterization technique based on CBR and fuzzy logic that characterizes and classifies the workload without affecting execution time; impact based scheduling that arranges the workload with respect to its type and importance; and a CBR based idleness detection mechanism that uses free CPU cycles. The proposed characterization works in parallel with the user input and saves execution time, which ultimately enhances DBMS efficiency. No existing technique for workload scheduling in DBMSs identifies the percentage of OLTP and DSS workload through fuzzy logic. According to our survey, there is no published research that elaborates the concept of Autonomous Workload Management in DBMSs or DWs. The proposed research is initiated with the aim of being a milestone toward the development of Autonomous Workload Management in DBMSs and DWs. It is validated through various experiments over real-time industrial and benchmark workloads. These experiments are performed by executing OLTP- and DSS-like workloads in MySQL.
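The idea of characterizing a workload as a percentage of OLTP and DSS, rather than as a hard label, can be illustrated with a minimal sketch. This is not the authors' implementation: the case features (joins, aggregations, log10 of rows touched), the stored cases and their DSS degrees are all hypothetical, and the fuzzy membership is approximated here by plain nearest-case retrieval with inverse-distance weighting.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Case:
    # Hypothetical features: (joins, aggregations, log10 of rows touched)
    features: tuple
    # Assumed degree of membership in DSS, 0..1; OLTP membership is 1 - dss_degree
    dss_degree: float


# Hypothetical case base; a real one would be built from observed workloads.
CASE_BASE = [
    Case((0.0, 0.0, 1.0), 0.0),
    Case((1.0, 0.0, 2.0), 0.2),
    Case((3.0, 2.0, 5.0), 0.8),
    Case((5.0, 4.0, 7.0), 1.0),
]


def distance(a, b):
    """Euclidean distance between two feature tuples."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def characterize(features, k=2):
    """Return (oltp_percent, dss_percent) from the k nearest stored cases."""
    nearest = sorted(CASE_BASE, key=lambda c: distance(c.features, features))[:k]
    # Inverse-distance weights; the small constant avoids division by zero
    # and lets an exact match dominate the result.
    weights = [1.0 / (distance(c.features, features) + 1e-9) for c in nearest]
    dss = sum(w * c.dss_degree for w, c in zip(weights, nearest)) / sum(weights)
    dss_percent = round(dss * 100, 1)
    return round(100 - dss_percent, 1), dss_percent


if __name__ == "__main__":
    print(characterize((2.0, 1.0, 3.5)))
```

Because the retrieval needs only features available before execution, a scheme of this kind is compatible with the paper's claim that characterization finishes before the workload starts running.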

The rest of the paper is organized as follows. Section 2 presents related work, covering workload characterization, classification, scheduling and other workload management techniques. Section 3 elaborates the proposed solution for workload management and discusses the detailed steps of workload characterization, scheduling and idleness detection. Section 4 presents the experiments and results obtained with the proposed technique as well as other well-known workload management techniques. Finally, Section 5 concludes the paper with key future directions.

Section snippets

Related work

Research on workload management in DBMSs and DWs has been carried out for many years. A large number of white papers, research papers and case studies related to database workload characterization, classification, scheduling and management in DBMSs and DWs were surveyed. The following section elaborates the research that is closely related to the proposed research. The workload management literature is divided into two sections which are workload characterization

Proposed workload management solution

Database researchers, practitioners and vendors are working to manage the workload in DBMSs using different techniques. However, there are certain problems with the previous workload management techniques, which are already discussed in the first chapter at page 16. Workload management techniques based on decision trees have problems; for instance, handling decision trees becomes difficult as their size increases. In this case tree pruning is required to get the result which may lead

Results and discussion

The experimental setup is the same as described in the previous section. To check the effectiveness of the proposed workload management technique, we compared it with three other commonly used techniques: First In First Out (FIFO), Priority Based (PB) workload management, and Shortest Job First (SJF). The comparison is based on the average waiting time obtained by executing various OLTP- and DSS-like workloads under these four scheduling techniques. FIFO technique
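As a rough illustration of the comparison metric, the sketch below computes the average waiting time for the three baseline policies named above. It assumes that all jobs arrive at time zero and run to completion without preemption; the job names, run times and priorities are hypothetical, and the proposed scheduler itself is not modeled because this snippet does not describe it.

```python
def average_waiting_time(run_times):
    """Average time jobs spend waiting when executed back to back in order."""
    waited, clock = 0.0, 0.0
    for run_time in run_times:
        waited += clock      # this job waited for everything scheduled before it
        clock += run_time
    return waited / len(run_times)


# Hypothetical jobs: (name, run time in seconds, priority; lower = more important)
jobs = [
    ("dss_report", 8.0, 2),
    ("oltp_update", 1.0, 1),
    ("oltp_insert", 2.0, 1),
    ("dss_rollup", 5.0, 3),
]

fifo_order = [run for _, run, _ in jobs]                   # arrival order
sjf_order = sorted(fifo_order)                             # shortest job first
pb_order = [run for _, run, _ in sorted(jobs, key=lambda j: j[2])]  # by priority

if __name__ == "__main__":
    print("FIFO:", average_waiting_time(fifo_order))   # 7.0
    print("SJF: ", average_waiting_time(sjf_order))    # 3.0
    print("PB:  ", average_waiting_time(pb_order))     # 3.75
```

With every job available at time zero, SJF gives the lowest average waiting time of the three orderings, which is why it is a natural baseline for any new scheduler.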

Conclusion

In this research, after the introduction and background of the database workload, existing workload management techniques and algorithms have been discussed. These include workload classification, characterization and scheduling of the database workload. The discussion also highlighted the problems that exist in the previous workload management techniques and algorithms. The proposed solution consists of the characterization, scheduler and idleness detector components. The workload

Acknowledgement

We are grateful to the Higher Education Commission (HEC) of Pakistan and the International Islamic University, Pakistan, for supporting this research work.

References (27)

  • M.C. Huebscher et al.

    A Survey of Autonomic Computing – Degrees, Models, and Applications

    (2008)
  • M. Parashar et al.

    Autonomic Computing: An Overview

    (2005)
  • S.S. Lightstone et al.

    Toward autonomic computing with DB2 universal database

    SIGMOD

    (2002)
  • B. Niu et al.

    Workload adaptation in autonomic DBMs

  • S. Elnaffar et al.

    Automatically classifying database workloads

  • Z. Zewdu et al.

    Workload Characterization of Autonomic DBMSs Using Statistical and Data Mining Techniques

    (2009)
  • D.A. Menasce et al.

    A methodology for workload characterization of E-commerce sites

  • P. Yu et al.

    On workload characterization of relational database environments

    IEEE Trans. Softw. Eng.

    (1992)
  • T.J. Wasserman et al.

    Developing a characterization of business intelligence workloads for sizing new database systems

  • C. Surajit, K. Raghav, R. Ravishankar, P. Abhijit, Stop-and-Restart Style Execution for Long Running Decision Support...
  • C. Surajit et al.

    When can we trust progress estimators For SQL queries?

  • Mumtaz et al.

    Exploiting query interactions in database systems

  • A. Mumtaz et al.

    QShuffler Getting the Query Mix Right
