Database workload management through CBR and fuzzy based characterization

doi:10.1016/j.asoc.2014.04.030

Applied Soft Computing

Volume 22, September 2014, Pages 605-621

https://doi.org/10.1016/j.asoc.2014.04.030

Highlights

• The proposed research introduces a way to manage database workload in DBMSs on the basis of the workload type, which may be either OLTP or DSS.
• The main goal of the research is to manage the workload in DBMSs through characterization, scheduler and idleness detection modules.
• Database workload management is performed using case based reasoning (CBR) characterization, fuzzy logic based scheduling and, finally, detection of CPU idleness.

Abstract

Database Management Systems (DBMSs) have been used for many years as a data source for financial, educational, web and other applications. Users connect to the DBMS to update existing records and retrieve reports by executing workloads that consist of complex queries. To achieve a sufficient level of performance, workloads must be arranged. Rapid growth in data, demands for maximum functionality and changing behavior make the database workload more complex and tricky. Each DBMS experiences complex workloads that are difficult for humans to manage; human experts take a long time to manage database workload efficiently, and in some cases this may become impossible and lead to malnourishment. This problem presents database practitioners, vendors and researchers with new challenges. To achieve a satisfactory level of performance, either the Database Administrator (DBA) or the DBMS must know about workload shifts. Efficient execution and resource allocation depend on the workload type, which may be either On Line Transaction Processing (OLTP) or Decision Support System (DSS). The research introduces a way to manage the workload in DBMSs on the basis of the workload type. The main goal of the research is to manage the workload in DBMSs through characterization, scheduler and idleness detection modules. Database workload management is performed using case based reasoning (CBR) characterization, fuzzy logic based scheduling and, finally, detection of CPU idleness. Results are validated through experiments performed on real-time and benchmark workloads to show effectiveness and efficiency.


Introduction

Complexity in Database Management Systems increases due to various factors such as functionality demands from users, complex data types, diverse workloads and data volumes that keep growing over time. These factors cause brittleness and unmanageability in DBMSs. Today's DBMSs also have to work as a Data Warehouse (DW), i.e., providing summarized and consolidated reports and handling heterogeneous, complex and ad-hoc workloads that involve a number of aggregations, table joins, sorts and concurrent I/Os. To handle database tasks, organizations hire a number of expert Database Administrators and spend a lot of money to get the expected improvement, throughput and response. Usually DBAs have to take care of all database tasks, such as making policy for workload priorities, memory and storage configuration, and other such tasks. The cost of hardware is decreasing but the cost of management is increasing. Performing workload management activities manually by hiring experts increases the Total Cost of Ownership (TCO) [1], [2], [3]. Moreover, with the advent of distributed systems and DWs, it becomes difficult, and in some cases even impossible, for a DBA to manually organize, optimize and configure all database tasks. Various approaches are adopted for efficient workload management; for example, a workload or its queries may be stopped for a while and resumed later. However, when a query is stopped during execution, the part already executed is held in memory for a while and used later.

There are three units in workload management: workload, resources and objectives. All three are interrelated. The workload uses resources to meet the objectives of an organization; put the other way, resources are allocated through different approaches to a workload that has management objectives. Workload management has evolved through three phases, i.e., capacity planning, resource sharing and performance orientation [4]. Capacity planning workload management is based on cost sharing; the idea behind the resource oriented approach is maximum resource utilization, while the performance oriented approach focuses on business goals and objectives. In workload management, the main functions are workload frequency patterns, composition, intensity and required resources.

Identifying the workload type is not an easy task because its size and type change over different spans of the day, week and month. For example, a stock exchange experiences more DSS than OLTP workload at the start of the day, shifts to more OLTP than DSS during the middle of the day, and finally tends toward DSS again overnight for whole-day analysis and reporting. In general, workload is detected through two methods: performance monitoring and characterization. The former is a reactive approach that takes action once performance has degraded, while the latter is a proactive approach that tracks workload changes.

There are certain problems with previous characterization approaches. For example, Elnaffar's characterization model [5] is based on a decision tree; large decision trees are difficult to handle, often need pruning to produce a result, which may lead to inaccuracy, and are also difficult to set up. The research discussed in [6] classifies the workload on the basis of status variables; the problem with this approach is that the status variable values are recorded during workload execution, which slows down the classification process. Most previous techniques rely on assumptions or take values from the query optimizer rather than calculating actual values. In reality, however, there is a large difference between guessed and actual values. Moreover, in previous work workload classification is performed while the workload executes, whereas our approach completes this task before execution starts.

The main focus of the proposed research is to design and develop an Autonomous Workload Management (AWM) framework that can manage the DB workload autonomically, without any human intervention, by knowing itself (resources, limitations, operating environment, etc.). The concept and importance of the Autonomic DBMS (ADBMS) are already recognized by DB researchers, practitioners and vendors. The research proposes three components, namely workload characterization, scheduler and idleness detector, to handle the DB workload proactively. The unique features of the research are a workload characterization technique based on CBR and fuzzy logic that characterizes and classifies the workload without affecting execution time; impact based scheduling that arranges the workload with respect to its type and importance; and, finally, a CBR based idleness detection mechanism that uses free CPU cycles. The proposed characterization works in parallel with the user input and saves precious execution time, which ultimately enhances DBMS efficiency. No existing technique for workload scheduling in DBMSs identifies the percentage of OLTP and DSS workload through fuzzy logic. According to our survey, there is no published research that elaborates the concept of Autonomous Workload Management in DBMSs or DWs. The proposed research is initiated with the aim of being a milestone toward the development of Autonomous Workload Management in DBMSs and DWs. It is validated through various experiments over real-time industrial and benchmark workloads. These experiments are performed by executing OLTP and DSS like workloads in MySQL.
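To make the idea of CBR retrieval combined with a fuzzy OLTP/DSS percentage more concrete, the following is a minimal sketch and not the authors' implementation. The feature names (`joins`, `aggregates`, `writes`), the four stored cases and the distance-based membership formula are all assumptions introduced purely for illustration; the framework's actual features, case base and fuzzy rules are not given in this section.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Case:
    """A workload case with hypothetical features, each normalized to 0..1."""
    joins: float       # assumed: share of statements containing table joins
    aggregates: float  # assumed: share of statements containing aggregation
    writes: float      # assumed: share of INSERT/UPDATE/DELETE statements
    label: str         # known workload type of a stored case: "OLTP" or "DSS"


# Hypothetical case base; a real one would be built from observed workloads.
CASE_BASE = [
    Case(0.05, 0.02, 0.90, "OLTP"),
    Case(0.10, 0.05, 0.70, "OLTP"),
    Case(0.80, 0.75, 0.05, "DSS"),
    Case(0.60, 0.90, 0.10, "DSS"),
]


def distance(a: Case, b: Case) -> float:
    """Euclidean distance between two cases in feature space."""
    return sqrt(
        (a.joins - b.joins) ** 2
        + (a.aggregates - b.aggregates) ** 2
        + (a.writes - b.writes) ** 2
    )


def retrieve(query: Case, case_base: list[Case]) -> Case:
    """CBR retrieve step: return the stored case most similar to the query."""
    return min(case_base, key=lambda c: distance(query, c))


def fuzzy_mix(query: Case, case_base: list[Case]) -> tuple[float, float]:
    """Return (OLTP %, DSS %) as graded memberships instead of a crisp label.

    Assumed membership rule: the closer the query lies to its nearest OLTP
    case relative to its nearest DSS case, the higher the OLTP percentage.
    """
    d_oltp = min(distance(query, c) for c in case_base if c.label == "OLTP")
    d_dss = min(distance(query, c) for c in case_base if c.label == "DSS")
    total = d_oltp + d_dss
    if total == 0:  # query coincides with cases of both types
        return 50.0, 50.0
    oltp = 100.0 * d_dss / total
    return oltp, 100.0 - oltp


if __name__ == "__main__":
    incoming = Case(0.40, 0.30, 0.45, "?")  # label unknown before execution
    print(retrieve(incoming, CASE_BASE).label, fuzzy_mix(incoming, CASE_BASE))
```

Because the features in this sketch are derived from the statements themselves rather than from runtime counters, the classification could in principle run before execution starts, which is the property the text emphasizes.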

The rest of the paper is organized as follows. Section 2 presents related work, including workload characterization, classification, scheduling and other workload management techniques. Section 3 elaborates the proposed solution for workload management, discussing the detailed steps of workload characterization, scheduling and idleness detection. Section 4 presents the experiments performed on the proposed technique and on other well known workload management techniques, together with their results. Finally, Section 5 concludes the paper with key future directions.

Section snippets

Related work

Research on workload management in DBMSs and DWs has been carried out for many years. A large number of white papers, research papers and case studies related to database workload characterization, classification, scheduling and management in DBMSs and DWs are surveyed. The following section elaborates the research that is closely related to the proposed research. The workload management literature is divided into two sections which are workload characterization

Proposed workload management solution

Database researchers, practitioners and vendors are working to manage the workload in DBMSs using different techniques. However, there are certain problems with the previous workload management techniques, which are already discussed in first chapter at page 16. Workload management techniques based on decision trees have problems; for example, handling decision trees becomes difficult as their size increases. In this case tree pruning is required to get the result which may lead

Results and discussion

The experimental setup is the same as described in the previous section. To check the effectiveness of the proposed workload management technique, we compared it with three other commonly used techniques: First In First Out (FIFO), Priority Based (PB) workload management, and Shortest Job First (SJF). The comparison is based on the average waiting time obtained by executing various OLTP and DSS like workloads under these four scheduling techniques. FIFO technique
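The comparison metric, average waiting time, can be illustrated for the three baseline schedulers with a small sketch. The job list, execution times and priorities below are hypothetical and are not taken from the paper's experiments; the sketch also assumes a single non-preemptive server with all jobs arriving at time zero, and it does not model the proposed technique.

```python
def average_waiting_time(burst_times):
    """Mean time each job waits before it starts, in the given run order."""
    waited = 0
    clock = 0
    for burst in burst_times:
        waited += clock   # this job waited until all earlier jobs finished
        clock += burst
    return waited / len(burst_times)


# Hypothetical jobs: (name, execution time, priority); lower number runs first.
JOBS = [
    ("DSS report", 8, 2),
    ("OLTP update", 1, 1),
    ("OLTP insert", 2, 1),
    ("DSS rollup", 5, 3),
]

# FIFO: run in arrival (list) order.
fifo_order = [burst for _, burst, _ in JOBS]
# SJF: run the shortest job first.
sjf_order = sorted(fifo_order)
# PB: run by priority; Python's stable sort keeps arrival order within a tie.
pb_order = [burst for _, burst, _ in sorted(JOBS, key=lambda job: job[2])]

fifo_wait = average_waiting_time(fifo_order)  # 7.0
pb_wait = average_waiting_time(pb_order)      # 3.75
sjf_wait = average_waiting_time(sjf_order)    # 3.0

if __name__ == "__main__":
    print(f"FIFO={fifo_wait} PB={pb_wait} SJF={sjf_wait}")
```

In this toy case a long DSS job at the head of the FIFO queue delays every short OLTP job behind it, which is the kind of effect an average waiting time comparison is meant to expose.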

Conclusion

In this research, after the introduction and background of the database workload, existing workload management techniques and algorithms have been discussed. These include classification, characterization and scheduling of the database workload. The discussion also highlighted the problems that exist in previous workload management techniques and algorithms. The proposed solution consists of the characterization, scheduler and idleness detector components. The workload

Acknowledgement

We are grateful to the Higher Education Commission (HEC) of Pakistan and International Islamic University, Pakistan, for supporting this research work.

References (27)

M.C. Huebscher et al., A Survey of Autonomic Computing – Degrees, Models, and Applications (2008)
M. Parashar et al., Autonomic Computing: An Overview (2005)
S.S. Lightstone et al., Toward autonomic computing with DB2 universal database, SIGMOD (2002)
B. Niu et al., Workload adaptation in autonomic DBMSs
S. Elnaffar et al., Automatically classifying database workloads
Z. Zewdu et al., Workload Characterization of Autonomic DBMSs Using Statistical and Data Mining Techniques (2009)
D.A. Menasce et al., A methodology for workload characterization of E-commerce sites
P. Yu et al., On workload characterization of relational database environments, IEEE Trans. Softw. Eng. (1992)
T.J. Wasserman et al., Developing a characterization of business intelligence workloads for sizing new database systems
C. Surajit, K. Raghav, R. Ravishankar, P. Abhijit, Stop-and-Restart Style Execution for Long Running Decision Support...
C. Surajit et al., When can we trust progress estimators For SQL queries?
Mumtaz et al., Exploiting query interactions in database systems
A. Mumtaz et al., QShuffler Getting the Query Mix Right


