A game theory based framework for materialized view selection in data warehouses

https://doi.org/10.1016/j.engappai.2018.02.018Get rights and content

Abstract

Data warehouses exploit On-Line Analytical Processing (OLAP) to make rapid answers for analytical queries. Huge amount of aggregated data within a data warehouse on the one hand, and complex analytical queries raised in a data warehouse on the other hand, increase response time to queries extremely. To solve this problem, a number of views are derived and extracted from original base tables and queries have been answered using them. Since materialization of all possible views is not effective because of limitation of storage and maintenance overhead, selecting an optimal set of views for materialization is crucial to maximize data warehouse performance.

In this paper, a game theory based framework for the materialized view selection is proposed. In the proposed framework, query processing and view maintenance costs play a game against each other as two players and continue the game until reach the equilibrium. According to the framework, a new static method, called Game Theory based Materialized View selection (GTMV), has been proposed. Verification of proposed approach has been evaluated using several synthetic and real world datasets. Experimental results show that the GTMV method has better performance comparing previous algorithms and substantially outperform former methods.

Introduction

Great importance of data analysis in today’s information based and knowledge based world is evident. On-Line Analytical Processing (OLAP) is one of the involved technologies for analyzing data which is employed by data warehouses (Mansmann et al., 2014). Data warehouse is an integrated repository of information collected from several operational databases, with some important characteristics such as subject-orientation, integrity, time variance, and non volatility, which is used to obtain analytical and statistical information for decision support systems (Lechtenbörger and Vossen, 2003). Massive amount of data and complex queries lead to unacceptable increase of response time and emphasize needing to create an efficient mechanism to rapid response to such queries. Materialized view selection is the most important and the most applicable solution. An optimal set of views, according to some constraints such as memory space limitation and maintenance cost, are selected and materialized in this approach. Materialized views are employed to efficient processing of complex queries. Finding this optimal set of views is a challenging issue that has attracted a lot of recent studies (Huang et al., 2014).

From a general point of view, materialized view selection methods can be divided into two categories: static view selection and dynamic view selection. In static view selection methods, selecting and materializing of views take place before executing the first query and the set of materialized view does not change until the end of execution of last query. In dynamic view selection methods, the set of materialized views changes during execution of queries, and some of the selected views have been removed and are replaced by other ones. The dynamic methods are also used in two ways. In the first type, the set of all queries and their execution order are known in advance, and in the second type, queries are not known from the beginning and added to the workload gradually (Mami and Bellahsene, 2012). Dynamic algorithms are more successful than static ones and most of efficient materialized view selection methods exploit dynamic algorithms to find the optimal set of appropriate views.

Most of materialized view selection methods organize the set of all views as a unified structure firstly, and then select the set of optimal view for materialization using this structure. Multiple View Processing Plan (MVPP) (Yang et al., 1997), AND–OR DAG (Roy et al., 2000), and Data Cube Lattice (Harinarayan et al., 1996) are the most well known applicable types of these structures. MVPP is a directed acyclic graph, in which, roots are queries, leaves are base relations, and other nodes are relational operators, such as selection, projection, join, and data aggregation, which are used to write a query. In fact, query processing plan of multiple queries are combined as the MVPP. AND–OR DAG is also a directed acyclic graph which is composed of two types of nodes, namely operation nodes and equivalent nodes. Each operation node contains a relational algebra expression, and each equivalent node displays a set of equivalent logical expressions. Root of AND–OR graph is the result of the query execution and its leaves are base relations. Equivalent nodes are the main difference between MVPP and AND–OR DAG. In data cube lattice, each node shows a view or a query. Edges of the lattice demonstrate the relationship between the views. In this structure, a query can be answered and update using other queries. There also exists some strategies in which the process of materialized view selection is done without a particular structure. Some methods have used query rewriting to modify queries so that more common views can be obtained from them and they could exploit each other’s results (Halevy, 2001). Furthermore, there exist some methods which select the optimal views using syntactical analysis of the workload. These methods investigate and evaluate the workload and select a subset of base relations such that for them and their corresponding selected views for materialization, the cost of the workload is significantly reduced. In this case, the search space for finding the set of optimal views is very large (Mami and Bellahsene, 2012).

In term of limitations, methods are provided to solving the problem of materialized view selection can be divided into different categories. Some methods select the appropriate set of view for materialization without any limitation. These views are selected so that the costs of query processing and view maintenance are minimal. In some methods, view selection has been carried out under storage space restriction. A specified given memory space is supposed in these methods and an optimal set of selected views, so that the summation of their required storage space to materialization is less than or equal to the available memory space, is desired. In some other methods, the view selection problem should satisfy the view maintenance cost constraint. In this category of methods, a limited time is given, in which all selected materialized views should be updated if a subset of base relations has been modified. There exist a few materialized view selection algorithms which consider both memory space and view maintenance cost limitations.

Games theory is a subset of mathematics that attempts to use the design and analysis of scenarios to predict behaviors and decision results of autonomous interacting entities (Nisan et al., 2007). Game theory is used in various disciplines, including computer science (Shoham, 2008). Distributed computing (Grosu and Chronopoulos, 2005), game-based learning (Tobias et al., 2014), Networks’ routing (Vallet et al., 2016), network security (Liang and Xiao, 2013), decision support systems (Carreras et al., 2011), and data mining (Wang, 2006) are some of the applications of game theory in the field of computer science. In data related issues, game theory can be used in two ways. On the one hand, data related problems, such as extracting hidden patterns from a dataset, can be solved using game theory techniques, and on the other hand, with the help of data mining and knowledge discovery techniques, game theory can provide useful information for solving various problems. As an example, a finding and extracting data problem can be considered as a game that uses some of its factors as players, and solve the problem by playing the game. Collecting and using of data is also one of the most important issues in the game theory. This is very useful especially in iterative games. By processing and analyzing the obtained results of the opposing player in previous games, useful information can be extracted to use in subsequent decisions in iterative games. In fact, data mining can be used as a key technique for creation of a valid knowledge base for players of a game.

In this paper, a new framework is proposed to solve the materialized view selection problem using game theory. Cost of query processing and cost of view maintenance will be facing each other as two players of this game. Players’ strategy can be one of the proposed methods to solve the materialized view selection problem. In this framework, the game continues until equilibrium is reached. After reaching equilibrium, the set of appropriate views will be specified directly or indirectly. Based on the proposed framework, a new Game Theory based Materialized View selection method (GTMV) is presented. The proposed method is a greedy algorithm that exploits MVPP structure to evaluate and select appropriate views. GTMV method uses static strategy for view selection. Since no limitation is considered in the problem, static view selection is very appropriate. It should be noted that the framework and the proposed method are flexible and expandable, and with some modification, the proposed method can be used to solve the dynamic view selection problem. Using game theory, the proposed method achieves better results in terms of time.

The rest of paper is organized as follows. In Section 2 the most important materialized view selection strategies are reviewed. Some preliminaries and problem definition is given in Section 3. The proposed framework and the new game theory based materialized view selection method is raised in Section 4 and is explained using an example. The proposed method is compared with other existing methods and experimental results are shown and evaluated in Section 5. Finally, the work is concluded in Section 6.

Section snippets

Related works

In this section, we first represent important existing works which are related to materialize view selection. Then, we explain important game theory applications in data science.

Preliminaries and problem definition

The key terms of materialized view selection are defined formally in this section using the standard conventions followed in the literature (Sohrabi and Azgomi, 2017b). Suppose a data warehouse with r base relations. Let R={R1,R2,,Rr} be the set of all base relations of this data warehouse. An update frequency ufi is assigned to each base relation Ri. The workload is composed of q queries run on the base relations. The set of queries is shown as Q={Q1,Q2,,Qq}. Each query has its own execution

Game theory framework for materialized view selection

There are two conflicting concepts in the materialized view selection problem, namely query processing cost and view maintenance cost. Increasing the query processing cost of a view leads to reduction of its maintenance cost and vice versa. These two costs are used to model the materialized view selection problem as a game. We consider the materialized view selection problem as a game in which these two costs are two opposite players. The first player is greedy to the views with high

Experimental results

For evaluation of GTMV and the proposed framework, we investigate it in practical experiments. Several experiments in different situations have been carried out on different datasets. In this section, we present and evaluate the results of practical experiments.

Evaluation environment: All experiments have been carried out in Oracle 12i on a PC with an Intel 2410 M 2.3 GHz Core i5 2.9 GHz processor, 4 GB DDR3 RAM, and 600 GB SATA hard disk. There is no assumption about storage space limitation

Conclusion

Rapid response to complex queries is one the most important issues raised in the online analytical processing (OLAP). Enormous volume of data in data warehouses, which are used by OLAP services, is one of the main barriers to find the response of queries, quickly. Furthermore, analytical queries are usually very complex and time- and space-consuming. To overcome such problems, some common views of base relations should be selected and materialized to be exploited in processing queries. Since

References (75)

  • LeboucherC. et al.

    Novel evolutionary game based multi-objective optimisation for dynamic weapon target assignment

    IFAC Proc. Vol.

    (2014)
  • LeboucherC. et al.

    Convergence proof of an enhanced particle swarm optimisation method integrated with evolutionary game theory

    Inform. Sci.

    (2016)
  • LechtenbörgerJ. et al.

    Multidimensional normal forms for data warehouse design

    Inf. Syst.

    (2003)
  • LiX. et al.

    Application of game theory based hybrid algorithm for multi-objective integrated process planning and scheduling

    Expert Syst. Appl.

    (2012)
  • LiuW.B. et al.

    An evolutionary game based particle swarm optimization algorithm

    J. Comput. Appl. Math.

    (2008)
  • MansmannS. et al.

    Discovering OLAP dimensions in semi-structured data

    Inf. Syst.

    (2014)
  • MovassaghM. et al.

    Game theory based node scheduling as a distributed solution for coverage control in wireless sensor networks

    Eng. Appl. Artif. Intell.

    (2017)
  • ShamshirbandS. et al.

    Cooperative game theoretic approach using fuzzy Q-learning for detecting and preventing intrusions in wireless sensor networks

    Eng. Appl. Artif. Intell.

    (2014)
  • SohrabiM.K. et al.

    A comprehensive study on the effects of using data mining techniques to predict tie strength

    Comput. Hum. Behav.

    (2016)
  • SohrabiM.K. et al.

    Efficient colossal pattern mining in high dimensional datasets

    Knowl. Based Syst.

    (2012)
  • SohrabiM.K. et al.

    Frequent itemset mining using cellular learning automata

    Comput. Hum. Behav.

    (2017)
  • SohrabiM.K. et al.

    Multi-objective feature selection for warfarin dose prediction

    Comput. Biol. Chem.

    (2017)
  • TuylsK. et al.

    What evolutionary game theory tells us about multiagent learning

    Artificial Intelligence

    (2007)
  • WuX. et al.

    Optimizing XML queries: Bitmapped materialized views vs. indexes

    Inf. Syst.

    (2013)
  • XiaoZ. et al.

    A solution of dynamic VMs placement problem for energy consumption optimization based on evolutionary game theory

    J. Syst. Softw.

    (2015)
  • XiaoM. et al.

    A new methodology for multi-objective multidisciplinary design optimization problems based on game theory

    Expert Syst. Appl.

    (2015)
  • XuY. et al.

    Lattice-valued matrix game with mixed strategies for intelligent decision support

    Knowl.-Based Syst.

    (2012)
  • ZhaoG. et al.

    A matrix approach to modeling and optimization for dynamic games with random entrance

    Appl. Math. Comput.

    (2016)
  • ZhouL. et al.

    An approach for overlapping and hierarchical community detection in social networks based on coalition formation game theory

    Expert Syst. Appl.

    (2015)
  • AouicheK. et al.

    Data mining-based materialized view and index selection in data warehouses

    J. Intell. Inf. Syst.

    (2009)
  • AouicheK. et al.

    Clustering-based materialized view selection in data warehouses

  • ArabM. et al.

    Proposing a new clustering method to detect phishing websites

    Turk. J. Electr. Eng. Comput. Sci.

    (2017)
  • BabuS. et al.

    An integrated approach to evaluating sustainability in supply chains using evolutionary game theory

    Comput. Oper. Res.

    (2017)
  • ChaudhariM. et al.

    Dynamic materialized view selection algorithm: a clustering approach

    Data Eng. Manag.

    (2012)
  • ChunlinL. et al.

    Collaborative content dissemination based on game theory in multimedia cloud

    Knowl.-Based Syst.

    (2017)
  • DerakhshanR. et al.

    Simulated annealing for materialized view selection in data warehousing environment

  • FiccoM. et al.

    A coral-reefs and game theory-based approach for optimizing elastic cloud resource allocation

    Future Gener. Comput. Syst.

    (2016)
  • Cited by (29)

    • ZigZag<sup>+</sup>: A global optimization algorithm to solve the view selection problem for large-scale workload optimization

      2022, Engineering Applications of Artificial Intelligence
      Citation Excerpt :

      However, significant challenges in quickly answering analytic queries are often their complex nature, their large number, and the size of the data warehouse that keeps increasing periodically. Thus, to accelerate query processing and serve instantly decision-support applications, much research has integrated materialized views into the data warehouse design (Theodoratos and Sellis, 1997; Agrawal et al., 2000; Bellatreche et al., 2000a; Mami et al., 2011; Prakash and Kumar, 2019; Azgomi and Sohrabi, 2019; Sohrabi and Azgomi, 2019; Azgomi and Sohrabi, 2018; Munir et al., 2020; Yuan et al., 2020; Vijay Kumar and Kumar, 2012; Ordonez-Ante et al., 2020; Kechar et al., 2021). But, despite their benefits, the use of materialized views is not for free.

    • Proactive and intelligent evaluation of big data queries in edge clouds with materialized views

      2022, Computer Networks
      Citation Excerpt :

      Munir et al. [10] considered the selection of materialization of intermediate results in data-intensive flows, and choose the storage data format for selected materialized intermediate results. Azgomi et al. [4] proposed a game theory based framework to select the materialized views, where query processing and view maintenance cost are two players in a game, they play each other until an equilibrium is reached. Liang et al. [8] proposed a materialized view selection method based on reinforcement learning technique to minimize the query processing time.

    • Evolutionary game theory approach to materialized view selection in data warehouses

      2019, Knowledge-Based Systems
      Citation Excerpt :

      MVS problem has been solved using game theory only in [51]. A game theory-based framework and a corresponding deterministic method are proposed in [51] to model the MVS problem as a non-cooperative game. The player of this game are the query processing cost and the view maintenance cost.

    View all citing articles on Scopus
    View full text