
2020 | Book

Data Science and Productivity Analytics


About this book

This book includes a spectrum of concepts, such as performance, productivity, operations research, econometrics, and data science, for the practically and theoretically important areas of ‘productivity analysis/data envelopment analysis’ and ‘data science/big data’. Data science is defined as the collection of scientific methods, processes, and systems dedicated to extracting knowledge or insights from data; it builds on concepts from various domains, including mathematics and statistical methods, operations research, machine learning, computer programming, pattern recognition, and data visualisation, among others.

Examples of data science techniques include linear and logistic regressions, decision trees, the Naïve Bayesian classifier, principal component analysis, neural networks, predictive modelling, deep learning, text analysis, survival analysis, and so on, all of which allow the data to be used to make more intelligent decisions. At the same time, there is no doubt that the amount of data is increasing exponentially, and analysing large data sets has become a key basis of competition and innovation, underpinning new waves of productivity growth. This book aims to bring a fresh look at the various ways in which data science techniques could unleash value and drive productivity from these mountains of data.

Researchers working in productivity analysis/data envelopment analysis will benefit from learning about the tools available in data science/big data that can be used in their current research analyses and endeavours. Data scientists, on the other hand, will also benefit from learning about the plethora of applications available in productivity analysis/data envelopment analysis.

Table of Contents

Frontmatter
Chapter 1. Data Envelopment Analysis and Big Data: Revisit with a Faster Method
Abstract
Khezrimotlagh et al. (Eur J Oper Res 274(3):1047–1054, 2019) propose a new framework to deal with large-scale data envelopment analysis (DEA). The framework provides the fastest available technique in the DEA literature for dealing with big data. It is well known that as the number of decision-making units (DMUs) or the number of inputs–outputs increases, the size of the DEA linear programming problems increases, and thus the elapsed time to evaluate the performance of DMUs increases sharply. The framework selects a subsample of DMUs and identifies the set of all efficient DMUs. After that, users can apply DEA models with the known efficient DMUs to evaluate the performance of inefficient DMUs or benchmark them. In this study, we elucidate the proposed method with transparent examples and illustrate how the framework is applied. Additional simulation exercises are designed to evaluate the performance of the framework in comparison with the two former methods: build hull (BH) and hierarchical decomposition (HD). The disadvantages of BH and HD are transparently demonstrated. A single computer with two different CPUs is used to run the methods. For the first time in the literature, we consider cardinalities of 200,000, 500,000 and 1,000,000 DMUs.
Dariush Khezrimotlagh, Joe Zhu
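
As a rough illustration of the building block that such acceleration frameworks rely on, the sketch below solves the input-oriented CCR envelopment linear program for one DMU against an arbitrary reference set; once the efficient DMUs are known, every remaining DMU can be scored against that much smaller reference set only. This is a minimal Python/SciPy sketch of the general idea, not the authors' implementation, and the function name ccr_efficiency and data layout are illustrative.

import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(x_o, y_o, X_ref, Y_ref):
    # Input-oriented CCR efficiency of DMU (x_o, y_o) against a reference set.
    # X_ref: (m, k) array of inputs of k reference DMUs; Y_ref: (s, k) outputs.
    m, k = X_ref.shape
    s = Y_ref.shape[0]
    c = np.r_[1.0, np.zeros(k)]                     # variables: [theta, lambda_1..lambda_k]
    A_in = np.hstack([-x_o.reshape(-1, 1), X_ref])  # sum_j lambda_j x_ij <= theta * x_io
    A_out = np.hstack([np.zeros((s, 1)), -Y_ref])   # sum_j lambda_j y_rj >= y_ro
    res = linprog(c,
                  A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.r_[np.zeros(m), -y_o],
                  bounds=[(None, None)] + [(0, None)] * k,
                  method="highs")
    return res.fun                                  # optimal theta; a value of 1.0 means efficient

# First pass (the idea): identify the efficient DMUs, e.g. starting from a small subsample.
# Second pass: benchmark every remaining DMU against the efficient DMUs only, which keeps
# each linear program small even when the sample contains hundreds of thousands of DMUs.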
Chapter 2. Data Envelopment Analysis (DEA): Algorithms, Computations, and Geometry
Abstract
Data Envelopment Analysis (DEA) has matured but remains vibrant and relevant, in part because its algorithms, computational experience, and geometry have a broad impact within and beyond the field. Algorithmic, computational, and geometric results in DEA allow us to solve larger problems faster; they also contribute to various other fields, including computational geometry, statistics, and machine learning. This chapter reviews these topics from a historical viewpoint, as they currently stand, and considers how they may evolve in the future.
José H. Dulá
Chapter 3. An Introduction to Data Science and Its Applications
Abstract
Data science has become a fundamental discipline, both in the field of basic research and in the resolution of applied problems, where statistics and computer science intersect. Thus, from the perspective of the data itself, machine learning, operations research methods and algorithms, and data mining techniques are aligned to address new challenges characterised by the complexity, volume and heterogeneous nature of data.
Alex Rabasa, Ciara Heavin
Chapter 4. Identification of Congestion in DEA
Abstract
Productivity is a common descriptive measure for characterizing the resource-utilization performance of a production unit, or decision-making unit (DMU). The challenge of improving productivity is closely related to a particular form of congestion, which reflects waste (overuse) of input resources at the production unit level. Specifically, when such input congestion is present, the productivity of a production unit can be improved not only by reducing some of its inputs but also by simultaneously increasing some of its outputs. There is thus a need first for identifying the presence of congestion, and then for developing congestion-treatment strategies to enhance productivity by reducing the input wastes and the output shortfalls associated with such congestion. Data envelopment analysis (DEA) has been considered a very effective method for evaluating input congestion. Because the assumption of strong input disposability precludes congestion, it should not be incorporated into the axiomatic modeling of the true technology involving congestion. Given this fact, we first develop a production technology in this contribution by imposing no input disposability assumption. Then we define both weak and strong forms of congestion based on this technology. Although our definitions are made initially for output-efficient DMUs, they are subsequently extended to output-inefficient DMUs. We also propose a method for identifying congestion. The essential tool for devising this method is the technique of finding a maximal element of a non-negative polyhedral set. To our knowledge, our method is the only reliable method for precisely detecting both weak and strong forms of congestion. It is computationally more efficient than the other congestion-identification methods developed in the literature because, unlike the others, it involves solving a single linear program. Unlike the other methods, the proposed method also deals effectively with the presence of negative data and with the occurrence of multiple projections for output-inefficient DMUs. Based on our theoretical results, three computational algorithms are developed for testing the congestion of any finite-size sample of observed DMUs. The superiority of these algorithms over the other congestion-identification methods is demonstrated using four numerical examples, one of which is newly introduced in this contribution.
Mahmood Mehdiloo, Biresh K. Sahoo, Joe Zhu
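
For intuition only, one common way to formalize the idea described above is given below; the chapter's own weak and strong definitions are more refined.

\[
(x_o, y_o) \text{ exhibits input congestion if } \exists\,(x, y) \in T:\quad
x \le x_o,\ x \ne x_o, \qquad y \ge y_o,\ y \ne y_o,
\]

that is, within the technology \(T\) some input can be strictly reduced while some output is simultaneously strictly increased.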
Chapter 5. Data Envelopment Analysis and Non-parametric Analysis
Abstract
This chapter gives an introduction to Data Envelopment Analysis (DEA), presenting an overview of the basic concepts and models used. Emphasis is placed on the non-parametric derivation of the Production Possibility Set (PPS), on the multiplicity of DEA models, and on how to handle different types of situations, namely undesirable outputs, ratio variables, multi-period data, negative data, non-discretionary variables, and integer variables.
Gabriel Villa, Sebastián Lozano
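
As a point of reference, the standard variable returns-to-scale PPS that such non-parametric derivations typically produce from observed data (x_j, y_j), j = 1, ..., n, is shown below (one common formulation; the chapter covers further variants).

\[
T_{\mathrm{VRS}} = \Bigl\{(x, y) \ge 0 \;:\; x \ge \sum_{j=1}^{n} \lambda_j x_j,\;
y \le \sum_{j=1}^{n} \lambda_j y_j,\; \sum_{j=1}^{n} \lambda_j = 1,\; \lambda_j \ge 0 \Bigr\}.
\]

Dropping the convexity constraint \(\sum_j \lambda_j = 1\) yields the constant returns-to-scale (CCR) technology.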
Chapter 6. The Measurement of Firms’ Efficiency Using Parametric Techniques
Abstract
In this chapter we summarize the main features of the standard econometric approach to measuring firms’ inefficiency. We provide guidance on the options that are available using Stochastic Frontier Analysis (SFA), the most popular parametric frontier technique. We start the chapter by summarizing the main results of production theory using the concept of distance function. Next, we outline the most popular estimation methods: maximum likelihood, method-of-moments and distribution-free approaches. In the last section we discuss more advanced topics and extend the previous models. For instance, we examine how to control for observed environmental variables or endogeneity issues. We also outline several empirical strategies to control for unobserved heterogeneity in panel data settings or using latent class and spatial stochastic frontier models. The last topics examined are dynamic efficiency measurement, production risk and uncertainty, and the decomposition of Malmquist productivity indices.
Luis Orea
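
For readers new to SFA, the canonical composed-error formulation is sketched below; this is a textbook baseline, not necessarily the exact specification estimated in the chapter.

\[
\ln y_i = f(x_i; \beta) + v_i - u_i, \qquad v_i \sim N(0, \sigma_v^2), \qquad u_i \ge 0,
\]

where \(v_i\) is symmetric noise, \(u_i\) (e.g., half-normal) captures technical inefficiency, and firm-level technical efficiency is typically reported as \(TE_i = \exp(-\hat{u}_i)\).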
Chapter 7. Fair Target Setting for Intermediate Products in Two-Stage Systems with Data Envelopment Analysis
Abstract
In a two-stage system with two divisions connected in series, fairly setting the target outputs for the first stage, or equivalently the target inputs for the second stage, is critical to ensure that the two stages have incentives to collaborate with each other to achieve the best performance of the whole system. Data envelopment analysis (DEA), as a non-parametric approach for efficiency evaluation of multi-input, multi-output systems, has drawn a lot of attention. Recently, many two-stage DEA models have been developed for studying the internal structures of two-stage systems. However, no previous work has studied the fair setting of the target intermediate products (or intermediate measures), although an unreasonable setting results in unfairness to the two stages, because setting higher (lower) intermediate measures means that the first (second) stage must make more effort to achieve the overall production plan. In this chapter, a new DEA model taking account of fairness in the setting of the intermediate products is proposed, where fairness is interpreted through the Nash bargaining game model, in which the two stages negotiate their target efficiencies in the two-stage system based on their individual efficiencies. This approach is illustrated by an empirical application to insurance companies.
Qingxian An, Haoxun Chen, Beibei Xiong, Jie Wu, Liang Liang
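
A heavily simplified sketch of the type of Nash bargaining objective involved is given below; the chapter's actual model embeds this in the two-stage DEA structure, and the notation here is illustrative.

\[
\max \; \bigl(E_1 - E_1^{0}\bigr)\bigl(E_2 - E_2^{0}\bigr),
\]

where \(E_k\) is the efficiency of stage \(k\) under the negotiated intermediate targets, \(E_k^{0}\) is its breakdown (status quo) efficiency, and the maximization is taken over intermediate targets feasible for both stages' DEA constraints.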
Chapter 8. Fixed Cost and Resource Allocation Considering Technology Heterogeneity in Two-Stage Network Production Systems
Abstract
Many studies have concentrated on fixed cost allocation and resource allocation issues by using data envelopment analysis (DEA). Existing approaches allocate fixed costs and resources primarily based on the efficiency maximization principle. However, due to the existence of technology heterogeneity among DMUs, it is impractical for all DMUs to achieve a common technology level, especially when some DMUs are far from the efficient frontier. In this chapter, under a centralized decision environment, we present a new approach to deal with fixed cost and resource allocation issues for a two-stage production system by considering the factor of technology heterogeneity. Specifically, technology difference is first analyzed within the performance evaluation framework. Then, by taking technology heterogeneity into account, two-stage DEA-based fixed cost allocation and resource allocation models are proposed. In addition, two illustrative examples are presented to show the feasibility of the two proposed models. Finally, conclusions are drawn.
Tao Ding, Feng Li, Liang Liang
Chapter 9. Efficiency Assessment of Schools Operating in Heterogeneous Contexts: A Robust Nonparametric Analysis Using PISA 2015
Abstract
The present study proposes an international comparison of education production efficiency using cross-country data on secondary schools from different countries participating in PISA 2015. Given that the contexts in which schools operate might be heterogeneous, we need to account for those divergences in environmental conditions when estimating the efficiency measures of school performance. In this way, each school can be benchmarked against units with similar characteristics regardless of the country they belong to. For this purpose, we use a robust nonparametric approach that allows us to clean the effect of contextual factors prior to the estimation of efficiency measures. Since this approach requires smoothing in the conditional variables in the middle of the sample and not at the frontier (where the number of units is smaller), it seems to be a better option than other nonparametric alternatives previously developed in the literature to deal with the effect of external factors. Likewise, by using this novel approach, we are also able to explore how those contextual factors might affect both the attainable production set and the distribution of the efficiencies.
Jose Manuel Cordero, Cristina Polo, Rosa Simancas
Chapter 10. A DEA Analysis in Latin American Ports: Measuring the Performance of Guayaquil Contecon Port
Abstract
In this globalized era, the port sector has been a major influence on a country’s economic growth. Ports have become one of the main channels for enhancing competitiveness in the emerging markets of Latin America. Therefore, it is relevant to carry out an analysis of their performance. A good approach to measuring performance is DEA, a mathematical tool that performs a benchmarking analysis by evaluating multiple factors that describe the nature of an entity. The research herein aims to evaluate the performance of the Ecuadorian Guayaquil Contecon Port in comparison with 14 major ports in Latin America and the Caribbean by using DEA. As a result of the study, the efficiency scores of the ports are analyzed to propose best practices to improve the performance of Guayaquil Contecon Port.
Emilio J. Morales-Núñez, Xavier R. Seminario-Vergara, Sonia Valeria Avilés-Sacoto, Galo Eduardo Mosquera-Recalde
Chapter 11. Effects of Locus of Control on Bank’s Policy—A Case Study of a Chinese State-Owned Bank
Abstract
This paper investigates how Locus of Control (LOC) impacts a bank’s policies through a case study of a Chinese state-owned bank. At the end of 2008, the investigated bank implemented a personal-business-preferred policy. We established two Data Envelopment Analysis (DEA) models to test the impacts of LOC on the implementation of the policy. The results show that internal-controlled branches tend to be more sensitive to the bank’s policy. When the policy is positive, the internal-controlled branches tend to improve more than the external-controlled branches, while the regression of internal-controlled branches is also more pronounced when the policy is negative. Location and managers’ personalities are identified as the two direct causes of LOC effects. Several suggestions are also provided in this paper to alleviate the negative effects of LOC.
Cong Xu, Guo-liang Yang, Jian-bo Yang, Yu-wang Chen, Hua-ying Zhu
Chapter 12. A Data Scientific Approach to Measure Hospital Productivity
Abstract
This study is aimed at developing a holistic data analytic approach to measure and improve hospital productivity. This is achieved by proposing a fuzzy logic-based multi-criteria decision-making model to enhance business performance. Data Envelopment Analysis is utilized to analyze productivity, and it is then hybridized with the Fuzzy Analytic Hierarchy Process to formulate the decision-making model. The simultaneous hybrid use of these two methods is employed to compile a ranked list of multiple proxies containing diverse input and output variables which occur in two stages. The uniqueness of this hybrid methodology is that it helps make the most suitable decision by taking into consideration the weights determined by the data from the hybrid model.
Babak Daneshvar Rouyendegh (B. Erdebilli), Asil Oztekin, Joseph Ekong, Ali Dag
Chapter 13. Environmental Application of Carbon Abatement Allocation by Data Envelopment Analysis
Abstract
China’s commitment to significantly reducing carbon emissions faces the twin challenges of focusing on costly reduction efforts whilst preserving the rapid growth that has defined the country’s recent past. However, little work has been able to meaningfully reflect the collaborative way in which provinces are assigned targets on a subnational regional basis. Proposing a meta-frontier allocation approach using data envelopment analysis (DEA), this chapter introduces potential collaboration between heterogeneous industrial units into the modelling framework. Our theoretical work exposits the roles that collectives of industrial decision-making units may play in optimizing against multiple target functions, whilst recognizing the two objectives of income maximization and pollution abatement cost minimization. Considering the period 2012–2014, we illustrate clearly how China’s three regional collaborations interact with the stated aims of national policy. Developed eastern China may take on greater abatement tasks in the short term, thus freeing central and western China to pursue the economic growth which will then support later abatement. Policymakers are thus given a tool through which an extra layer of implementation can be evaluated, between the national allocation and the setting of targets for individual regional decision-making units. China’s case perfectly exemplifies the conflicts which must be accounted for if the most economical and efficient outcomes are to be achieved.
Anyu Yu, Simon Rudkin, Jianxin You
Chapter 14. Pension Funds and Mutual Funds Performance Measurement with a New DEA (MV-DEA) Model Allowing for Missing Variables
Abstract
One of the assumptions in Data Envelopment Analysis (DEA) is that the active work units under study (decision-making units, DMUs) are operating under the same “culture”. However, in the real world, managers always want to compare their products/operations with similar entities (competitors) that, although somewhat different, are in the same industry. Yet no existing model can appropriately consider aspects that differ across the DMUs’ environments. This research introduces a novel DEA model, namely Mixed Variable DEA (MV-DEA), that provides a methodology in which DMUs with some different cultural assumptions are examined relative to each other while retaining their own specific characteristics. The case examined here led us to evaluate private pension funds’ performance by considering the specific characteristics of such funds in comparison with mutual funds. Canadian private pension funds, regulated by the Federal Government of Canada, and Canadian open-ended mutual funds were studied. The results of the new MV-DEA model were compared to traditional DEA models, and it was shown that the MV-DEA model provided more realistic results in our study.
Maryam Badrizadeh, Joseph C. Paradi
Chapter 15. Sharpe Portfolio Using a Cross-Efficiency Evaluation
Abstract
The Sharpe ratio is a way to compare the excess returns (over the risk-free asset) of portfolios per unit of volatility generated by a portfolio. In this paper, we introduce a robust portfolio that maximizes the Sharpe ratio when the risk-free asset is unknown but lies within a given interval. To compute the best Sharpe ratio portfolio, the Sharpe ratios for all admissible risk-free assets are considered and compared by using the so-called cross-efficiency evaluation. An explicit expression of the Cross-Efficiency Sharpe Ratio portfolio is presented when short selling is allowed.
Mercedes Landete, Juan F. Monge, José L. Ruiz, José V. Segura
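
For reference, with portfolio weights w, expected returns μ, covariance matrix Σ, and risk-free rate r_f, the Sharpe ratio being maximized is shown below (notation here is illustrative).

\[
SR(w; r_f) = \frac{w^{\top}\mu - r_f}{\sqrt{w^{\top}\Sigma\, w}},
\]

and the robust problem of the chapter treats \(r_f\) only as lying in a given interval, comparing the resulting family of ratios through a cross-efficiency evaluation.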
Metadata
Title
Data Science and Productivity Analytics
Editors
Vincent Charles
Juan Aparicio
Joe Zhu
Copyright Year
2020
Electronic ISBN
978-3-030-43384-0
Print ISBN
978-3-030-43383-3
DOI
https://doi.org/10.1007/978-3-030-43384-0