
About this book

This book covers several new areas in the growing field of analytics, with innovative applications in different business contexts, and consists of selected presentations from the 6th IIMA International Conference on Advanced Data Analysis, Business Analytics and Intelligence. The book is conceptually divided into seven parts. The first part gives expository briefs on topics of current academic and practitioner interest, such as data streams, binary prediction and reliability shock models. In the second part, the contributions look at artificial intelligence applications, with chapters on explainable AI, personalized search and recommendation, and customer retention management. The third part deals with credit risk analytics, with chapters on the optimization of credit limits and the mitigation of agricultural lending risks. In the fourth part, the book explores analytics and data mining in the retail context. The fifth part presents applications of analytics to operations management, with chapters on improving furnace operations, forecasting food indices and improving student learning outcomes. The sixth part has contributions on adaptive designs in clinical trials, stochastic comparisons of systems with heterogeneous components and the stacking of models. The seventh and final part contains chapters on finance and economics topics, such as the role of infrastructure and taxation in the economic growth of countries and the connectedness of markets with heterogeneous agents. These diverse themes make the book valuable to practitioners, postgraduate students, research scholars and faculty teaching advanced business analytics courses.

Table of Contents

Frontmatter

Machine Learning for Streaming Data: Overview, Applications and Challenges

Abstract
This chapter gives a brief overview of machine learning for streaming data. It establishes the need for algorithms specially suited to prediction tasks on data streams, explains why conventional batch learning methods are inadequate, and surveys applications in various business domains.
Shikha Verma
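The core contrast the chapter draws — stream learning versus batch learning — is that a stream learner must update from one example at a time without revisiting old data. A minimal illustrative sketch (not the chapter's own code; the class name and synthetic stream are invented for illustration) is an online logistic regression trained by stochastic gradient descent:

```python
import math
import random

class OnlineLogistic:
    """Logistic regression trained one example at a time (SGD),
    so the model can keep up with an unbounded data stream."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        # Single gradient step on the log-loss for this one example;
        # the example is then discarded, as stream processing requires.
        err = self.predict_proba(x) - y
        for i, xi in enumerate(x):
            self.w[i] -= self.lr * err * xi
        self.b -= self.lr * err

random.seed(0)
model = OnlineLogistic(n_features=2)
for _ in range(2000):
    x = [random.gauss(0, 1), random.gauss(0, 1)]
    y = 1 if x[0] + x[1] > 0 else 0   # simple synthetic stream
    model.learn_one(x, y)

print(round(model.predict_proba([2.0, 2.0]), 2))    # close to 1
print(round(model.predict_proba([-2.0, -2.0]), 2))  # close to 0
```

A batch learner would need the full 2000-example dataset in memory; here each example is seen exactly once, which is what makes the approach viable for unbounded streams.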

Binary Prediction

Abstract
Binary prediction is one of the most widely used analytical techniques, with applications in many domains. In the business context, it is used to predict loan default, discontinuance of insurance policies, customer attrition, fraud, etc. Because of its importance, a number of methods have been developed to solve this problem. In this article, we discuss the well-known logistic regression predictor and compare its performance with a relatively less widely used predictor—the maximum score predictor—using two real-life unbalanced datasets. The maximum score predictor is observed to perform better than the logistic regression predictor on both of these unbalanced datasets, indicating that it can be a useful addition to the analyst's toolkit when dealing with the binary prediction problem.
Arnab Kumar Laha
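The maximum score idea can be sketched in one dimension: rather than fitting probabilities as logistic regression does, directly choose the decision rule that maximizes the (possibly class-weighted) count of correct classifications. The toy data, the 9:1 class weighting and the grid are invented for illustration and are not the chapter's datasets or exact procedure:

```python
import random

random.seed(1)
# Unbalanced toy data: ~10% positives, which tend to have larger x.
data = [(random.gauss(2.0, 1.0), 1) for _ in range(50)] + \
       [(random.gauss(0.0, 1.0), 0) for _ in range(450)]

def score(threshold, weight_pos, weight_neg):
    # Weighted correct-classification count; up-weighting the rare
    # class is one simple way to respect the imbalance.
    s = 0.0
    for x, y in data:
        pred = 1 if x >= threshold else 0
        s += (weight_pos if y == 1 else weight_neg) * (pred == y)
    return s

# The "maximum score" principle in one dimension: pick the cut-off
# that maximizes the weighted number of correct predictions.
grid = [i / 10 for i in range(-30, 50)]
best = max(grid, key=lambda t: score(t, weight_pos=9.0, weight_neg=1.0))
print(round(best, 1))
```

Because the score counts classifications directly instead of maximizing a likelihood, the resulting cut-off need not coincide with the logistic regression decision boundary — which is precisely why the two predictors can differ on unbalanced data.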

Reliability Shock Models: A Brief Excursion

Abstract
We attempt to provide a brief introduction to the extensive area of shock model research in reliability theory. Possible connections with application areas such as risk analysis, inventory control and biometry are indicated. Important concepts and tools for proving shock model results such as total positivity and variation diminishing property (VDP) are introduced. Most of the important results concerning nonparametric ageing classes arising from shock models are summarized, and some typical techniques of proof are emphasized. A variety of scenarios with diverse arrival processes such as homogeneous Poisson process, nonhomogeneous Poisson process, stationary and nonstationary pure birth processes are considered. A few interesting results related to cumulative damage models are also discussed.
Murari Mitra, Ruhul Ali Khan
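As a concrete instance of the cumulative damage setting mentioned above: if shocks arrive according to a homogeneous Poisson process with rate $\lambda$ and $\bar P_k$ denotes the probability that the device survives $k$ shocks, the standard survival function of the device is

```latex
\bar H(t) \;=\; \sum_{k=0}^{\infty} e^{-\lambda t}\,\frac{(\lambda t)^{k}}{k!}\,\bar P_{k},
\qquad t \ge 0 .
```

Many of the nonparametric ageing-class results summarized in the chapter take this form: properties of the discrete sequence $\{\bar P_k\}$ are shown (often via total positivity and the variation diminishing property) to transfer to the continuous-time survival function $\bar H$.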

Explainable Artificial Intelligence Model: Analysis of Neural Network Parameters

Abstract
In recent years, artificial neural networks have become a popular technology for extracting extremely complex patterns from data across different research areas and industrial applications. Most artificial intelligence researchers now focus on building smart, user-friendly applications that can assist humans in making appropriate business decisions. The aim of these applications is mainly to reduce human error and to minimize the influence of individual perception in the decision-making process. There is no doubt that this technology can lead to a world where AI-driven applications support our day-to-day lives and help us make important decisions more accurately. But what if we want to know the explanation and reasoning behind an AI system's decision? What if we want to understand the most important factors in the decision-making processes of such applications? Due to the intense complexity of the inherent structure of AI algorithms, researchers usually describe the artificial neural network as a “black box”, whereas traditional statistical learning models are more transparent, interpretable and explainable with respect to the data and the underlying business hypothesis. In this article, we present the TRAnsparent Neural Network (TRANN), which examines and explains the network structure (model size) using statistical methods. Our aim is to create a framework for deriving the right size and the relevant connections of a network that can explain the data and address the business queries. In this paper, we restrict ourselves to analysing the feed-forward neural network model through a nonlinear regression model, and we analyse the parameter properties guided by statistical distributions, information-theoretic criteria and simulation techniques.
Sandip Kumar Pal, Amol A. Bhave, Kingshuk Banerjee
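One simple way to make the idea of "deriving the right size and relevant connections" concrete is magnitude-based pruning of a weight matrix. This is only an illustrative proxy — TRANN uses statistical distributions and information-theoretic criteria, not this rule — and the layer shape and helper name are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy feed-forward layer: 4 inputs -> 3 hidden units (12 connections).
W = rng.normal(size=(4, 3))

def prune_by_magnitude(W, keep_fraction):
    """Zero out the smallest-magnitude connections, keeping only the
    strongest ones -- a crude stand-in for selecting the 'relevant
    connections' of a network by statistical criteria."""
    k = int(round(keep_fraction * W.size))
    threshold = np.sort(np.abs(W), axis=None)[-k]
    return np.where(np.abs(W) >= threshold, W, 0.0)

W_small = prune_by_magnitude(W, keep_fraction=0.5)
print(int(np.count_nonzero(W_small)))  # 6 of the 12 weights remain
```

The pruned matrix defines a sparser, more inspectable network; statistically principled variants replace the magnitude threshold with significance tests or information criteria on the parameters, which is the direction the chapter pursues.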

Style Scanner—Personalized Visual Search and Recommendations

Abstract
In this paper, we present a visual search and recommendation system that supports typical shopping behaviour. We present a unified convolutional neural network architecture for learning embeddings, which are a way to capture a notion of similarity. We introduce the concept of embeddings with respect to similarity and show how the required embeddings can be obtained with various loss functions. We demonstrate various model architectures based on the availability of data, as well as a semiautomatic way of creating a labelled dataset for training. We also discuss the concept of accuracy with respect to similarity, which is complicated because similarity is subjective. Finally, we present an end-to-end system for deployment.
Abhishek Kushwaha, Saurav Chakravorty, Paulami Das
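A common loss function for learning similarity-capturing embeddings of the kind described above is the triplet loss: an anchor item is pulled toward a similar ("positive") item and pushed away from a dissimilar ("negative") one by at least a margin. A minimal numpy sketch (the 2-D vectors are invented; real systems use CNN outputs of hundreds of dimensions):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: squared distance to the positive item
    should be smaller than the distance to the negative item by at
    least `margin`, otherwise a penalty is incurred."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 1.0])
p = np.array([0.1, 0.9])   # embedding of a visually similar item
n = np.array([1.0, 0.0])   # embedding of a dissimilar item
print(triplet_loss(a, p, n))  # 0.0 -- already well separated
print(triplet_loss(a, n, p))  # positive -- violating triplet is penalized
```

During training, the gradient of this loss flows back through the embedding network, so items that shoppers treat as similar end up close together in the learned space.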

Artificial Intelligence-Based Cost Reduction for Customer Retention Management in the Indian Life Insurance Industry

Abstract
Customer retention, measured as the percentage of policies renewed every year (persistency ratio), is one of the most important metrics for any life insurer. Due to several factors, including the complexity of life insurance products, gaps in understanding the importance of policy renewals and lack of appropriate engagement with customers, high lapsation rates for life insurance policies have been observed globally, and specifically in India. Typically for a life insurance company, policy renewal premiums drive close to 60–70% of revenue, and retaining customers for more than 6–7 years is critical to business profitability. Customer retention operations primarily involve engaging with customers through telephonic renewal calls or other media to encourage paying renewal premiums on time. With close to 70% of total policies present in the premium renewal base, the tracking, scheduling and execution of customer retention calls and campaigns constitute a major cost head for life insurers. In this paper, the authors present an advanced analytic solution to effectively manage customer retention costs and improve overall persistency. The paper demonstrates the use of several machine learning and deep learning neural network-based models to classify customers by their propensity not to pay renewal premiums on time. The study includes a comparative analysis of model performance, with the deep learning neural network model showing the highest performance. The propensity scores were used to drive a differentiated retention strategy, matching each customer segment with the appropriate renewal effort to reduce customer retention cost and improve persistency.
Sanjay Thawakar, Vibhu Srivastava

Optimization of Initial Credit Limit Using Comprehensive Customer Features

Abstract
The credit card market has shown incredible growth in India: from 28.85 million credit cards in operation with a spend of Rs. 32,691 crores in January 2017 to 36.24 million cards with a spend of Rs. 41,437 crores in January 2018. When credit cards were first introduced in the Indian market, the word “credit” did not sit well with the Indian mindset, the belief being that credit cards would increase one's liability and might lead to payment of huge interest if dues were not cleared on time.
Shreya Piplani, Geetika Bansal

Mitigating Agricultural Lending Risk: An Advanced Analytical Approach

Abstract
As per the Situation Assessment Survey (SAS) for Agricultural Households in the NSSO 70th round, in 2012–13 almost 40% of agricultural households still relied on non-institutional sources for their credit needs, an increase of almost 11% over 1990–91. Moneylenders still account for a major part, around 26%, of that non-institutional credit. Even with rising credit disbursements and loan waivers, we have not been able to improve the situation of our farmers. In FY 2018, banks disbursed only an additional 6.37% to this sector, the lowest in a decade. Lack of sufficient information about the agricultural finance landscape and mounting NPAs have made banks reluctant to lend to this sector. In this study, we aim to build a credit risk assessment model for farmers to bridge the gap between them and the formal credit sector. We obtained a robust model that uses NSS data features and predicts farmer default better than one built on the features currently captured by banks. These additional variables give necessary insights into farmer characteristics that could (1) help banks identify low-risk farmers and expand their lending and (2) help the government identify problem areas where intervention is necessary to uplift farmers and bring them under the purview of formal credit.
Aditi Singh, Nishtha Jain

Application of Association Rule Mining in a Clothing Retail Store

Abstract
In this paper, an attempt has been made to understand customers' buying patterns with the help of market basket analysis, an important tool in the modern retailing industry. Retailing is defined as the timely delivery of goods and services demanded by end customers at prices that are competitive and affordable. Through association rule mining, we have tried to understand consumer behavior, brand importance, seasonality effects, buying patterns and product baskets from the data. Data mining is the practice of analyzing databases to generate new information. Association rule mining is the process of finding rules that govern relations between sets of items. Market basket analysis is a modeling technique built on the buying relations between certain groups of items. The transactional data was collected from the retail clothing store “Try Us”, located in Indore, Madhya Pradesh, for the period November 26, 2017, to September 19, 2018, with the help of a Point of Sale (POS) system and a bar code scanner. “Try Us” is a small multi-brand retail store that is planning to upgrade to a multilevel store, and data mining would benefit overall store performance. Frontline Solver® Analytic Solver Data Mining (XLMiner) is used for the simulations.
Akshay Jain, Shrey Jain, Nitin Merh
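The two quantities at the heart of association rule mining — support and confidence — can be computed directly from transactions. A self-contained sketch (the item names and transactions are invented for illustration and are not from the actual “Try Us” dataset):

```python
# Toy transactions in the spirit of a clothing store's POS data.
transactions = [
    {"shirt", "trousers"},
    {"shirt", "trousers", "belt"},
    {"shirt", "belt"},
    {"trousers", "belt"},
    {"shirt", "trousers"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """P(rhs in basket | lhs in basket) estimated from the data."""
    return support(lhs | rhs) / support(lhs)

rule = (frozenset({"shirt"}), frozenset({"trousers"}))
print(support(rule[0] | rule[1]))  # 0.6: {shirt, trousers} in 3 of 5 baskets
print(confidence(*rule))           # 0.75: trousers in 3 of the 4 shirt baskets
```

Algorithms such as Apriori (which tools like XLMiner implement) simply search the space of itemsets efficiently, reporting the rules whose support and confidence exceed user-chosen thresholds.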

Improving Blast Furnace Operations Through Advanced Analytics

Abstract
Hot blast is an input to the blast furnace and is instrumental in blast furnace efficiency. The current state of stove operations is not standardized: operators often take critical decisions based on their experience alone. To standardize the decision-making process in an optimal way, an analytics research project was started with an iron and steel manufacturer. The paper describes the present control system of the hot blast heating process and a model complementing that control system. The model is built using the K-means clustering algorithm and principal component analysis to recommend critical variable set points to the operators. The process variables in the plant change continuously, making it impossible for operators to take the optimal decision while accounting for all the variables. The model covers all the critical variables, whether controllable or non-controllable, and is aimed at increasing the heat recovery in the stoves and increasing the temperature of the hot blast. The end result of the research would be reduced variability and an increased median temperature, eventually reducing the cost of hot metal. A dashboard displays the model-generated recommendations to the operators and also monitors operator compliance on a week-by-week basis. Based on different business rules, the model is scheduled to be retuned at specified intervals, which accounts for changes in the efficiency of the process equipment.
Rishabh Agrawal, R. P. Suresh
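The PCA-then-cluster pipeline described above can be sketched end to end in a few lines. The synthetic "process variable" data and the two operating regimes are invented for illustration; the chapter's actual plant data and tuning are of course different:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy process-variable readings: two operating regimes in 5 dimensions.
X = np.vstack([rng.normal(0, 0.5, (50, 5)), rng.normal(3, 0.5, (50, 5))])

# PCA via SVD: project the correlated process variables onto the
# top-2 principal components before clustering.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T

def kmeans(Z, k, iters=20):
    """Plain Lloyd's algorithm on the PCA scores."""
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((Z[:, None] - centers) ** 2).sum(-1), axis=1)
        new = []
        for j in range(k):
            pts = Z[labels == j]
            new.append(pts.mean(axis=0) if len(pts) else centers[j])
        centers = np.array(new)
    return labels, centers

labels, centers = kmeans(Z, k=2)
# Each cluster centre (mapped back to the original variables) can then
# serve as a recommended set point for its operating regime.
print(sorted(np.bincount(labels, minlength=2)))
```

Clustering in the reduced PCA space rather than the raw variable space is what makes the regimes separable despite correlated, continuously changing process variables.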

Food Index Forecasting

Abstract
Designing efficient and robust algorithms for the accurate forecasting of price indices is one of the most prevalent challenges in the food market business. With the rapid evolution of sophisticated algorithms and the availability of fast computing platforms, it has now become possible to effectively and efficiently extract, store, process and analyze food price index data with diverse contents. One of the leading food processing companies in the USA approached us to use the price index data for over six food categories and forecast the upcoming 18 months, to get a view of upcoming price trends and support more informed business decisions. All the data used for the analysis was external (freely available): the monthly price index data for the USA for raw/processed food categories, published by the United States Department of Labor, for the period January 2010 till May 2018. Different univariate and multivariate time series modeling approaches were used to model the price data, and the best-fitted model was chosen for each individual category.
Kalyani Dacha, Ramya Cherukupalli, Abir Sinha
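The simplest univariate approach of the kind compared in the chapter is an autoregressive model. A minimal AR(1) sketch on synthetic data — the series below is invented and is not the Department of Labor index; real work would compare richer ARIMA/multivariate specifications:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic monthly index: AR(1) fluctuation around a mean of 100.
n, phi = 120, 0.8
y = np.empty(n)
y[0] = 100.0
for t in range(1, n):
    y[t] = 100.0 + phi * (y[t - 1] - 100.0) + rng.normal(0, 1.0)

# Fit (y_t - mean) = phi * (y_{t-1} - mean) by ordinary least squares.
z = y - y.mean()
phi_hat = (z[1:] @ z[:-1]) / (z[:-1] @ z[:-1])

# Iterate the fitted recursion to forecast 18 months ahead.
forecast, last = [], y[-1] - y.mean()
for _ in range(18):
    last = phi_hat * last
    forecast.append(y.mean() + last)

print(round(phi_hat, 2))   # estimate near the true 0.8
print(round(forecast[-1], 1))  # far-horizon forecast reverts to the mean
```

Model selection across categories then amounts to fitting several such candidate specifications per series and keeping the one with the best out-of-sample fit.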

Implementing Learning Analytic Tools in Predicting Students’ Performance in a Business School

Abstract
In recent times, information technology and big data are two buzzwords that have impacted all sectors, including education. Research in the fields of educational data mining and learning analytics is in its nascent stage. Applying analytics in education is the need of the hour, especially in the context of a developing economy like India, and it is time for educational institutions to use machine learning tools to enhance the teaching–learning experience. This study deploys learning analytics techniques on data from students undergoing a post-graduate management program and attempts to create a preventive feedback mechanism for faculty and students. In the first part, logistic regression was used to identify students' academic status in the foundation courses of the first semester. Six models were developed, and specificity scores were used to test their validity. In the second part of the study, a stepwise regression model was used to predict students' marks in the capstone course. The results showed that as students progress into second semester courses, the tenth and higher secondary board examination scores become irrelevant: performance in the first semester courses greatly influences the results of the second semester. Deployment of the models developed in this study would go a long way toward not only enhancing students' performance but also more fruitful student–faculty engagement.
R. Sujatha, B. Uma Maheswari
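The validation metric the study relies on, specificity, is easy to state precisely: the fraction of actual negatives that the classifier correctly labels negative. A minimal sketch with hypothetical labels (the ten students below are invented, not the study's data):

```python
def specificity(y_true, y_pred):
    """Specificity = TN / (TN + FP): of the students who are truly in
    the negative class (coded 0 here), the fraction predicted 0."""
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tn / (tn + fp)

# Hypothetical labels: 0 = on track, 1 = at risk, for ten students.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
print(specificity(y_true, y_pred))  # 4 of the 6 true negatives: 0.666...
```

Choosing specificity (rather than plain accuracy) makes sense for a preventive feedback system: a model that flags too many on-track students as at-risk would flood faculty with false alarms.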

An Optimal Response-Adaptive Design for Multi-treatment Clinical Trials with Circular Responses

Abstract
Circular or directional outcomes are natural responses in many clinical trials (e.g. orthopedics, ophthalmology, sports medicine). Traditionally, clinical trials use equal allocation throughout, without utilizing the information contained in the sequentially observed responses of the running trial. Adaptive allocation designs are data-dependent alternatives that assign a greater number of subjects to the better-performing treatments using the data available from the running trial. However, adaptive designs for circular responses are rare, and therefore, in the current work, we develop a multi-treatment response-adaptive randomization for circular treatment outcomes, considering both ethics and efficiency requirements. Apart from assessing the proposed allocation design empirically, the applicability of the design in real clinical trials is also illustrated from a practitioner's viewpoint.
Taranga Mukherjee, Rahul Bhattacharya, Atanu Biswas
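The response-adaptive principle — let accumulating outcomes tilt future allocation toward better-performing arms — can be sketched with a deliberately simplified rule. The sketch below uses binary success outcomes and allocation probabilities proportional to current estimated success rates; the chapter's design for circular outcomes is more elaborate, and the three success probabilities are invented:

```python
import random

random.seed(3)
true_success = [0.4, 0.7, 0.55]   # unknown to the trial
successes = [1, 1, 1]             # prior pseudo-counts to avoid zeros
trials = [2, 2, 2]

allocation = [0, 0, 0]
for _ in range(600):
    # Allocate the next subject with probability proportional to each
    # treatment's current estimated success rate.
    rates = [s / n for s, n in zip(successes, trials)]
    r, acc, arm = random.random() * sum(rates), 0.0, 0
    for k, rate in enumerate(rates):
        acc += rate
        if r <= acc:
            arm = k
            break
    allocation[arm] += 1
    trials[arm] += 1
    successes[arm] += random.random() < true_success[arm]

print(allocation)  # arms with higher success rates receive more subjects
```

This captures the ethics/efficiency trade-off the chapter addresses: more subjects end up on better arms, but every arm keeps a positive allocation probability so treatment effects remain estimable.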

Stochastic Comparisons of Systems with Heterogeneous Log-Logistic Components

Abstract
The log-logistic distribution is a flexible family of life distributions that has been applied in an enormous number of fields. In this paper, we study stochastic comparisons of both parallel and series systems having heterogeneous log-logistic distributed components. The comparisons are performed in the sense of the stochastic, reversed hazard rate, hazard rate, and likelihood ratio orderings. The consequences of changes in the scale or shape parameters for the magnitude of the smallest and largest order statistics are also investigated in the sense of the above-mentioned orderings.
Shyamal Ghosh, Priyanka Majumder, Murari Mitra
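For reference, the log-logistic distribution with scale parameter $\alpha > 0$ and shape parameter $\beta > 0$ has survival and hazard functions

```latex
\bar F(x) \;=\; \frac{1}{1 + (x/\alpha)^{\beta}}, \qquad
r(x) \;=\; \frac{(\beta/\alpha)\,(x/\alpha)^{\beta-1}}{1 + (x/\alpha)^{\beta}},
\qquad x > 0,
```

and the orderings used in the chapter compare such functions pointwise; for instance, $X \le_{\mathrm{hr}} Y$ (hazard rate order) holds exactly when $r_X(x) \ge r_Y(x)$ for all $x > 0$. For series and parallel systems, the smallest and largest order statistics inherit these comparisons from the component parameters.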

Stacking with Dynamic Weights on Base Models

Abstract
Stacking combines models built with different techniques, using a second-level model to achieve higher accuracy. The second-level model uses the values predicted by the different base-level models as independent variables, while the dependent variable remains the observed one. Though the fit of the base-level models differs across different parts of the data, the second-level model applies the same set of weights to the base-level models over the whole data. We derive two methods in which the second-level model is replaced by a linear combination of base-model outputs with varying weights. In our methods, to classify a new observation we select a part of the data based on a predefined condition of proximity. Weights are then assigned to the different base models according to their accuracy in that part of the data. In one method, all points in the neighbourhood get equal importance; in the other, points get importance based on proximity. The algorithms apply the same principle to each new observation, whose neighbourhood lies in a different part of the data; thus the weights vary. The new ensemble methods are tried on datasets from different fields and are found to give better results than conventional stacking.
Biswaroop Mookherjee, Abhishek Halder
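The equal-importance variant described above can be sketched directly: for each new observation, find its nearest training points, score each base model's accuracy on that neighbourhood, and combine the base predictions with those accuracies as weights. The two fixed "base models" and the 1-D data below are invented so the local-weighting effect is visible; they are not the chapter's models:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy 1-D training data: the true class is 1 exactly when x > 0.
X = rng.uniform(-3, 3, 200)
y = (X > 0).astype(int)

def model_a(x):  # illustrative base model, accurate for positive x
    return (x > -1).astype(int)

def model_b(x):  # illustrative base model, accurate for negative x
    return (x > 1).astype(int)

def dynamic_stack(x_new, k=20):
    # Neighbourhood of the new observation (equal importance inside it).
    idx = np.argsort(np.abs(X - x_new))[:k]
    # Each base model's local accuracy becomes its weight.
    weights = np.array([(model_a(X[idx]) == y[idx]).mean(),
                        (model_b(X[idx]) == y[idx]).mean()])
    preds = np.array([model_a(np.array([x_new]))[0],
                      model_b(np.array([x_new]))[0]], dtype=float)
    return int(weights @ preds / weights.sum() >= 0.5)

print(dynamic_stack(-0.5), dynamic_stack(0.5))
```

Conventional stacking would give each base model one global weight; here the weights are recomputed per observation, so whichever model is locally more reliable dominates in its own region.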

The Effect of Infrastructure and Taxation on Economic Growth: Insights from Middle-Income Countries

Abstract
We study the interactions between infrastructure and taxation on per capita economic growth in middle-income countries over 1960–2017. A dynamic panel data model is employed to estimate this interaction at three levels: (1) upper middle-income countries (UMICs), (2) lower middle-income countries (LMICs), and (3) all middle-income countries (MICs), i.e., UMICs and LMICs combined. Each level has six different cases, depending on which of six infrastructure indicators is used. The findings show that the effect of infrastructure on per capita economic growth is positive and significant across the three subsets and in all six cases. On the contrary, the effect of taxation on economic growth is negative, and its impact differs between UMICs and LMICs.
Rudra P. Pradhan

Response Prediction and Ranking Models for Large-Scale Ecommerce Search

Abstract
User response prediction is the bread and butter of an ecommerce site. Every popular ecommerce site runs a response prediction engine behind the scenes to improve user engagement and to minimize the number of hops or queries a user must issue to reach the destination item page that best matches the user's query.
Seinjuti Chatterjee, Ravi Shankar Mishra, Sagar Raichandani, Prasad Joshi

Connectedness of Markets with Heterogeneous Agents and the Information Cascades

Abstract
The macroeconomic integration of global financial markets is often characterized as a complex system in which ever-increasing interactions among a vast number of agents make it difficult for traditional economic theory to provide a realistic approximation of market dynamics. Economic systems are increasingly interdependent through cross-country networks of credit and investment, trade relations and supply chains, which highlights the need to integrate network theory with economic models to reduce the risk of global failure of financial systems. Our aim is to study the cross-holdings of entities in terms of input–output data and to examine their time-varying features to track changes in the network. We also study the ripple effects caused by the failure of entities inside the model. Using the World Input-Output Database (WIOD) dataset, covering 28 countries from the European Union and 15 other major countries across 56 industries for the period 2000 to 2014, we present evidence on the nature of the interconnectedness that global markets exhibit in terms of the input–output data representing their cross-holdings. The interdependence of markets in the global network is strongly correlated not only with the size of the markets, but also with the direction of trades/cross-holdings and the type of industries that dominate their input–output data. With growth model estimation, we are able to project cascades of failures in the network. Our analysis employs approaches such as network formation and graph theory to explain the interconnectedness of markets across the world, contributing significantly to theoretical issues related to market integration and risk spillover.
Avijit Ghosh, Aditya Chourasiya, Lakshay Bansal, Abhijeet Chandra
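The "ripple effect" mechanism studied in the chapter can be illustrated with a minimal threshold cascade on a toy dependency network: a market fails once the fraction of the markets it depends on that have failed crosses a threshold. The four markets, the exposure lists and the 0.5 threshold are invented for illustration; the chapter works with the WIOD input–output data and growth-model estimation:

```python
# Toy cross-holdings network: market -> markets it depends on.
exposures = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["D"],
    "D": [],
}

def cascade(initial_failures, threshold=0.5):
    """Propagate failures until no market's failed-dependency share
    newly crosses the threshold (a simple contagion fixed point)."""
    failed = set(initial_failures)
    changed = True
    while changed:
        changed = False
        for market, deps in exposures.items():
            if market in failed or not deps:
                continue
            share = sum(d in failed for d in deps) / len(deps)
            if share >= threshold:
                failed.add(market)
                changed = True
    return failed

print(sorted(cascade({"D"})))  # D's failure ripples through the network
print(sorted(cascade({"A"})))  # A alone triggers no further failures
```

Even in this toy network the direction of dependence matters: the failure of the heavily-depended-on market D brings down everything, while the failure of A propagates nowhere — the same asymmetry the chapter documents for real markets.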