A reinforcement learning methodology for a human resource planning problem considering knowledge-based promotion

https://doi.org/10.1016/j.simpat.2015.07.004

Abstract

This paper addresses a combined problem of human resource planning (HRP) and production-inventory control for a high-tech industry in which the human resource plays a critical role. The main characteristics of this resource are its levels of “knowledge” and its learning process. Learning occurs during production, through which a worker can be promoted to the next knowledge level; workers at higher levels are more productive. The objective is to maximize the expected profit by deciding on the optimal numbers of workers at the various knowledge levels to fulfill both production and training requirements. Because an action taken now affects later periods’ decisions, the core problem is to find the optimal hiring policy for non-skilled workers over a long time horizon. We therefore develop a reinforcement learning (RL) model to obtain the optimal hiring decisions under demand uncertainty. The interval-based policy produced by our RL model, which offers multiple choices for each state, makes the policy more flexible. We also embed managerial issues such as layoffs and overtime working hours into the model. To evaluate the proposed methodology, we compare it against stochastic dynamic programming (SDP) and a conservative method implemented in a real case study. We study all these methods in terms of four criteria: average profit, average cost, the number of newly hired workers, and the standard deviation of the hiring policies. The numerical results confirm that the developed method yields satisfactory results compared with the two other approaches.

Introduction

The main goal of a production plant or service supplier, in developed and developing countries alike, is to gain a larger share of domestic or foreign markets, especially where competitors exist. To reach this goal, a company should efficiently utilize all of its resources, such as its workforce and facilities, so that it can satisfy its customers. Because the level of satisfaction usually changes as technology advances, all operations need to be based on up-to-date knowledge; such operations are called knowledge-intensive operations. Among the resources employed in knowledge-intensive operations, the human resource is the most critical, because people and their knowledge are the most strategic resource for firms [2].

One of the main issues in human resource planning (HRP) is staffing and recruitment decision-making: providing enough qualified manpower to produce high-quality products or deliver superior services. Recruitment is usually a mid-term or even long-term decision that strongly affects the near future of the company and its success. Furthermore, human resources, as a strategic and valuable asset, possess the knowledge and skills necessary to move a company toward its goals. In other words, an important aspect of HRP is to determine the required number of workers at different knowledge levels (e.g., newly hired, semi-skilled, and skilled workers) to be deployed in the various parts of a company’s production process. This is, in fact, a way of improving the utilization of knowledge resources toward better efficiency.

Few quantitative approaches have been employed to cope with staffing problems in knowledge-intensive operations. One of the pioneering works on human resource planning in a knowledge-based setting is by Bordoloi and Matsuo [7]. They proposed a model that determines the required number of workers at each knowledge level, and they embedded employees’ learning and turnover rates into their optimization model to find better recruitment decisions under non-deterministic demand. Learning occurs during production, through which a worker can be transferred from a lower knowledge level to the next one (e.g., from the first level to the second) after some periods. Turnover is defined as the rate at which a company loses workers at each knowledge level (semi-skilled or skilled) at the end of each period [20]. When a company loses skilled workers at the upper levels, the loss cannot be compensated directly: the company can only satisfy demand by recruiting workers at the first level (newly hired workers). Bordoloi and Matsuo used the chance-constraint method to handle the high uncertainty of demand and the high volatility of knowledge workers at the last two levels. Their method, however, cannot address the production-inventory control problem. It also yields a static hiring policy (i.e., the hiring rate is constant across all periods of the real-time decision-making process), which is very conservative (i.e., the policy is obtained for a pessimistic situation). Furthermore, layoffs are not considered in their model.
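
As a minimal sketch of these worker-flow dynamics: each period a fraction of workers is promoted one level through learning, the upper levels lose workers to turnover, and hiring is possible only at level 1. The promotion and turnover rates below are hypothetical illustration values, not figures from the paper.

    # Minimal sketch of the worker-flow dynamics described above. The
    # learning (promotion) and turnover rates are hypothetical, not
    # figures from the paper.

    def step(workers, hires, learn_rate=0.2, turnover=(0.0, 0.10, 0.05)):
        """Advance worker counts (level 1, level 2, level 3) by one period."""
        l1, l2, l3 = workers
        promoted_1 = learn_rate * l1              # level 1 -> level 2
        promoted_2 = learn_rate * l2              # level 2 -> level 3
        l1 = l1 - promoted_1 + hires              # hiring only at level 1
        l2 = l2 + promoted_1 - promoted_2 - turnover[1] * l2
        l3 = l3 + promoted_2 - turnover[2] * l3   # upper-level losses cannot
        return (l1, l2, l3)                       # be replaced directly

    workers = (50.0, 30.0, 20.0)
    for t in range(5):
        workers = step(workers, hires=10.0)
        print(f"period {t + 1}: {workers[0]:.1f}, {workers[1]:.1f}, {workers[2]:.1f}")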

Given that demand is stochastic and unknown at the time of decision-making, there is a possibility of not satisfying demand (called slack or shortfall hereafter), which is assumed to result in lost sales in this paper. Another option is to build a physical buffer that stores surplus goods, so that when demand exceeds the production level the extra demand can be met from stock. It may also be possible to cover a stock-out with overtime working shifts using the existing workers. By incorporating these managerial issues (i.e., overtime working hours, slack/shortfall, surplus, and layoffs) into the mathematical optimization model, the knowledge-intensive workforce planning model becomes closer to what happens in reality, and the final hiring policy becomes more useful for managers and more beneficial for the company.
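
The per-period logic just described can be sketched as follows, with hypothetical prices and costs (not figures from the paper): production plus stock serves demand first, surplus is stored, and any shortfall is covered by overtime capacity where possible, otherwise lost.

    # Minimal sketch of the per-period inventory logic described above.
    # All prices and costs are hypothetical illustration values.

    def period_profit(production, demand, inventory, overtime_cap,
                      price=10.0, holding_cost=1.0, overtime_cost=4.0):
        available = production + inventory
        shortfall = max(demand - available, 0.0)
        overtime_units = min(shortfall, overtime_cap)  # recovered via overtime
        lost_sales = shortfall - overtime_units        # assumed lost, per above
        sold = demand - lost_sales
        inventory = max(available - demand, 0.0)       # surplus kept as stock
        profit = (price * sold - holding_cost * inventory
                  - overtime_cost * overtime_units)
        return profit, inventory

    profit, stock = period_profit(production=100, demand=120,
                                  inventory=10, overtime_cap=5)
    print(f"profit = {profit:.1f}, carried stock = {stock:.1f}")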

To address the aforementioned issues, this paper pursues three goals. First, it proposes a new optimization model in which the inventory level is also taken into account; this makes the model closer to reality and leads to more appropriate decisions. Second, to solve this model efficiently, we develop a reinforcement learning (RL) method. Third, to obtain a more applicable decision policy, we derive optimal decision intervals, rather than single actions, for every state, using a modified version of the value iteration technique, a well-known approach in stochastic dynamic programming (SDP). This makes the optimization model more flexible because it gives multiple choices to the decision-maker. It is worth mentioning that all available information about the demand distribution is used to find the optimal hiring policy, whereas the chance-constraint approach (i.e., the basic model proposed by Bordoloi and Matsuo [7]) uses only the mean and standard deviation of the stochastic demand. We refer to their model in more detail in the rest of the paper and compare the results of our two models and theirs using data obtained from a semiconductor equipment manufacturing company.
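
As a hedged sketch of what an interval decision could look like (the tolerance rule below is our reading of the interval-based-policy idea, not the paper's exact construction): for each state, instead of keeping only the single greedy action, every action whose value lies within a tolerance of the best is kept, giving the decision-maker a range of near-optimal hiring levels.

    import numpy as np

    # Hedged sketch: extract an interval of near-optimal actions per state
    # from a learned Q-table. The tolerance rule is an illustration of the
    # interval-based-policy idea, not the paper's exact construction.

    def interval_policy(Q, tol=0.05):
        """For each state, the smallest/largest action within tol of the best."""
        intervals = []
        for q in Q:                                # q: action values of a state
            best = q.max()
            ok = np.flatnonzero(q >= best - tol * abs(best))
            intervals.append((int(ok.min()), int(ok.max())))
        return intervals

    Q = np.array([[4.0, 4.9, 5.0, 3.0],            # hypothetical Q-values
                  [1.0, 2.0, 2.0, 1.9]])           # (2 states x 4 actions)
    print(interval_policy(Q))                      # [(1, 2), (1, 3)]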

This paper is organized as follows. Section 2 reviews the related literature. Section 3 gives basic definitions of SDP and RL. Section 4 describes the production-inventory control problem and our proposed methods. Sections 5 and 6 present the numerical results and the conclusion and future work, respectively.

Section snippets

Related work

Workforce planning determines the required level of workers through which the strategic goals of an organization can be achieved. Bulla and Scott [9] defined it as a process in which an organization's human resource requirements are identified and efficient plans for satisfying those requirements are designed. Khoong [27] described manpower planning as the core of HRP, supported by its other aspects.

Different mathematical approaches have been used in HRP

Stochastic dynamic programming and reinforcement learning

Stochastic dynamic programming, a well-known optimization methodology that can cope with uncertain situations, breaks complex problems (e.g., non-linear, non-convex, and non-continuous ones) into easier sub-problems [6]. However, it suffers from the twin curses of dimensionality and modeling in large-scale applications. Different techniques in SDP have been developed; among them, the value iteration (VI) approach is widely used in practical problems because it can approximate the true value function.
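
For concreteness, a minimal value iteration sketch on a generic finite Markov decision process follows; the states, actions, transitions, and rewards are placeholders, not the paper's actual production-inventory model.

    import numpy as np

    # Minimal value iteration on a generic finite MDP, illustrating the VI
    # technique mentioned above (not the paper's model). P[s][a] is a list
    # of (probability, next_state, reward) triples.

    def value_iteration(P, n_states, n_actions, gamma=0.95, tol=1e-6):
        V = np.zeros(n_states)
        while True:
            Q = np.zeros((n_states, n_actions))
            for s in range(n_states):
                for a in range(n_actions):
                    Q[s, a] = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
            V_new = Q.max(axis=1)                  # Bellman optimality backup
            if np.max(np.abs(V_new - V)) < tol:    # stop near the fixed point
                return V_new, Q.argmax(axis=1)
            V = V_new

    # Toy 2-state, 2-action example with made-up dynamics:
    P = {0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
         1: {0: [(1.0, 0, 2.0)], 1: [(1.0, 1, 1.0)]}}
    V, policy = value_iteration(P, n_states=2, n_actions=2)
    print(V, policy)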

Optimization model

Bordoloi and Matsuo [7] formulated a linear program modeling an assembly line in which the operations of the front-end stage, called stage A, must be completed before the operations of the back-end stage, called stage B. Workers at the first knowledge level (newly hired) and the second knowledge level (semi-experienced) perform the operations in stages A and B, respectively. Workers at the third knowledge level (fully experienced) are assigned to the production stages to train lower-level workers.
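
As a hedged sketch of the capacity side of such a model (the notation below is ours, not necessarily the paper's): let $n_k$ denote the number of workers at knowledge level $k$, $p_k$ the per-worker processing rate, $d_A$ and $d_B$ the workloads of stages A and B, and $\tau$ the number of lower-level workers that one fully experienced worker can train. The staffing constraints then take a form such as

\[
p_1 n_1 \ge d_A, \qquad p_2 n_2 \ge d_B, \qquad \tau\, n_3 \ge n_1 + n_2, \qquad n_k \ge 0, \quad k = 1, 2, 3.
\]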

Numerical results

To evaluate our proposed methods, we used a unique case study that is briefly explained in the next subsection.

Conclusion and future work

This paper focuses on a combined problem of human resource planning (HRP) and production-inventory control in a knowledge-intensive industry. The main characteristics of the human resource in this industry are the levels of “knowledge” and the learning process. The objective is therefore to maximize the expected profit by finding the optimal numbers of workers at the various knowledge levels to fulfill both production and training requirements. When a company loses skilled workers at the upper levels, the loss cannot be directly compensated; the company can only respond by recruiting workers at the first level.

References (34)

  • E.J. Pinker et al., Optimizing the use of contingent labour when demand is uncertain, Eur. J. Oper. Res. (2003)
  • S.J. Sadjadi et al., A new nonlinear stochastic staff scheduling model, Scientia Iranica E (2011)
  • H.S. Ahn et al., Staffing decisions for heterogeneous workers, with turnover, Math. Meth. Oper. Res. (2005)
  • M. Armstrong, A Handbook of Human Resource Management Practice, tenth ed., London, UK, ...
  • D.J. Bartholomew et al., Statistical Techniques for Manpower Planning, second ed. (1991)
  • R. Bellman, Dynamic Programming (1957)
  • S.K. Bordoloi, A control rule for recruitment planning in engineering consultancy, J. Prod. Anal. (2006)